rfc-0009: supervisor middleware by pimlock · Pull Request #1738 · NVIDIA/OpenShell

pimlock · 2026-06-04T04:35:48Z

Note

The RFC is now open for feedback. The contract is framed as a research preview (provisional, may change incompatibly), and a handful of open design questions remain; they are listed in the RFC.

Summary

RFC 0009 - Supervisor Middleware. Proposes hooks in the supervisor that can inspect, transform, allow/deny, and annotate supervisor-managed operations. The v1 hook is HTTP request middleware in the supervisor proxy, before OpenShell injects upstream credentials. Privacy Guard remains the motivating use case, but the naming is intentionally broader than egress so the design can grow to other supervisor hooks later.

The main README.md holds the proposal end to end; two appendices carry supporting detail (deployment options, protocol extensions).

Tip

See here for a more visual representation of this RFC. Inspired by @mrunalp (source)

Approach summary

This RFC has been developed together with AI helpers. The final RFC is distilled from lots of source material that is not included to keep it somewhat short.

Approach details

Here's roughly what my approach looked like:

start with the use case, initial conversations around what mechanism could support this use case, while being open to other use cases.
very initial high-level -> create a doc
over time, accumulate more docs focused on different aspects. Some of these docs were more structured, some of them were a random thought to be explored.
converse with these docs and add more context
once most of the aspects of the design were somewhat explored -> start boiling it down: "happy path design" that is easy to follow (i.e. single path, only mentioning where forks were, but not exploring all the forks).
- I kept a TRACKER.md doc: a high-level record of the distillation status that let me go through the doc step by step while keeping a memory of what was in flight, plus a place to drop random thoughts. (It has since been removed from the RFC now that the draft is complete.)
lots of editing. I still find LLMs to be quite verbose and repetitive, writing in 3 sentences what could be 1, or using phrases that are vague.

Why I'm writing this down -> future reference and also to get feedback on this part. I'm curious how others are approaching writing docs like this, so please share if you have thoughts!

Related Issue

RFC tracking: Sandbox egress middleware #1733
Roadmap: Privacy Guard #1043
Model routing (out of scope here): Pluggable model routing RFC #1734
Related RFC: rfc-0010: gateway interceptors #1927 (RFC 0010 - gateway interceptors)

Changes

Add rfc/0009-supervisor-middleware/README.md - the full draft: Summary, Motivation, Privacy Guard use case, Non-goals, Terminology, Proposal (architecture, hook placement, contract + proto sketch, contract versioning, registration/delivery + reload semantics, policy integration, ordering, metadata, OCSF audit/logging), relationship to RFC 0010, Prior art, Implementation plan, Risks, Alternatives, and Open questions.
Add rfc/0009-supervisor-middleware/appendices/deployment-options.md - deployment-mode decision and future options.
Add rfc/0009-supervisor-middleware/appendices/protocol-extensions.md - future streaming operations, additional hooks, semantic context, content preview, portable capabilities, header rules, and middleware authentication.

Notes

Supervisor middleware rename

The feature name moved from "sandbox egress middleware" to "supervisor middleware" so the API can remain extensible for future supervisor hook types beyond egress HTTP requests. The v1 hook is still HttpRequest/pre_credentials.

Appendices

I kept only two appendices (deployment-options, protocol-extensions). Four other detailed appendices were planned (request/response contract, policy integration, pipeline placement, failure-and-audit) but dropped: their load-bearing content was inlined into the README so the RFC stays self-contained, and the rest was implementation-spec detail that doesn't belong in a design doc.

The drafting tracker (TRACKER.md) has been removed from the RFC now that the draft is complete.

Testing

N/A - documentation only.

Checklist

Docs-only change
All README sections drafted (Terminology, Implementation plan, Risks, Alternatives, Open questions now complete)
Open questions resolved
RFC number assigned

Changelog

2026-06-04: initial version
2026-06-08: address review feedback - run the hook on any parsed HTTP request regardless of protocol (only tls: skip/opaque TCP bypasses it); clarify the Route selection step; add the v1 in/out-of-scope list (WebSocket upgrade in, frames out); make over-cap = on_error (fail-closed) explicit; document middleware chain composability; add the built-in-only http.request.post_credentials hook; note registration ergonomics as deferred.
2026-06-17: rename the RFC folder to rfc/0009-sandbox-egress-middleware; address review feedback around reserved middleware namespaces, hook naming, route-selection scope, trusted middleware endpoints, optional actor data, Finding shape, append-only headers, endpoints selectors, provider-profile scope, multitenancy, health checks, future deployment timelines, and metadata-only response-completion notifications.
2026-06-23: rename the feature and RFC folder to supervisor middleware; align shared plumbing with RFC 0010 gateway interceptors; switch GetCapabilities to Describe; model service-owned middleware bindings; keep v1 evaluation unary with EvaluateHttpRequest; narrow on_error to fail_open/fail_closed; update the sample evaluation/result proto shape around binding_id, HttpRequestTarget, originating_process, flattened result fields, and has_body.


Add a Contract versioning subsection (openshell.middleware.v1, ProxyMiddleware service, GetCapabilities version handshake) and registration reload/removal semantics. Rename the gRPC service to ProxyMiddleware, defer max_body_bytes and timeout_ms to implementation, and record the v1alpha1-vs-v1 open question.

johntmyers · 2026-06-15T21:12:58Z

+  string deny_reason = 2;               // safe, machine-readable
+  map<string, string> set_headers = 3;  // subject to an OpenShell allow-list
+  map<string, string> metadata = 4;     // namespaced, no raw values
+  repeated Finding finding = 5;         // labels, counts, confidence


I don't see a definition for this so just curious what that might look like.

Good catch. The idea here was to have something that could generate finding OCSF events.

We already emit it in a few places.

For the middleware contract, it could be something like this:

{ "type": "pii.email", "label": "Email address", "count": 3, "confidence": "...", "severity": "..." }

And based on that, we'd emit an event from the supervisor.
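To make the proposed shape concrete, here is a minimal Python sketch that parses the example above into a typed record. The `Finding` class and `parse_finding` helper are hypothetical names for illustration, not part of the RFC's proto. The value sets for `confidence` and `severity` were left open in this thread, so the sketch treats them as opaque strings and keeps the `...` placeholders.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record mirroring the JSON shape proposed above."""
    type: str        # namespaced finding type, e.g. "pii.email"
    label: str       # human-readable label
    count: int       # number of occurrences, never the raw matched values
    confidence: str  # value set not defined in this thread; kept opaque
    severity: str    # value set not defined in this thread; kept opaque


def parse_finding(raw: str) -> Finding:
    """Parse one finding and reject obviously invalid counts."""
    finding = Finding(**json.loads(raw))
    if finding.count < 0:
        raise ValueError("count must be non-negative")
    return finding


example = (
    '{ "type": "pii.email", "label": "Email address", '
    '"count": 3, "confidence": "...", "severity": "..." }'
)
finding = parse_finding(example)
print(finding.type, finding.count)
```

The record carries labels and counts only, which matches the "no raw values" intent of the surrounding proto comments.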

johntmyers · 2026-06-15T21:14:57Z

+
+The interface is gRPC. The hot-path RPC is declared as a bidirectional stream, but v1 exchanges exactly one `ProcessRequest` and one `ProcessResponse` over it: the supervisor buffers the bounded body and the middleware replies once. Declaring it as a stream now is deliberate, because gRPC method cardinality cannot change compatibly. It lets a later version chunk large payloads without altering the method signature. Possible extensions (chunked streaming, additional hooks, semantic context) are collected in the [protocol-extensions appendix](appendices/protocol-extensions.md), including what streaming does and does not buy. The baseline middleware ships in the supervisor and is served in-process over the same gRPC contract, with no network hop. The exact field set is settled during implementation; the sketch above is the contract shape this RFC asks reviewers to evaluate.
+
+The `actor` process is the same identity OpenShell already resolves on the egress path - the binary, pid, and ancestor chain it uses for binary-scoped network policy and OCSF audit. It is resolved when the connection is established, so it is per-connection rather than strictly per-request: over a reused or pooled connection it identifies the process that opened the connection, which a middleware should not over-trust for per-request attribution.


Given the direction we're going on explicitly setting modes/modules for the supervisor, should we be able to communicate that "no actor data" is available if we're in a proxy-only situation?

Good point. I included the actor process data for complete context about the request (e.g. could be helpful for logging), but we could also drop it at this point, or ensure it's documented as optional.

Maybe drop for now?

johntmyers · 2026-06-15T21:42:08Z

+
+Implementation-wise, the hook is a new supervisor-side Rust enforcement stage selected by policy data, not a Rego rule. Existing Rego evaluation remains the metadata gate: L4 policy admits the connection and, where the endpoint declares a `protocol`, L7 policy admits the parsed request; the supervisor then buffers the bounded body, calls the configured middleware chain, and applies the returned decision or transformation. The hook does not depend on L7 Rego running - on a parsed request to an endpoint with no `protocol`, the chain still runs after L4 admits the connection. Request bodies do not become Rego input in v1. A later design may add a declarative pass over middleware findings, but v1 applies middleware outcomes directly.
+
+```yaml


This config makes sense in the "single policy document" world. How do we reference network policies that are defined in provider profiles? I think it would make sense as a best practice to "define your middlewares in the sandbox policy", but it's likely the bulk of your network policies are coming from providers. Similar to how we support global assignments via the requests object, could we also assign middleware to a provider type?

This is a great point.

Yes, we could reference provider profiles in the middleware include mechanism (requests/endpoints). We would operate on the provider profile, not the provider instance here, right? If the profile gets removed, we just configure? But we validate when the policy is created to ensure the profile exists.

Something like:

    network_middlewares:
      - name: openshell-secrets
        middleware: openshell-secrets
        config:
          secrets: redact
        endpoints: ["foo.com"]
        provider_profiles: ["github"]

I would have to look more closely at where it would need to be dereferenced.

Alt 1.

We could hoist the middleware config into a separate entity, similar to provider profiles. That would define endpoints/provider profiles and the config. This could actually be a future extension, the benefit here is that you configure the middleware once and you don't need to repeat the config in each sandbox's policy.

Alt 2.

We could leave it as-is for now, so in the initial version you couldn't configure the middleware for a provider. We could wait and see how this is used; it's possible that we won't need provider profiles to control granularity for attaching middleware. Maybe that inclusion would be configured based on different dimensions, e.g. "inference".

I think for now I'd lean toward alt 2. CC @kirit93 I'm curious what your take is.

johntmyers · 2026-06-15T21:44:36Z

+
+When more than one middleware applies to a request, the order is well-defined:
+
+- Middleware configs are defined once in a top-level `network_middlewares` list and attached to traffic from network policies or individual endpoints through an explicit `middleware: [...]` list. Policy-level lists apply to all endpoints in the policy; endpoint-level lists apply only to that endpoint and append after the policy-level list. These lists determine the relative order of the configs they name.


Should we say middleware configs must be defined in the policy and only the policy? Meaning they are not supported in provider profiles?

One use case where I can see adding middlewares in the provider profile (pp) is for things like sigv4: if I'm creating an AWS provider, I'd like to configure sigv4 middleware. That middleware would be built-in, so it's always available and does not rely on the gateway configuration having that middleware added. We could limit it to built-in ones, configured in the endpoint rather than in the global network_middlewares: ... section.

Another place where middleware in pp is useful is reusing the same config in multiple sandboxes. Right now it's in the policy, so I need to include it in each sandbox, and if I want to update it, I need to do it manually in all sandboxes. But that doesn't feel like something providers should deal with. If we were to go this route to allow reuse/reference, it may be better to go with alt 1 from the other comment.

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

pimlock · 2026-06-23T22:03:19Z

There is a new RFC 0010 (gateway interceptors) that is similar in some ways: #1927.

We synced up with @drew and came up with a list of items to align on between this RFC and RFC 0010: #1927 (comment)

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

pimlock · 2026-07-03T20:28:49Z

Implementation feedback: revise the middleware attachment model

Implementation in #2027 exposed one important ambiguity in the RFC's policy model. We resolved it by implementing selector-only middleware attachment for v1.

Implemented rule

Middleware is defined and attached only through the top-level network_middlewares list.
Each middleware config has an endpoints.include / endpoints.exclude host selector.
Network and L7 policy first decide whether the request is admitted.
After admission, OpenShell evaluates the ordered network_middlewares list against the admitted destination host. Matching entries form the chain in declaration order.
NetworkPolicyRule and NetworkEndpoint do not have middleware: [...] attachment fields.
A policy-local middleware config name is unique and runs at most once per request. Different config names remain distinct even if they reference the same implementation.
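
The rule above can be sketched in a few lines of Python (the supervisor itself is Rust; this is illustration only). The `MiddlewareConfig` class, the glob-style host matching via `fnmatch`, the `example/audit` implementation name, and the example hosts are all assumptions; this comment does not specify the selector syntax.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class MiddlewareConfig:
    """Hypothetical in-memory form of one network_middlewares entry."""
    name: str                 # policy-local config name, unique per policy
    middleware: str           # implementation the config references
    include: list[str]        # endpoints.include host patterns
    exclude: list[str] = field(default_factory=list)  # endpoints.exclude


def matches(config: MiddlewareConfig, host: str) -> bool:
    """True if host matches an include pattern and no exclude pattern."""
    host = host.lower()
    included = any(fnmatchcase(host, p.lower()) for p in config.include)
    excluded = any(fnmatchcase(host, p.lower()) for p in config.exclude)
    return included and not excluded


def select_chain(configs: list[MiddlewareConfig], host: str) -> list[str]:
    """Return matching config names in declaration order.

    Meant to run only after network and L7 policy admitted the request.
    Which grant admitted it is deliberately not an input.
    """
    names = [c.name for c in configs]
    if len(names) != len(set(names)):
        raise ValueError("middleware config names must be unique")
    return [c.name for c in configs if matches(c, host)]


configs = [
    MiddlewareConfig("secrets-bedrock", "openshell/secrets",
                     include=["bedrock-runtime.*.amazonaws.com"]),
    MiddlewareConfig("secrets-s3", "openshell/secrets",
                     include=["*.s3.amazonaws.com"],
                     exclude=["public.s3.amazonaws.com"]),
    MiddlewareConfig("audit-all", "example/audit", include=["*"]),
]

print(select_chain(configs, "bedrock-runtime.us-east-1.amazonaws.com"))
print(select_chain(configs, "public.s3.amazonaws.com"))
```

Two configs reference the same implementation under different names, so each stays a distinct chain entry, and each name can appear at most once because names are unique and the list is walked once.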

Why explicit policy/endpoint attachment was removed

OpenShell network policies are an additive bag of grants, not a hierarchy with one canonical winning rule. A request is allowed when at least one policy matches, and several policies may match the same request. Those policies may come from different sources: the user-authored base policy, provider-derived policy, or merged policy chunks/proposals.

That makes attachment semantics ambiguous. For example:

Policy A allows a request and attaches middleware, while policy B also allows it without middleware. Should the middleware run?
Policy A attaches chain [x] and policy B attaches [y]. Should OpenShell choose one, union them, reject the overlap, or invent cross-policy ordering?
A broad provider-derived rule and a specific user rule may both admit the request. Neither is inherently the parent or winner.

Choosing the matched policy as the attachment source could also make enforcement depend on policy provenance or allow an overlapping grant without middleware to bypass a middleware-bearing grant.

Host selectors on the middleware configs avoid this ambiguity. Selection is independent of whichever grant admitted the request and remains stable when user, provider, and merged policy layers overlap. We can reassess richer selector dimensions or explicit attachments later, when a concrete use case gives us enough information to define composition and ordering semantics.
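
The ambiguity can be reproduced with a toy model. The Python sketch below is an illustration under assumptions: the `Grant` class, its exact-host matching, and the `middleware` field stand in for the removed attachment fields and are not OpenShell types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical policy grant carrying the removed attachment field."""
    source: str                       # e.g. "user" or "provider"
    host: str                         # exact host the grant admits
    middleware: tuple[str, ...] = ()  # explicit attachment (not in v1)


def admitted(grants: list[Grant], host: str) -> bool:
    """Additive model: one matching grant is enough to admit the request."""
    return any(g.host == host for g in grants)


def candidate_chains(grants: list[Grant], host: str) -> set[tuple[str, ...]]:
    """Every distinct chain that some matching grant would attach."""
    return {g.middleware for g in grants if g.host == host}


grants = [
    Grant("user", "api.example.com", middleware=("x",)),
    Grant("provider", "api.example.com", middleware=("y",)),
    Grant("provider", "api.example.com"),  # admits without middleware
]

assert admitted(grants, "api.example.com")
chains = candidate_chains(grants, "api.example.com")
# More than one candidate chain means the grants alone give no canonical answer.
print(len(chains))
```

With attachment on the grants, the request is admitted but three different chains are defensible, including the empty one. Selecting by host on the middleware configs removes the grants from the decision entirely.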

RFC edits implied by this decision

The RFC should update these sections:

Policy integration
- Remove policy-level and endpoint-level middleware: [...] attachment fields.
- Describe network_middlewares[].endpoints as the complete v1 attachment surface, not an optional global surface alongside explicit lists.
- State that selection occurs after admission and is independent of the network rule or rules that matched.
YAML example
- Remove middleware lists from network_policies and their endpoints.
- Give every active middleware config an endpoints.include selector and optional exclusions.
- Show that separate configs can target Bedrock and S3 hosts while using the same implementation.
Middleware ordering
- Replace the mixed global/explicit ordering rules with one rule: matching configs run in top-level network_middlewares declaration order.
- Remove first-explicit-occurrence and deduplication semantics for attachment lists.
Provider-derived policy text
- Replace attachment-after-effective-policy-assembly language. Provider-derived endpoints are covered by the same middleware host selectors after effective policy composition; middleware does not need to mutate or attach to provider policy rules.
Implementation plan and decisions summary
- Remove policy/endpoint attachment fields from Phase 1.
- Replace “Endpoint selectors plus explicit attachment” with “Selector-only attachment in v1.”
- Record policy/endpoint attachment and its union/override/ordering behavior as deferred design work.

Suggested replacement language:

Middleware selection is independent of the network policy rule that admits a request. After network and L7 policy admission, OpenShell selects middleware by matching the admitted destination host against the ordered top-level network_middlewares selectors. Matching configs run in declaration order. V1 does not attach middleware lists to network policies or individual endpoints.

Other implementation deltas worth reconciling

External services shipped with the initial implementation. The implementation includes both the in-process openshell/secrets binding and statically registered operator-run gRPC services, rather than stopping after a built-in-only phase.
Registration and plaintext shape. Operator services are declared in [[openshell.gateway.middleware]]. A configured grpc_endpoint accepts http:// directly for the current local/research path; there is no separate allow_insecure field. https:// uses platform trust roots. Custom roots, caller authentication, and credential rotation remain deferred.
Body-limit ownership. Each binding advertises max_body_bytes; external registrations provide an operator limit that cannot exceed the advertised capability. The chain buffers to the smallest limit among resolved entries. The cap counts the consumed wire representation for chunked middleware requests, and over-cap request/replacement bodies follow on_error.
Unavailable bindings and recovery. Gateway startup still rejects an operator registration that cannot connect and return a valid Describe manifest. At the sandbox boundary, an unavailable delivered binding follows its config's on_error: sandbox startup preserves the built-in registry if the service cannot be connected, then the policy poll loop retries reconciliation independently of config revision. Reload failure keeps the last-known-good registry; recovery installs the new registry without requiring a policy change.
Validation timing. Duplicate names, implementation names, on_error, required selectors, selector syntax, built-in config, and tls: skip conflicts are validated before policy acceptance and again against the composed effective policy in the supervisor.
Observability. The implementation emits per-invocation HTTP activity plus detection findings for fail-open failures, fail-closed failures, findings returned by middleware, body-unavailable outcomes, and registry state changes.
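
As an illustration of the body-limit bullet above, here is a minimal Python sketch. `ResolvedEntry` and the three helper functions are hypothetical names, and the mapping of `fail_open` to "skip this middleware" and `fail_closed` to "deny the request" is an assumption about what "follow on_error" means, not something stated in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolvedEntry:
    """Hypothetical resolved chain entry (config plus its binding)."""
    name: str
    advertised_max_body_bytes: int                # advertised by the binding
    operator_max_body_bytes: Optional[int] = None  # external registrations only
    on_error: str = "fail_closed"                 # "fail_open" or "fail_closed"


def effective_limit(entry: ResolvedEntry) -> int:
    """Operator limit if set, never above the advertised capability."""
    limit = entry.operator_max_body_bytes
    if limit is None:
        return entry.advertised_max_body_bytes
    if limit > entry.advertised_max_body_bytes:
        raise ValueError(f"{entry.name}: operator limit exceeds advertised capability")
    return limit


def chain_buffer_limit(chain: list[ResolvedEntry]) -> int:
    """The chain buffers to the smallest limit among resolved entries."""
    return min(effective_limit(e) for e in chain)


def over_cap_outcome(entry: ResolvedEntry, consumed_bytes: int, limit: int) -> str:
    """Assumed interpretation of on_error for an over-cap body."""
    if consumed_bytes <= limit:
        return "evaluate"
    return "deny" if entry.on_error == "fail_closed" else "skip"


chain = [
    ResolvedEntry("secrets", advertised_max_body_bytes=1_048_576),
    ResolvedEntry("external-dlp", advertised_max_body_bytes=4_194_304,
                  operator_max_body_bytes=262_144, on_error="fail_open"),
]
limit = chain_buffer_limit(chain)
print(limit)
print([over_cap_outcome(e, 300_000, limit) for e in chain])
```

Here the operator limit on the external entry is the smallest, so it sets the buffer size for the whole chain.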

The selector-only attachment decision is the normative design change here. The remaining bullets mostly align the RFC's implementation plan and operational details with what #2027 now does.


elezar reviewed Jun 4, 2026


Comment thread rfc/0005-sandbox-egress-middleware/TRACKER.md Outdated

pimlock added 14 commits June 4, 2026 15:09

docs(rfc): add RFC 0005 sandbox egress middleware draft

830be00

docs(rfc): add prior art section and adjust architecture diagram

43a696c

docs(rfc): clarify hot-path opt-in, add proto sketch and protocol-ext…

82992d6

…ensions appendix

docs(rfc): fix duplicate ProcessResponse and update tracker status

48fedad

docs(rfc): track limits, failure modes, limitations, and research-pre…

33f442f

…view framing

docs(rfc): add research-preview middleware auth and explicit insecure…

522d802

… mode

docs(rfc): add terminology section for egress middleware

50d4a2b

docs(rfc): add two-phase implementation plan for egress middleware

3250b54

docs(rfc): add risks section for egress middleware

3950b0f

docs(rfc): add alternatives section for egress middleware

8950bb0

docs(rfc): rename Verdict to Outcome in middleware contract

e1b9e84

open questions

d00ed91

docs(rfc): drop detail appendices and inline load-bearing notes

9e87302

pimlock force-pushed the rfc-0005-sandbox-egress-middleware branch from 4810be8 to 9730e6f Compare June 4, 2026 22:37

pimlock added 2 commits June 4, 2026 15:47

docs(rfc): soften middleware removal semantics

4f0c3c3

docs(rfc): remove drafting tracker from RFC

cd91af9

pimlock marked this pull request as ready for review June 4, 2026 23:44

pimlock requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners June 4, 2026 23:44

pimlock requested a review from elezar June 4, 2026 23:44

pimlock added topic:l7 Application-layer policy and inspection work area:supervisor Proxy and routing-path work rfc labels Jun 4, 2026

pimlock self-assigned this Jun 4, 2026

docs(rfc): clarify egress middleware contract

4dcc7ed

johntmyers reviewed Jun 15, 2026


Comment thread rfc/0005-sandbox-egress-middleware/README.md Outdated

johntmyers reviewed Jun 15, 2026


Comment thread rfc/0009-sandbox-egress-middleware/README.md Outdated

pimlock requested a review from johntmyers June 15, 2026 23:37

afourniernv mentioned this pull request Jun 16, 2026

docs(rfc): add loose inspection contract proposal #1930

Closed

6 tasks

drew changed the title ~~docs(rfc): RFC 0005 - Sandbox Egress Middleware~~ rfc-0009 - sandbox egress middleware Jun 16, 2026


drew added this to OpenShell Roadmap Jun 16, 2026

github-project-automation Bot moved this to Todo in OpenShell Roadmap Jun 16, 2026

drew moved this from Todo to In progress in OpenShell Roadmap Jun 16, 2026

pimlock changed the title ~~rfc-0009 - sandbox egress middleware~~ rfc-0009: sandbox egress middleware Jun 17, 2026

pimlock added 4 commits June 17, 2026 16:51

chore: merge main into RFC branch

0abaf56

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

docs(rfc): rename sandbox egress middleware RFC to 0009

f6b9450

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

docs(rfc): address sandbox egress middleware feedback

9502058

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

clarifications on reserved names

53646bb

afourniernv mentioned this pull request Jun 22, 2026

feat(supervisor-network): add request inspection seam poc #1971

Draft

6 tasks

johntmyers added this to the OpenShell Beta milestone Jun 23, 2026

pimlock added 6 commits June 23, 2026 15:05

clean ups

51fb63d

rename -> supervisor-middleware

3cbeaab

initial update after aligning with interceptors

97cdf50

docs(rfc): make middleware evaluation unary

d004cc0

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

docs(rfc): describe middleware bindings

1b1098c

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

docs(rfc): refine middleware evaluation shape

851ccd0

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

pimlock changed the title ~~rfc-0009: sandbox egress middleware~~ rfc-0009: supervisor middleware Jun 24, 2026

pimlock mentioned this pull request Jun 26, 2026

feat(supervisor-middleware): add network egress middleware #2027

Open

7 tasks

