feat(supervisor-middleware): add network egress middleware #2027

Open
pimlock wants to merge 27 commits into main from 1733-supervisor-middleware/pmlocek

pimlock (Collaborator) commented Jun 26, 2026

Summary

Implements the first usable RFC 0009 supervisor middleware slice: proto-backed, host-selected HTTP egress middleware for HttpRequest/pre_credentials, with both in-process built-ins and statically registered operator-run gRPC services.

The implementation covers RFC 0009 Phase 1 and adds basic external-service support from Phase 2. It establishes the contract, policy plumbing, ordered chain execution, built-in secret redaction, static gateway registration, relay integration, validation before policy persistence, body limits, audit events, and user-facing configuration and operations documentation.

Related Issue

Closes #2010
Part of #1733
Design/RFC: #1738

Changes

  • Add the supervisor middleware contract and runtime, including the built-in openshell/secrets redactor and statically registered operator-run gRPC services.
  • Add ordered, host-selector-based network_middlewares policy configuration and validation, independent of the network policy rule that admits a request.
  • Enforce middleware on inspected HTTP requests before credential injection, with transformations, safe header additions, body limits, and explicit fail-open/fail-closed behavior.
  • Deliver effective service configuration to supervisors with resilient registry reload and last-known-good behavior.
  • Add OCSF observability plus gateway, policy, operations, architecture, and extensibility documentation.
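
The ordered, host-selector-based selection described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `MiddlewareConfig`, `host_matches`, `select_chain`, and the `*.` wildcard syntax are all hypothetical names and assumptions. Only the `openshell/secrets` name in the usage note comes from the description.

```rust
/// Hypothetical shape of one `network_middlewares` entry (illustrative only).
#[derive(Debug, Clone)]
struct MiddlewareConfig {
    name: String,
    /// Host selectors. Assumption: a leading "*." matches any subdomain.
    hosts: Vec<String>,
}

/// Returns true when `host` matches `selector`, either exactly or through
/// the assumed "*." subdomain wildcard. Comparison is case-insensitive.
fn host_matches(selector: &str, host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    let selector = selector.to_ascii_lowercase();
    match selector.strip_prefix("*.") {
        // "*.example.com" matches "api.example.com" but not "example.com".
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |prefix| prefix.ends_with('.')),
        None => host == selector,
    }
}

/// Selects the middleware chain for `host`, preserving configuration order.
/// Selection depends only on the host, not on which policy rule admitted
/// the request.
fn select_chain<'a>(configs: &'a [MiddlewareConfig], host: &str) -> Vec<&'a MiddlewareConfig> {
    configs
        .iter()
        .filter(|c| c.hosts.iter().any(|s| host_matches(s, host)))
        .collect()
}
```

Under these assumptions, a request to `api.example.com` with entries `openshell/secrets` (selector `*.example.com`) and a second entry (selector `api.example.com`) would run both, in the order they were configured.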

Testing

  • mise run pre-commit passes
  • Unit and integration-style tests added or updated
  • E2E coverage considered
    • No separate gateway E2E was added. The request relay, policy validation, registry, and remote gRPC paths are covered by automated unit and integration-style tests.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated
  • Published docs updated

copy-pr-bot (Bot) commented Jun 26, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

pimlock force-pushed the 1733-supervisor-middleware/pmlocek branch 2 times, most recently from 595191e to 97b750f on June 26, 2026 23:58
pimlock self-assigned this Jun 30, 2026
pimlock force-pushed the 1733-supervisor-middleware/pmlocek branch from 358906a to 1fbcdbc on June 30, 2026 19:48

pimlock (Collaborator, Author) commented Jul 2, 2026

/ok to test c4b0dcf

pimlock added 18 commits July 2, 2026 15:17
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
…are outages

An unreachable operator-registered middleware service previously aborted
sandbox startup via a hard error in load_policy, contradicting the
per-request on_error contract and the resilient live-reload path.

Retry the initial connect and, on failure, degrade to the built-in
registry so matched requests are governed by each config's on_error
(deny for fail_closed, allow for fail_open) instead of blocking the whole
sandbox. The policy poll loop now reconciles the registry on every poll
while an install is pending, so a recovered service is adopted without
waiting for a config change; a failed reconcile also no longer blocks
unrelated policy updates.
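
The degraded behavior described above can be sketched as a per-request decision. This is a minimal sketch under stated assumptions: `OnError`, `Verdict`, `Binding`, `Entry`, and `evaluate` are hypothetical names, and treating a runtime failure of a resolved middleware the same way as a pending one is an assumption, not something the commit message states.

```rust
/// Hypothetical per-config failure mode (fail_open / fail_closed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OnError {
    FailOpen,
    FailClosed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
}

/// Hypothetical binding state: either a runnable middleware, or a service
/// whose install is still pending because it was unreachable.
enum Binding {
    Resolved(fn(&str) -> Result<Verdict, String>),
    Pending,
}

struct Entry {
    on_error: OnError,
    binding: Binding,
}

/// Maps an entry's on_error setting to the verdict used when it cannot run.
fn on_error_verdict(on_error: OnError) -> Verdict {
    match on_error {
        OnError::FailOpen => Verdict::Allow,
        OnError::FailClosed => Verdict::Deny,
    }
}

/// Walks the chain in order; the first Deny ends evaluation.
fn evaluate(chain: &[Entry], request: &str) -> Verdict {
    for entry in chain {
        let outcome = match &entry.binding {
            // Assumption: a runtime error is also governed by on_error.
            Binding::Resolved(run) => {
                run(request).unwrap_or_else(|_| on_error_verdict(entry.on_error))
            }
            // Service not yet adopted: apply on_error instead of aborting.
            Binding::Pending => on_error_verdict(entry.on_error),
        };
        if outcome == Verdict::Deny {
            return Verdict::Deny;
        }
    }
    Verdict::Allow
}
```

The point of the sketch is that an unreachable service affects only the requests its entry matches, rather than blocking sandbox startup.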

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
…limit

A chain entry whose binding did not resolve reported a zero body limit,
which dragged the whole chain's buffer cap to zero and spuriously failed
body-bearing requests over capacity even when a resolved middleware could
have processed them. Exclude unresolved entries from the limit via a new
DescribedChainEntry::is_resolved(); when no entry resolves, skip buffering
and apply each entry's on_error directly.
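
A minimal sketch of the limit calculation described above, assuming the chain's buffer cap is the smallest limit among its entries. Only `DescribedChainEntry::is_resolved()` is named in this commit message; the struct fields and `chain_body_limit` are hypothetical.

```rust
/// Stand-in for a described chain entry. The fields are assumptions made
/// for illustration; only `is_resolved` is named in the commit message.
struct DescribedChainEntry {
    resolved: bool,
    /// Reported body limit in bytes; an unresolved entry reports zero.
    max_body_bytes: usize,
}

impl DescribedChainEntry {
    fn is_resolved(&self) -> bool {
        self.resolved
    }
}

/// Computes the chain's buffer cap from resolved entries only.
/// Returns `None` when no entry resolves, signalling that buffering should
/// be skipped and each entry's on_error applied directly.
fn chain_body_limit(entries: &[DescribedChainEntry]) -> Option<usize> {
    entries
        .iter()
        .filter(|e| e.is_resolved())
        .map(|e| e.max_body_bytes)
        .min()
}
```

With one unresolved entry (limit 0) and resolved entries at 4096 and 1024 bytes, this yields a 1024-byte cap instead of zero.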

Also fix two parallel-test flakes found while validating the change:

- Build middleware OCSF events into a Vec and assert on it directly
  instead of capturing through the global tracing pipeline, whose
  callsite-interest cache is process-global and raced under parallel runs.
- Accumulate the websocket deny response until the reason marker arrives
  rather than assuming a single read returns the full body.
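
The second fix follows a common pattern: keep reading until an expected marker shows up instead of trusting one read. A rough sketch using only `std::io::Read` is shown here; `read_until_marker` and the 64-byte buffer size are hypothetical, and the actual test presumably uses the project's own async websocket helpers.

```rust
use std::io::{self, Read};

/// Accumulates bytes from `reader` until `marker` appears in the collected
/// data or the reader reaches EOF, then returns everything read so far.
fn read_until_marker<R: Read>(reader: &mut R, marker: &[u8]) -> io::Result<Vec<u8>> {
    let mut acc = Vec::new();
    let mut buf = [0u8; 64];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF before the marker arrived
        }
        acc.extend_from_slice(&buf[..n]);
        // Search the whole accumulator so a marker split across two reads
        // is still found. An empty marker is never treated as a match.
        if !marker.is_empty() && acc.windows(marker.len()).any(|w| w == marker) {
            break;
        }
    }
    Ok(acc)
}
```

Searching the accumulated buffer rather than the latest chunk is what removes the flake: a marker that straddles two reads is still detected.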

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock force-pushed the 1733-supervisor-middleware/pmlocek branch from c4b0dcf to 2b7cf4e on July 2, 2026 22:18
pimlock added the test:e2e (Requires end-to-end coverage) label Jul 2, 2026
github-actions (Bot) commented Jul 2, 2026

Label test:e2e applied, but pull-request/2027 is at c4b0dcf while the PR head is 2b7cf4e. A maintainer needs to comment /ok to test 2b7cf4e1f9d4d70efa614998d8034637b9c8c90a to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

NVIDIA deleted a comment from copy-pr-bot (Bot) Jul 2, 2026
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock (Collaborator, Author) commented Jul 2, 2026

/ok to test 0ef948b

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock (Collaborator, Author) commented Jul 3, 2026

/ok to test 400ca0f

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock changed the title from "feat(supervisor-middleware): add in-process egress middleware" to "feat(supervisor-middleware): add network egress middleware" Jul 3, 2026
pimlock (Collaborator, Author) commented Jul 3, 2026

/ok to test 1f6aec1

pimlock marked this pull request as ready for review July 3, 2026 18:34
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock removed the test:e2e (Requires end-to-end coverage) label Jul 3, 2026
pimlock added 4 commits July 3, 2026 12:08
Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>
pimlock mentioned this pull request Jul 3, 2026
4 tasks
pimlock added the test:e2e (Requires end-to-end coverage) label Jul 3, 2026
github-actions (Bot) commented Jul 3, 2026

Label test:e2e applied for 3b58759. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

Signed-off-by: Piotr Mlocek <pmlocek@nvidia.com>

Labels

test:e2e Requires end-to-end coverage

Development

Successfully merging this pull request may close these issues.

feat: implement RFC 0009 phase 1 supervisor middleware
