fix(server): apply sandbox labels to the compute resource #2105

Open
Gal-Zaidman wants to merge 1 commit into NVIDIA:main from Gal-Zaidman:feat/propagate-sandbox-labels-to-pod

Conversation

@Gal-Zaidman

Contributor

Summary

Sandbox labels supplied via openshell sandbox create --label k=v were stored only as sandbox metadata and never reached the underlying compute resource. This propagates them onto the sandbox template so drivers apply them to the pod (Kubernetes) or container (Docker), while guarding the driver-managed openshell.ai/ namespace.

Related Issue

Closes #2102

Changes

  • Propagate CreateSandboxRequest.labels onto SandboxTemplate.labels in the gateway create path, so they flow through DriverSandboxTemplate.labels to the driver's native tagging. An explicit template label wins on key conflicts.
  • Reject the reserved openshell.ai/ label-key namespace (driver-managed: managed-by, sandbox id) so callers cannot collide with or spoof managed labels. Folded into validate_label_key, so it applies wherever labels are validated.
  • Validate the request-label count (validate_labels) in the gateway so merging request labels onto the template stays within the template map limit — keeping all label validation in the validation module.
  • Document that labels apply to the compute resource and that openshell.ai/ is reserved (docs/sandboxes/manage-sandboxes.mdx).
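
The merge and validation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the map type, the `LabelError` enum, the `merge_request_labels` helper, the function signatures, and the `MAX_LABELS` value are all assumptions made for the example. Only the `openshell.ai/` prefix, the names `validate_label_key` and `validate_labels`, and the "template wins on conflict" rule come from the description.

```rust
use std::collections::BTreeMap;

/// Reserved, driver-managed label namespace named in the PR description.
const RESERVED_PREFIX: &str = "openshell.ai/";

/// Hypothetical bound; the real template map limit is not stated in the PR.
const MAX_LABELS: usize = 64;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum LabelError {
    ReservedNamespace(String),
    TooMany { count: usize, max: usize },
}

/// Reject keys in the reserved namespace (any other key rules are omitted here).
fn validate_label_key(key: &str) -> Result<(), LabelError> {
    if key.starts_with(RESERVED_PREFIX) {
        return Err(LabelError::ReservedNamespace(key.to_string()));
    }
    Ok(())
}

/// Check the request-label count, then every key.
fn validate_labels(labels: &BTreeMap<String, String>) -> Result<(), LabelError> {
    if labels.len() > MAX_LABELS {
        return Err(LabelError::TooMany { count: labels.len(), max: MAX_LABELS });
    }
    labels.keys().try_for_each(|k| validate_label_key(k))
}

/// Copy request labels onto the template map.
/// A key already present on the template keeps its template value.
fn merge_request_labels(
    request: &BTreeMap<String, String>,
    template: &mut BTreeMap<String, String>,
) -> Result<(), LabelError> {
    // Validate before touching the template so a rejected request leaves it unchanged.
    validate_labels(request)?;
    for (key, value) in request {
        template.entry(key.clone()).or_insert_with(|| value.clone());
    }
    Ok(())
}
```

In this sketch, a request carrying `env=dev` merged onto a template that already sets `env=prod` leaves `env=prod` in place, and a request carrying any `openshell.ai/...` key is rejected before the template is modified. The sketch bounds only the request-label count; how the actual change relates that bound to the template map limit is not shown here.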

Testing

  • Unit tests added/updated (label propagation, template-precedence, reserved-namespace rejection, label-count bound)
  • cargo fmt --check and cargo clippy pass on the touched crates
  • Verified live on a Kubernetes gateway: built/pushed the image, helm upgraded, created a sandbox with --label verifyfix=v4, and confirmed kubectl get pods -l verifyfix=v4 matched the pod; confirmed a reserved openshell.ai/* key is rejected
  • E2E tests added/updated

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

Request labels were stored only as sandbox metadata and never reached the
pod or container. Propagate them onto the sandbox template so drivers tag
the compute resource; explicit template labels win on key conflicts.

Reject the driver-managed `openshell.ai/` label namespace and validate the
label count in the gateway so the merged template map stays bounded.

Signed-off-by: Gal Zaidman <gzaidman@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jul 1, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Development

Successfully merging this pull request may close these issues.

bug: sandbox --label values are stored as metadata but never applied to the pod/container

1 participant