Title: [Feature Request] Enclave key generation + OIDC discovery for NRAS — enabling zero-trust GPU inference without host RAM exposure
Body:
Summary
I’m fairly new to the GPU world and have been spending time researching confidential inference — specifically protecting proprietary model weights on multi-tenant H100/B200 infrastructure. I’ve been reading through the NVIDIA CC docs, nvtrust repo, and this forum, and I’m really impressed with what the CC stack provides (VRAM encryption + NRAS attestation).
That said, as I’ve been trying to piece together a practical deployment, I’ve run into a few areas where I’m either missing something or there might be genuine gaps. I’d really appreciate any guidance from folks who’ve been working with this longer than I have.
Context
The current CC implementation provides VRAM encryption and NRAS attestation — both work well for proving GPU identity and protecting weights at rest in GPU memory. The gap is what happens between the host CPU and the GPU: during envelope decryption, the plaintext DEK and decrypted model weights must briefly exist in host RAM before DMA transfer to VRAM. On infrastructure without a CPU TEE (SEV-SNP/TDX), this is observable by a privileged host operator.
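To make the exposure window concrete, here is a minimal sketch of today's envelope-decryption path; every function name is illustrative, and the stand-in callables in any real deployment would be KMS and AES-GCM operations:

```python
def load_model_current(kms_decrypt, wrapped_dek, encrypted_weights,
                       aes_gcm_open, dma_to_vram):
    """Illustrative current flow: plaintext transits host RAM before VRAM.

    kms_decrypt / aes_gcm_open / dma_to_vram are caller-supplied stand-ins
    for the real KMS call, AEAD open, and host-to-device copy.
    """
    dek = kms_decrypt(wrapped_dek)                  # plaintext DEK in host RAM
    weights = aes_gcm_open(dek, encrypted_weights)  # plaintext weights in host RAM
    dma_to_vram(weights)                            # only now covered by VRAM encryption
```

Everything before the final `dma_to_vram` call is what a privileged host operator can observe on non-TEE CPUs.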
Feature Requests
1. In-GPU Ephemeral Key Generation and Unwrap
Problem: The plaintext DEK and decrypted weights briefly exist in host RAM during VRAM loading. Hardened native extensions (mlock, MADV_DONTDUMP, PR_SET_DUMPABLE(0), explicit_bzero) raise the attack bar but cannot provide cryptographic guarantees without a CPU TEE.
Proposed APIs:
GenerateEphemeralKeyPair()
→ Creates asymmetric keypair inside GPU secure enclave
→ Embeds public key in attestation report (bound to hardware state)
→ Private key never leaves the GPU
UnwrapKey(wrapped_dek)
→ Unwraps a DEK inside the enclave using the ephemeral private key
→ Plaintext DEK available only within VRAM
Alternatively, expose AES-GCM unwrap as a secure enclave primitive invocable from CUDA kernels.
Why this matters: This would allow the external KMS/broker to wrap the DEK to the GPU's ephemeral public key. The GPU unwraps and decrypts entirely in VRAM — host CPU never sees plaintext. This eliminates the host RAM exposure window without requiring the operator to provision CPU TEE infrastructure (which breaks standard observability/orchestration tooling).
Analogy: Intel SGX sealing/unsealing APIs; AWS Nitro Enclaves exposing kms:Decrypt inside the enclave boundary.
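To illustrate the protocol order I have in mind, here is a toy end-to-end sketch. The `GpuEnclave` class and its methods are hypothetical stand-ins for the proposed APIs; the Diffie-Hellman group (a small Mersenne prime) and the SHA-256 keystream are deliberately toy substitutes for the real ECDH + AES-GCM primitives and are not secure:

```python
import hashlib
import secrets

# Toy DH parameters for illustration only -- NOT cryptographically appropriate.
P = 2**127 - 1  # small Mersenne prime; real hardware would use X25519/P-384
G = 3

def kdf(shared: int, n: int) -> bytes:
    """Derive n keystream bytes from a DH shared secret (stand-in for HKDF)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(shared.to_bytes(16, "big") +
                              ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

class GpuEnclave:
    """Hypothetical GPU enclave: the private key never leaves this object."""

    def generate_ephemeral_key_pair(self) -> int:
        # Proposed GenerateEphemeralKeyPair(): public key would be embedded
        # in the attestation report, bound to hardware state.
        self._priv = secrets.randbelow(P - 2) + 2
        return pow(G, self._priv, P)

    def unwrap_key(self, broker_pub: int, wrapped_dek: bytes) -> bytes:
        # Proposed UnwrapKey(): unwrap happens "inside" the enclave;
        # the plaintext DEK exists only on this side of the boundary.
        shared = pow(broker_pub, self._priv, P)
        ks = kdf(shared, len(wrapped_dek))
        return bytes(a ^ b for a, b in zip(wrapped_dek, ks))

def broker_wrap(gpu_pub: int, dek: bytes):
    """KMS/broker side: wrap the DEK to the GPU's attested public key."""
    eph = secrets.randbelow(P - 2) + 2
    shared = pow(gpu_pub, eph, P)
    ks = kdf(shared, len(dek))
    return pow(G, eph, P), bytes(a ^ b for a, b in zip(dek, ks))
```

The point of the sketch is the data flow: the broker only ever sees the GPU's public key and emits a wrapped DEK, so the host CPU never handles plaintext key material.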
2. OIDC Discovery Document for NRAS
Problem: NRAS publishes a JWKS endpoint but lacks /.well-known/openid-configuration. This blocks native AWS AssumeRoleWithWebIdentity federation — the cleanest credential-free architecture where the NRAS JWT itself becomes the cloud credential.
Proposed fix: Publish one static JSON file at https://nras.attestation.nvidia.com/.well-known/openid-configuration containing the standard OIDC discovery fields (issuer, jwks_uri, id_token_signing_alg_values_supported, etc.).
Why this matters: Without OIDC discovery, every organization deploying confidential inference must either (a) run a custom attestation broker (Lambda/Cloud Run) to bridge NRAS JWTs to KMS, or (b) configure HashiCorp Vault JWT Auth with a direct JWKS URI. Adding one file would unlock direct OIDC federation for the entire GPU CC ecosystem — eliminating the broker entirely for AWS/GCP/Azure users.
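For concreteness, the static file would look roughly like this. The `jwks_uri` path, signing algorithm, and claim list below are my assumptions from reading the NRAS docs, not confirmed values:

```json
{
  "issuer": "https://nras.attestation.nvidia.com",
  "jwks_uri": "https://nras.attestation.nvidia.com/.well-known/jwks.json",
  "id_token_signing_alg_values_supported": ["ES384"],
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "claims_supported": ["iss", "sub", "exp", "iat"]
}
```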
3. Offline / Local NRAS Verification Support
Problem: Every pod startup currently requires a synchronous NRAS API call. At scale, this creates latency bottlenecks and a hard runtime dependency on NRAS availability.
Current state: The local GPU verifier in nvtrust exists but requires manually managing CRLs, RIMs, and certificate chains. There's no documented caching strategy or recommended refresh interval.
Ask: Document a supported offline verification workflow: recommended CRL/RIM cache refresh intervals, cache invalidation strategy, and trust model guidance (i.e., who should control the cache to prevent stale-CRL attacks). Even a reference implementation of a caching proxy would help.
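As a sketch of the kind of caching layer I mean (the class and `fetcher` callable are hypothetical; the TTL default is a placeholder, not a recommendation):

```python
import time

class AttestationMaterialCache:
    """TTL cache for offline-verification material (CRLs, RIMs, cert chains).

    fetcher: any callable that pulls fresh material from the upstream
    endpoints. A fetch failure raises, i.e. the cache fails closed rather
    than serving material older than the TTL.
    """

    def __init__(self, fetcher, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetcher = fetcher
        self._ttl = ttl_seconds
        self._clock = clock
        self._material = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._material is None or now - self._fetched_at >= self._ttl:
            self._material = self._fetcher()  # refresh; exceptions propagate
            self._fetched_at = now
        return self._material
```

The open design question is exactly the trust-model one above: who runs this cache, and what TTL bounds the stale-CRL window.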
4. Cloud KMS Native Attestation Conditions
Problem: Cloud KMS providers (AWS, GCP, Azure) don't natively validate NRAS JWT claims as key policy conditions. AWS KMS already supports Nitro Enclave attestation conditions (kms:RecipientAttestation:PCR0) — a similar integration for GPU attestation would eliminate all intermediary code.
This likely requires NVIDIA + cloud provider partnership, but flagging here as the long-term endgame that would make GPU confidential computing as simple as Nitro Enclaves.
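To show the shape of what I'm asking for, here is a key-policy statement modeled on the Nitro Enclaves pattern. The condition key `kms:GpuAttestation:Measurement` does not exist today and is purely hypothetical, as are the account ID and role name:

```json
{
  "Sid": "AllowDecryptToAttestedGpu",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::111122223333:role/inference-runtime" },
  "Action": "kms:Decrypt",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "kms:GpuAttestation:Measurement": "<golden-measurement-hash>"
    }
  }
}
```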
Environment
- GPUs: H100 (testing), B200 (target)
- CC Mode: GPU-only CC (no CPU TEE on current infrastructure)
- Attestation: NRAS with JWT verification
- KMS: AWS KMS with envelope encryption
Related
- NRAS documentation: https://docs.nvidia.com/attestation/
- nvtrust local verifier: guest_tools/gpu_verifiers/local_gpu_verifier/
- AWS Nitro Enclaves KMS integration (analogous pattern): https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave-refapp.html