Enabling zero-trust GPU inference without host RAM exposure #135

@hixichen

Description

Title: [Feature Request] Enclave key generation + OIDC discovery for NRAS — enabling zero-trust GPU inference without host RAM exposure

Body:

Summary

I’m fairly new to the GPU world and have been spending time researching confidential inference — specifically protecting proprietary model weights on multi-tenant H100/B200 infrastructure. I’ve been reading through the NVIDIA CC docs, nvtrust repo, and this forum, and I’m really impressed with what the CC stack provides (VRAM encryption + NRAS attestation).

That said, as I’ve been trying to piece together a practical deployment, I’ve run into a few areas where I’m either missing something or there might be genuine gaps. I’d really appreciate any guidance from folks who’ve been working with this longer than I have.

Context

The current CC implementation provides VRAM encryption and NRAS attestation — both work well for proving GPU identity and protecting weights at rest in GPU memory. The gap is what happens between the host CPU and the GPU: during envelope decryption, the plaintext DEK and decrypted model weights must briefly exist in host RAM before DMA transfer to VRAM. On infrastructure without a CPU TEE (SEV-SNP/TDX), this is observable by a privileged host operator.
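To make the exposure window concrete, here is a toy sketch of the standard envelope flow — the KMS call and the "AES-GCM" here are hashlib-based stand-ins, not real cryptography — showing that both the plaintext DEK and the decrypted weights are ordinary host allocations before anything is copied to VRAM:

```python
# Toy illustration (NOT real crypto) of standard envelope decryption:
# the plaintext DEK and decrypted weights pass through host RAM before
# any DMA transfer to VRAM. All names here are hypothetical.
import hashlib


def kms_decrypt(wrapped_dek: bytes) -> bytes:
    # Stand-in for AWS KMS Decrypt: returns the plaintext DEK *to the host*.
    return hashlib.sha256(wrapped_dek).digest()


def xor_cipher(data: bytes, dek: bytes) -> bytes:
    # Stand-in for AES-GCM: deterministic XOR keystream, purely illustrative.
    blocks = (hashlib.sha256(dek + bytes([i])).digest()
              for i in range(len(data) // 32 + 1))
    keystream = b"".join(blocks)[: len(data)]
    return bytes(d ^ k for d, k in zip(data, keystream))


wrapped_dek = b"wrapped-by-kms"
dek = kms_decrypt(wrapped_dek)              # plaintext DEK now in host RAM
ciphertext = xor_cipher(b"model-weights", dek)   # (XOR is symmetric)
plaintext = xor_cipher(ciphertext, dek)     # plaintext weights in host RAM
# ...only at this point would cudaMemcpy/DMA move `plaintext` into
# (encrypted) VRAM — a privileged host operator can observe both buffers.
assert plaintext == b"model-weights"
```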

Feature Requests

1. In-GPU Ephemeral Key Generation and Unwrap

Problem: The plaintext DEK and decrypted weights briefly exist in host RAM during VRAM loading. Host-side hardening (mlock, MADV_DONTDUMP, PR_SET_DUMPABLE(0), explicit_bzero) raises the attack bar but cannot provide cryptographic guarantees without a CPU TEE.

Proposed APIs:

GenerateEphemeralKeyPair()
  → Creates asymmetric keypair inside GPU secure enclave
  → Embeds public key in attestation report (bound to hardware state)
  → Private key never leaves the GPU

UnwrapKey(wrapped_dek)
  → Unwraps a DEK inside the enclave using the ephemeral private key
  → Plaintext DEK available only within VRAM

Alternatively, expose AES-GCM unwrap as a secure enclave primitive invocable from CUDA kernels.

Why this matters: This would allow the external KMS/broker to wrap the DEK to the GPU's ephemeral public key. The GPU unwraps and decrypts entirely in VRAM — host CPU never sees plaintext. This eliminates the host RAM exposure window without requiring the operator to provision CPU TEE infrastructure (which breaks standard observability/orchestration tooling).

Analogy: Intel SGX sealing/unsealing APIs; AWS Nitro Enclaves exposing kms:Decrypt inside the enclave boundary.
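From the host's perspective, the requested flow might look like the sketch below. Everything here is hypothetical: MockGpuEnclave stands in for driver APIs that do not exist today, and the "wrapping" is a placeholder blob rather than real HPKE/RSA-OAEP.

```python
# Hypothetical end-to-end flow for the proposed GenerateEphemeralKeyPair /
# UnwrapKey APIs. MockGpuEnclave is a stand-in for a driver binding that
# does not exist; the "crypto" is a toy placeholder.
import hashlib
import hmac
import os


class MockGpuEnclave:
    def __init__(self):
        # GenerateEphemeralKeyPair(): the private half never leaves the "GPU".
        self._priv = os.urandom(32)
        self.public_key = hashlib.sha256(self._priv).hexdigest()
        self._vram_dek = None

    def attestation_report(self) -> dict:
        # Proposed: the public key is bound into the signed NRAS report.
        return {"gpu_pubkey": self.public_key, "measurements": "..."}

    def unwrap_key(self, wrapped_dek: bytes) -> None:
        # Proposed UnwrapKey(): the plaintext DEK materializes only "in VRAM".
        self._vram_dek = hmac.new(self._priv, wrapped_dek,
                                  hashlib.sha256).digest()


def kms_wrap_to_gpu(gpu_pubkey: str) -> bytes:
    # KMS/broker side: verify the attestation report, then wrap the DEK to
    # the GPU's ephemeral public key. Placeholder blob for illustration.
    return b"opaque-wrapped-dek-for:" + gpu_pubkey.encode()


gpu = MockGpuEnclave()
report = gpu.attestation_report()             # host forwards report to KMS
blob = kms_wrap_to_gpu(report["gpu_pubkey"])  # host only ever sees this blob
gpu.unwrap_key(blob)                          # plaintext DEK stays on-GPU
```

The point of the sketch is the trust boundary: the host relays an opaque blob and an attestation report, and at no step does a plaintext DEK exist in a host-readable buffer.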

2. OIDC Discovery Document for NRAS

Problem: NRAS publishes a JWKS endpoint but lacks /.well-known/openid-configuration. This blocks native AWS AssumeRoleWithWebIdentity federation — the cleanest credential-free architecture where the NRAS JWT itself becomes the cloud credential.

Proposed fix: Publish one static JSON file at https://nras.attestation.nvidia.com/.well-known/openid-configuration containing the standard OIDC discovery fields (issuer, jwks_uri, id_token_signing_alg_values_supported, etc.).
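Concretely, the requested file could be as small as the sketch below. The exact jwks_uri path and the signing algorithm are my assumptions — whatever NRAS actually serves and signs with would go here:

```json
{
  "issuer": "https://nras.attestation.nvidia.com",
  "jwks_uri": "https://nras.attestation.nvidia.com/.well-known/jwks.json",
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "id_token_signing_alg_values_supported": ["ES384"]
}
```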

Why this matters: Without OIDC discovery, every organization deploying confidential inference must either (a) run a custom attestation broker (Lambda/Cloud Run) to bridge NRAS JWTs to KMS, or (b) configure HashiCorp Vault JWT Auth with a direct JWKS URI. Adding one file would unlock direct OIDC federation for the entire GPU CC ecosystem — eliminating the broker entirely for AWS/GCP/Azure users.
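For illustration, direct federation on AWS would then reduce to registering the issuer as an IAM OIDC provider plus a role trust policy along these lines — the account ID, audience, and claim names below are hypothetical:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/nras.attestation.nvidia.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "nras.attestation.nvidia.com:aud": "sts.amazonaws.com"
      }
    }
  }]
}
```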

3. Offline / Local NRAS Verification Support

Problem: Every pod startup currently requires a synchronous NRAS API call. At scale, this creates latency bottlenecks and a hard runtime dependency on NRAS availability.

Current state: The local GPU verifier in nvtrust exists but requires manually managing CRLs, RIMs, and certificate chains. There's no documented caching strategy or recommended refresh interval.

Ask: Document a supported offline verification workflow: recommended CRL/RIM cache refresh intervals, cache invalidation strategy, and trust model guidance (i.e., who should control the cache to prevent stale-CRL attacks). Even a reference implementation of a caching proxy would help.
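As a starting point for discussion, a minimal TTL cache over CRL/RIM fetches might look like the sketch below — the six-hour interval is a placeholder, not NVIDIA guidance, and a production version would need the stale-data hard cap noted in the comments:

```python
# Minimal sketch of a CRL/RIM caching layer for local verification.
# `fetch` is assumed to pull artifacts over HTTPS; the TTL is a placeholder.
import time


class AttestationArtifactCache:
    def __init__(self, fetch, ttl_seconds=6 * 3600):
        self._fetch = fetch          # callable: url -> bytes
        self._ttl = ttl_seconds
        self._store = {}             # url -> (fetched_at, payload)

    def get(self, url: str) -> bytes:
        now = time.time()
        hit = self._store.get(url)
        if hit and now - hit[0] < self._ttl:
            return hit[1]            # fresh enough: no network round trip
        payload = self._fetch(url)   # a real impl might serve stale data on
        self._store[url] = (now, payload)  # fetch failure, up to a hard cap
        return payload


calls = []
cache = AttestationArtifactCache(lambda u: calls.append(u) or b"crl-bytes")
cache.get("https://example.invalid/crl")
cache.get("https://example.invalid/crl")   # served from cache; one fetch total
```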

4. Cloud KMS Native Attestation Conditions

Problem: Cloud KMS providers (AWS, GCP, Azure) don't natively validate NRAS JWT claims as key policy conditions. AWS KMS already supports Nitro Enclave attestation conditions (kms:RecipientAttestation:PCR0) — a similar integration for GPU attestation would eliminate all intermediary code.

This likely requires NVIDIA + cloud provider partnership, but flagging here as the long-term endgame that would make GPU confidential computing as simple as Nitro Enclaves.
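By analogy with the existing Nitro Enclave condition keys, a GPU-aware KMS key policy statement might read like the sketch below. The kms:RecipientAttestation:GpuMeasurement key is invented here purely to illustrate the ask — it does not exist today:

```json
{
  "Effect": "Allow",
  "Principal": {"AWS": "arn:aws:iam::123456789012:role/inference-pod"},
  "Action": "kms:Decrypt",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "kms:RecipientAttestation:GpuMeasurement": "<expected H100/B200 measurement>"
    }
  }
}
```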

Environment

  • GPUs: H100 (testing), B200 (target)
  • CC Mode: GPU-only CC (no CPU TEE on current infrastructure)
  • Attestation: NRAS with JWT verification
  • KMS: AWS KMS with envelope encryption
