
Intel-SGX TEE Accelerated CRG Integration with Klyshko #98

Open
rohithvaidya wants to merge 24 commits into carbynestack:master from datakaveri:feat-sgx-integration

@rohithvaidya

  1. Intel SGX-backed secure CRG execution: runs MP-SPDZ Fake-Offline inside Intel SGX enclaves, providing hardware-backed confidentiality and integrity for the MPC offline phase.

  2. End-to-end attestation workflow: implements local attestation (same node) and mutual remote attestation across VCPs using RA-TLS with DCAP before any tuple generation.

  3. Secure key exchange via KII: integrates the Klyshko Integration Interface (KII) to exchange MAC key shares over attested TLS (protobuf-encoded messages), so that correlated randomness is generated only after successful enclave verification.

  4. Mixed-mode Klyshko Operator: switches between the default offline phase and the TEE-enabled offline phase.

Usage and Setup Docs

…notice.md

Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Signed-off-by: Sarthak Sharma <sarthaksharma070@gmail.com>
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Replace raw memcpy with explicit destination size checks
to satisfy Codacy and prevent potential buffer overflows
in MAC key handling.

Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
@rohithvaidya requested a review from a team as a code owner on February 24, 2026 10:23
@sbckr
Member

sbckr commented Feb 25, 2026

Thank you @rohithvaidya and the entire team at the CDPG / datakaveri for filing this PR, and for the sustained effort that went into it. 💪 🎊

Bringing hardware-backed TEE support to Klyshko's offline phase is a significant milestone for the project. Confidential MPC has long been a topic of interest in the community, and having a concrete, working integration of Intel SGX with Gramine and RA-TLS into the Correlated Randomness Generation pipeline is genuinely exciting. The approach (local attestation within a VCP, mutual remote attestation across VCPs, and secure MAC key exchange via RA-TLS/KII before any tuple generation) is exactly the kind of defense-in-depth design the project needs. The inclusion of a dedicated TEE operator mode that preserves full backward compatibility with non-SGX deployments is also very thoughtful.

Before we can continue with the merge process, there are a few items that need to be addressed, two of which are flagged as failing checks on the PR.

❌ Blocking — Please Address Before We Can Proceed

1. DCO Sign-off Failing (6 Commits)

The DCO check is reporting 6 commits that are missing a Signed-off-by trailer. These are the non-merge commits introduced by @rohithvaidya:

Commit Subject
0d0dab9 chore: add protocolbuffers to sbom, update README
0e7992e fix: MRSIGNER verification env var optional, removing MAC refs from Dockerfile
dcb77e3 chore: add license for dependencies
1b5fd84 fix: change security flags and set defaults, update Dockerfile to create placeholder files for Gramine
e3203e7 test: unit tests for utility fns
03e591f docs: added deployment steps for SGX integration with Klyshko and security configuration

The DCO (Developer Certificate of Origin) requires a Signed-off-by line in each commit message, asserting that you have the right to submit the contribution under the project license. Please see CONTRIBUTING.md for details.

To fix, you can rebase interactively and add sign-offs:

git rebase -i HEAD~N   # where N covers the unsigned commits
# Mark each commit as 'reword', then save
# In each commit message, add:
# Signed-off-by: Rohith Vaidhyanathan <rohith.vaidhyanathan@datakaveri.org>

Or, for individual commits:

git commit --amend --no-edit --signoff   # for the most recent commit
git push --force-with-lease
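For reference, the DCO check essentially looks for a trailer of the form `Signed-off-by: Name <email>` in each commit message. The Go sketch below (hypothetical helper `hasSignOff`) approximates that test locally; the actual DCO check may apply further rules, such as matching the sign-off against the commit author.

```go
package main

import (
	"fmt"
	"regexp"
)

// signOffTrailer matches a DCO trailer line of the form
// "Signed-off-by: Full Name <user@example.org>".
var signOffTrailer = regexp.MustCompile(`(?m)^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>\s*$`)

// hasSignOff reports whether a commit message carries at least one sign-off.
func hasSignOff(msg string) bool {
	return signOffTrailer.MatchString(msg)
}

func main() {
	unsigned := "chore: add license for dependencies"
	signed := unsigned + "\n\nSigned-off-by: Rohith Vaidhyanathan <rohith.vaidhyanathan@datakaveri.org>"
	fmt.Println(hasSignOff(unsigned), hasSignOff(signed)) // false true
}
```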

2. Codacy Static Analysis — 100 New Issues

The Codacy Production check is reporting 100 new issues introduced by this PR. Before we can merge, these need to be reviewed and addressed (or explicitly acknowledged with a justification for those that are false positives or acceptable trade-offs given the SGX/Gramine context).

Please review the full findings in the Codacy PR report and work through them systematically.

If any findings are deemed false positives or inapplicable (e.g., patterns required by Gramine's C API or SGX calling conventions), please add inline annotations or document the rationale clearly so reviewers can assess.

3. CI Workflow — Build and Test for klyshko-mp-spdz-tee

The PR introduces a new module (klyshko-mp-spdz-tee) but does not include a corresponding GitHub Actions workflow. All other modules in this repository ship with their own build-and-test workflow:

  • Operator: (./.github/workflows/operator.build-and-test.yaml)
    Go unit tests + Codecov
  • MP-SPDZ CowGear: (./.github/workflows/mp-spdz-cowgear.build-and-test.yaml)
    Docker image build + bats roundtrip test

Please add a workflow file at .github/workflows/mp-spdz-tee.build-and-test.yaml. Given the SGX hardware requirements, here is a proposal that balances CI signal with runner constraints:

Recommended approach — two-stage workflow:

Stage 1 — Unit Tests (no SGX hardware required):
The klyshko-mp-spdz-tee/tests/ directory already contains a stub-based CMocka test suite (CRG_test, client_test, server_test) that compiles and runs without SGX hardware. This can run on a standard ubuntu-22.04 runner:

- name: Install test dependencies
  run: |
    sudo apt-get update
    sudo apt-get install -y libcmocka-dev libprotobuf-c-dev pkg-config gcc

- name: Run unit tests
  working-directory: ${{ env.WORKING_DIRECTORY }}/tests
  run: make test

Code coverage can be reported to Codecov following the operator pattern (gcov/lcov).

Stage 2 — Docker Image Build (build-time only, no SGX runtime required):
The Docker image build — including Gramine manifest generation and enclave signing (gramine-sgx-sign) — does not require SGX hardware; only runtime execution does. A temporary test key can be generated in CI if required:

- name: Generate CI enclave signing key
  working-directory: ${{ env.WORKING_DIRECTORY }}
  run: openssl genrsa -3 -out enclave-key.pem 3072   # SGX signing keys need public exponent 3

- name: Build Docker image
  uses: docker/build-push-action@v3
  with:
    context: ${{ env.WORKING_DIRECTORY }}
    file: ${{ env.WORKING_DIRECTORY }}/Dockerfile.tee-fake-offline
    push: true
    tags: ${{ env.FULL_IMAGE_TAG }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

This follows the cowgear pattern (local registry service, buildx, GHA cache).

Note: Full end-to-end testing (RA-TLS attestation, actual enclave execution) requires SGX-enabled infrastructure and cannot reasonably be included in standard CI. The unit tests and build verification above provide meaningful coverage for the logic and build correctness.

The workflow should also follow the path-filtering pattern from the existing workflows (using dorny/paths-filter@v2) so it only runs when klyshko-mp-spdz-tee/** is modified, and include the *-test-status job pattern that allows the check to be marked as required in branch protection settings.
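The point of the *-test-status job is that branch protection gets one stable, always-reported check name, which should pass when the test job either succeeded or was skipped because the module was untouched. The Go sketch below models that logic; it illustrates the workflow semantics (the `needs.<job>.result` values `success`, `failure`, `cancelled`, `skipped`) rather than reproducing the existing workflow code, and its prefix match only approximates the `klyshko-mp-spdz-tee/**` glob.

```go
package main

import (
	"fmt"
	"strings"
)

// touchesModule reports whether any changed path lies under module, roughly
// what a paths-filter rule such as "klyshko-mp-spdz-tee/**" selects.
func touchesModule(changed []string, module string) bool {
	prefix := strings.TrimSuffix(module, "/") + "/"
	for _, p := range changed {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}

// statusPasses models an always-running *-test-status job: it succeeds when the
// test job succeeded or was skipped, and fails on "failure" or "cancelled".
func statusPasses(testResult string) bool {
	return testResult == "success" || testResult == "skipped"
}

func main() {
	changed := []string{"README.md"}
	result := "skipped"
	if touchesModule(changed, "klyshko-mp-spdz-tee") {
		result = "success" // in CI this would come from actually running the tests
	}
	fmt.Println(result, statusPasses(result)) // skipped true
}
```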

Follow-on — production image publishing and MRENCLAVE tracking:
Publishing a production image requires a stable signing key held as a GitHub Actions secret (so that MRSIGNER is consistent across releases), as well as an image registry and release workflow, all of which are infrastructure decisions on the maintainer side. This means the full CI/CD story for klyshko-mp-spdz-tee cannot be completed within this PR alone. I propose the following split:

  • This PR: unit tests (Stage 1) + build verification with a generated test key (Stage 2, no publishing).
  • Follow-on: a dedicated release workflow that builds and publishes the SGX image using a repository signing-key secret, and records the MRENCLAVE measurement (e.g. in the GitHub release notes) so operators can verify their deployed enclaves against the expected measurement.

Please open a tracking issue for the follow-on workflow to keep the discussion visible to the community. We are happy to align on the design there.

📋 Operator Changes — Backwards Compatibility

Looking at the operator changes for backwards compatibility, the --sgx-enabled flag (default: false) and the new optional Tolerations/Affinity fields on TupleGeneratorPodSpec seem non-breaking for existing non-SGX deployments: when the flag is not set, no SGX-specific scheduling constraints, resource limits, or volume mounts are injected into generator pods, and the new CRD fields carry omitempty so existing TupleGenerator resources require no changes.

One minor observation: the service annotation service.beta.kubernetes.io/port_5000_no_probe_rule: "true" on the inter-CRG service is applied unconditionally, independent of --sgx-enabled. This is benign for standard load balancer implementations but does represent a behaviour change for all deployments, not just SGX-enabled ones. Still, this should be good to go.

🔧 Blocking on Our Side — Pre-existing CI Infrastructure Issue

The operator.build-and-test workflow is currently failing on all PRs due to a pre-existing setup-envtest GCS issue (tracked in carbynestack/klyshko#99). This is unrelated to the changes in this PR and does not require any action from you.

@carbynestack/klyshko-maintainers I propose to not hold the merge on this failure indefinitely, but we do need to confirm that the operator test suite passes cleanly once the infrastructure fix is in place, specifically that the non-SGX code paths are unaffected by the changes introduced here. The existing controller tests exercise createGeneratorPod with --sgx-enabled=false (the default), so this should be straightforward to verify once the envtest issue is resolved. The SGX-enabled code path is not covered by CI by design, as it requires SGX hardware.
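Once envtest is fixed, the non-SGX check boils down to an assertion like the one below. It is a Go sketch with hypothetical stand-in types and a hypothetical `buildGeneratorPod` in place of the real `createGeneratorPod`; the resource name `sgx.intel.com/epc`, its quantity, and the `/dev/sgx_enclave` mount are assumptions about what an SGX-enabled pod might carry, not details taken from the PR.

```go
package main

import "fmt"

// container and pod are minimal stand-ins for the corev1 types.
type container struct {
	Name         string
	Limits       map[string]string
	VolumeMounts []string
}

type pod struct {
	Containers  []container
	Tolerations []string
}

// buildGeneratorPod is a hypothetical stand-in for createGeneratorPod: SGX
// resources, device mounts, and tolerations are added only when sgxEnabled.
func buildGeneratorPod(sgxEnabled bool) pod {
	c := container{Name: "generator", Limits: map[string]string{}}
	p := pod{}
	if sgxEnabled {
		c.Limits["sgx.intel.com/epc"] = "64Mi" // assumed device-plugin resource
		c.VolumeMounts = append(c.VolumeMounts, "/dev/sgx_enclave")
		p.Tolerations = append(p.Tolerations, "sgx")
	}
	p.Containers = []container{c}
	return p
}

// hasSGXSettings reports whether any SGX-specific setting is present. In this
// stand-in, limits, mounts, and tolerations can only come from the SGX branch.
func hasSGXSettings(p pod) bool {
	if len(p.Tolerations) > 0 {
		return true
	}
	for _, c := range p.Containers {
		if len(c.Limits) > 0 || len(c.VolumeMounts) > 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasSGXSettings(buildGeneratorPod(false)), hasSGXSettings(buildGeneratorPod(true))) // false true
}
```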

You could continue working on the items above in parallel. The infrastructure fix will be applied independently to the base branch.

Happy to discuss any of the items above, particularly the CI workflow design, where there may be trade-offs worth aligning on given the SGX runtime constraints.

Thanks again for this outstanding contribution. 🚀

Sebastian

package v1alpha1

import (
corev1 "k8s.io/api/core/v1"
Member


corev1 and v1 are two aliases for the exact same package (k8s.io/api/core/v1). The Tolerations field on line 51 uses corev1.Toleration, but since v1 already resolves to the same package, the fix is simply:

Tolerations []v1.Toleration `json:"tolerations,omitempty"`

and drop the corev1 import. This duplicate import causes controller-gen v0.6.1 to panic when the tool is compiled with Go 1.17+ (due to changes in go/types internal APIs). While CI currently uses Go 1.16 and doesn't hit the panic, this will break any contributor running a modern Go toolchain locally.
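A duplicate import of this kind is also easy to catch mechanically. The Go sketch below (hypothetical helper `duplicateImports`, standard library only) parses a file's import block and reports any path imported more than once, regardless of alias.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// duplicateImports returns import paths that appear more than once in src,
// for example under two different aliases such as v1 and corev1.
func duplicateImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	seen := map[string]int{}
	var dups []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		seen[path]++
		if seen[path] == 2 { // report each duplicated path once
			dups = append(dups, path)
		}
	}
	return dups, nil
}

func main() {
	src := "package v1alpha1\n\nimport (\n\tcorev1 \"k8s.io/api/core/v1\"\n\tv1 \"k8s.io/api/core/v1\"\n)\n"
	dups, err := duplicateImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(dups) // [k8s.io/api/core/v1]
}
```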

Author


Removed the redundant corev1 import of k8s.io/api/core/v1 from the operator.

SarthakSharm and others added 7 commits March 2, 2026 14:50
Signed-off-by: Sarthak Sharma <sarthak.sharma@datakaveri.org>
…urity configuration

Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
…ate placeholder files for Gramine

Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
…ockerfile

Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
@rohithvaidya force-pushed the feat-sgx-integration branch from c0b1463 to 26baf02 on March 2, 2026 10:09
rohithvaidya and others added 10 commits March 3, 2026 12:19
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Fixes to Static Code Checks and Add Build and Test CI Pipeline
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Signed-off-by: rohithvaidya <rohith.vaidhyanathan@datakaveri.org>
Operator Import Fix and Static Code Analysis Fixes

Labels

None yet

Projects

None yet


3 participants