Bug : Use random IV in EncryptDecryptUsingTpm #5494

Open
shjala wants to merge 4 commits into lf-edge:master from shjala:fix.EncryptDecryptUsingTpm.IV

Conversation

@shjala
Member

@shjala shjala commented Dec 16, 2025

Description

265c688 Changes EncryptDecryptUsingTpm function to support versioned encryption
with proper random IV generation. Previously, the function used a static
zero IV which is insecure. This commit:

  • Adds VaultKeyEncVersion enum (Unespecified, EncryptionAEAD) to track
    encryption method versions
  • Implements AES-GCM (AEAD) encryption/decryption with random nonce
  • Maintains backward compatibility with existing encrypted data using
    Unespecified version

The new AEAD mode generates a random nonce per encryption operation and
prepends it (with its size) to the ciphertext, providing authenticated
encryption with proper IV randomness.
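
As a rough illustration of the layout described above, here is a minimal sketch using only the Go standard library. The function names (`sealWithNonce`, `openWithNonce`) and the one-byte width of the size prefix are assumptions made for this example; they are not taken from the PR's code.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealWithNonce returns [nonce size (1 byte) | nonce | ciphertext+tag].
// The one-byte size prefix is an assumption of this sketch.
func sealWithNonce(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	// A fresh random nonce is drawn for every encryption operation.
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{byte(len(nonce))}, nonce...)
	// Seal appends the ciphertext and authentication tag to out.
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// openWithNonce parses the layout produced by sealWithNonce and
// authenticates and decrypts the payload.
func openWithNonce(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < 1 {
		return nil, errors.New("empty blob")
	}
	n := int(blob[0])
	if n != gcm.NonceSize() || len(blob) < 1+n+gcm.Overhead() {
		return nil, errors.New("malformed blob")
	}
	return gcm.Open(nil, blob[1:1+n], blob[1+n:], nil)
}

func main() {
	key := bytes.Repeat([]byte{0x42}, 32) // demo key only
	a, _ := sealWithNonce(key, []byte("vault key"))
	b, _ := sealWithNonce(key, []byte("vault key"))
	pt, err := openWithNonce(key, a)
	// The same plaintext encrypted twice should give two different blobs.
	fmt.Println(string(pt), err, bytes.Equal(a, b))
}
```

Because the nonce travels with the ciphertext, the blob is longer than the plaintext by the prefix, the nonce, and the GCM tag, which is why the length of the stored blob matters to whoever stores it.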

f32bad6 Refactor TPM tests and add encryption unit tests

  • Move GetCertHash to evetpm package for better reusability.
  • Migrate tpmmgr internal tests (testTpmEcdhSupport, testEcdhAES) to evetpm unit tests.
  • Add unit tests in encryptdecrypt_test.go covering AES GCM and TPM encryption.
  • Update TPM test setup scripts (tests/tpm/*.sh) to provision required keys and certificates.
  • Allow overriding of TPM file paths in locationconsts.go for testing purposes.

PR dependencies

None.

How to test and validate this PR

TPM Encryption Fix Validation

Ensure that device encryption keys (Vaults) continue to work correctly for new installations AND after upgrading existing devices.

1. Fresh Installation Verification

Ensure the new encryption method works on a new device.

  • Steps:
    1. Onboard a device with this new EVE image.
    2. Wait for the device to become Online in the Controller.
    3. Deploy a generic Edge Application with an encrypted volume.
    4. Verify:
      • Does the vault unlock?
      • Does the App instance reach RUNNING state? (Success)
      • Does the App instance get stuck in BOOTING or downloading with errors? (Failure)
    5. Log Check:
      • Check tpmmgr logs on the device or via Controller:
        • Search for: "TPM Sanity Check"
        • Expected: No errors. If you see "failed to encrypt" or "failed to decrypt", the test failed.

2. Upgrade & Backward Compatibility

Ensure that updating an OLD device to this NEW version does not lock the user out of their existing encrypted data.

  • Steps:
    1. Flash/Install an OLD stable version of EVE (e.g., previous release) on a TPM-capable device.
    2. Onboard and deploy an Edge App with an encrypted volume.
    3. Verify the App is RUNNING.
    4. Trigger an update to the NEW EVE image (with this PR included).
    5. Wait for the device to update and reboot.
    6. Verify:
      • Does the vault unlock, and does the existing App instance start up and reach the RUNNING state again?
      • The old vault's keys are stored on the Controller using the OLD encryption format. The new EVE image must correctly detect and decrypt these "Legacy" keys.

Changelog notes

Fix AES-CFB encryption in EVE by using a random IV instead of a zero-initialized IV.

PR Backports

For all current LTS branches, please state explicitly if this PR should be
backported or not. This section is used by our scripts to track the backports,
so, please, do not omit it.

Here is the list of current LTS branches (it should be always up to date):

  • 14.5-stable
  • 13.4-stable

For example, if this PR fixes a bug in a feature that was introduced in 14.5,
you can write:

- 14.5-stable: To be backported.
- 13.4-stable: No, as the feature is not available there.

Also, to the PRs that should be backported into any stable branch, please
add a label stable.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@shjala shjala self-assigned this Dec 16, 2025
@shjala shjala requested a review from rucoder as a code owner December 16, 2025 19:09
@shjala shjala added the bug (Something isn't working) and stable (Should be backported to stable release(s)) labels Dec 16, 2025
@shjala
Member Author

shjala commented Dec 16, 2025

Note to self: publish an advisory after the fix is merged.

Contributor

@eriknordmark eriknordmark left a comment

This assumes that the controllers do not care about the length of the encrypted blob. Need to check whether that is the case. Does it place any other requirements on the controllers?

Also, presumably the decryption needs to handle the case when there is no prepended IV and use the old zeros approach so that one can transition.
(When it does that, does it always send the new encrypted blob to the controller meaning the IV would be added on the next boot of a device?)

Last but not least, the API proto files which describe this as an encrypted key should have their comments expanded; the key thing is to not assume anything about the length of the blob, since the blob may or may not include an IV (for new vs old EVE versions).

@shjala shjala force-pushed the fix.EncryptDecryptUsingTpm.IV branch from d91525a to 629eaf3 Compare December 17, 2025 12:52
@shjala
Member Author

shjala commented Dec 17, 2025

> This assumes that the controllers do not care about the length of the encrypted blob. Need to check whether that is the case. Does it place any other requirements on the controllers?

EVE-API places no restriction on the size. The implementation should treat this data as is and just store it. Adam doesn't care about the size or content; I'll check to make sure the same is true for the commercial controller.

> Also, presumably the decryption needs to handle the case when there is no prepended IV and use the old zeros approach so that one can transition. (When it does that, does it always send the new encrypted blob to the controller meaning the IV would be added on the next boot of a device?)

Added fallback for backward compatibility on the decryption path.
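
For readers following along, one way such a fallback can be structured is to dispatch on an explicit version, as in the sketch below. This is only an illustration: `encVersion` is a hypothetical stand-in for the `VaultKeyEncVersion` enum, the function names are invented for the example, and the nonce is prepended without a size prefix to keep the code short.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encVersion is a hypothetical stand-in for the VaultKeyEncVersion enum.
type encVersion int

const (
	versionUnspecified encVersion = iota // legacy: AES-CFB with a zero IV
	versionAEAD                          // AES-GCM with the nonce prepended
)

// encryptVaultKey encrypts pt in the format selected by ver.
func encryptVaultKey(ver encVersion, key, pt []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	switch ver {
	case versionUnspecified:
		// Legacy path, kept only so old data stays readable.
		out := make([]byte, len(pt))
		cipher.NewCFBEncrypter(block, make([]byte, aes.BlockSize)).XORKeyStream(out, pt)
		return out, nil
	case versionAEAD:
		gcm, err := cipher.NewGCM(block)
		if err != nil {
			return nil, err
		}
		nonce := make([]byte, gcm.NonceSize())
		if _, err := rand.Read(nonce); err != nil {
			return nil, err
		}
		// Result layout: nonce | ciphertext | tag.
		return gcm.Seal(nonce, nonce, pt, nil), nil
	}
	return nil, fmt.Errorf("unknown encryption version %d", ver)
}

// decryptVaultKey reverses encryptVaultKey for the given version.
func decryptVaultKey(ver encVersion, key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	switch ver {
	case versionUnspecified:
		out := make([]byte, len(blob))
		cipher.NewCFBDecrypter(block, make([]byte, aes.BlockSize)).XORKeyStream(out, blob)
		return out, nil
	case versionAEAD:
		gcm, err := cipher.NewGCM(block)
		if err != nil {
			return nil, err
		}
		n := gcm.NonceSize()
		if len(blob) < n+gcm.Overhead() {
			return nil, errors.New("blob too short for AEAD format")
		}
		return gcm.Open(nil, blob[:n], blob[n:], nil)
	}
	return nil, fmt.Errorf("unknown encryption version %d", ver)
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // demo key only
	for _, ver := range []encVersion{versionUnspecified, versionAEAD} {
		blob, _ := encryptVaultKey(ver, key, []byte("secret"))
		pt, err := decryptVaultKey(ver, key, blob)
		fmt.Println(ver, len(blob), string(pt), err)
	}
}
```

An explicit version avoids guessing the format from the blob length, which matters here because the legacy format has no header to distinguish it.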

> Last but not least, the API proto files which describe this as an encrypted key should have their comments expanded; key thing is to not assume anything about the length of the blob since the blob may or may not include an IV (for new vs old EVE versions).

I'll open a PR when this is ready to be merged.

@codecov

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 76.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.93%. Comparing base (2281599) to head (0c1b18c).
⚠️ Report is 264 commits behind head on master.

Files with missing lines              Patch %   Lines
pkg/pillar/evetpm/encryptdecypt.go    77.35%    7 Missing and 5 partials ⚠️
pkg/pillar/evetpm/tpm.go              71.42%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5494       +/-   ##
===========================================
+ Coverage   19.52%   30.93%   +11.40%     
===========================================
  Files          19       18        -1     
  Lines        3021     2311      -710     
===========================================
+ Hits          590      715      +125     
+ Misses       2310     1433      -877     
- Partials      121      163       +42     

@justincormack

You are making this very hard to change in the future; you should properly version the new interface. CFB encryption is no longer recommended (as it is not authenticated), and CFB modes are marked as deprecated in the Go standard library, so you are going to have to change this again. You might as well make that easier.
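
The "not authenticated" point can be demonstrated with a short sketch (the helper names and the demo key are assumptions for illustration). With AES-CFB and a zero IV, flipping a bit in the first ciphertext block flips the same bit in the decrypted plaintext and nothing reports an error, whereas AES-GCM refuses to open a modified ciphertext.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// cfbZeroIV runs AES-CFB with an all-zero IV, the legacy scheme this PR
// replaces. NewCFBEncrypter and NewCFBDecrypter are deprecated in recent Go
// releases; they are used here only to illustrate the problem.
func cfbZeroIV(key, data []byte, decrypt bool) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize)
	stream := cipher.NewCFBEncrypter(block, iv)
	if decrypt {
		stream = cipher.NewCFBDecrypter(block, iv)
	}
	out := make([]byte, len(data))
	stream.XORKeyStream(out, data)
	return out, nil
}

// gcmRejectsTampering seals data with AES-GCM, flips one ciphertext bit, and
// reports whether Open refused the modified message.
func gcmRejectsTampering(key, data []byte) (bool, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return false, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return false, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return false, err
	}
	ct := gcm.Seal(nil, nonce, data, nil)
	ct[0] ^= 0x01
	_, err = gcm.Open(nil, nonce, ct, nil)
	return err != nil, nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // demo key only
	ct, _ := cfbZeroIV(key, []byte("vault-key-000001"), false)
	ct[0] ^= 0x01 // an attacker flips one bit in transit or at rest
	pt, _ := cfbZeroIV(key, ct, true)
	fmt.Printf("%s\n", pt) // prints "wault-key-000001", with no error raised

	rejected, _ := gcmRejectsTampering(key, []byte("vault-key-000001"))
	fmt.Println(rejected) // prints "true"
}
```

The zero IV adds a second problem on top of this: the same key and plaintext always produce the same ciphertext.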

@shjala
Member Author

shjala commented Dec 17, 2025

> You are making this very hard to change in the future, you should properly version the new interface. CFB encryption is no longer recommended (as it is not authenticated), and CFB modes are marked as deprecated in Go standard library, so you are going to have to change this again, so you might as well make this easier.

I did consider that, with complete JSON output or magic bytes, but I first need to make sure how the controller handles this field.

@eriknordmark
Contributor

@shjala I thought we found sufficient info about how the controllers behave to proceed with this fix.

Update eve-api to include the latest changes.

Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
@shjala shjala changed the title from "[WIP] Bug : Use random IV in EncryptDecryptUsingTpm" to "Bug : Use random IV in EncryptDecryptUsingTpm" Jan 28, 2026
@shjala shjala force-pushed the fix.EncryptDecryptUsingTpm.IV branch from 629eaf3 to 88beb05 Compare January 28, 2026 15:17
@shjala
Member Author

shjala commented Jan 28, 2026

@eriknordmark made some changes, please review.

@shjala shjala force-pushed the fix.EncryptDecryptUsingTpm.IV branch from 88beb05 to a755a00 Compare January 28, 2026 16:30
Contributor

@eriknordmark eriknordmark left a comment

LGTM but some revive and codespell issues to fix.

Changes EncryptDecryptUsingTpm function to support versioned encryption
with proper random IV generation. Previously, the function used a static
zero IV which is insecure. This commit:

- Adds VaultKeyEncVersion enum (Unespecified, EncryptionAEAD) to track
  encryption method versions
- Implements AES-GCM (AEAD) encryption/decryption with random nonce
- Maintains backward compatibility with existing encrypted data using
  Unespecified version

The new AEAD mode generates a random nonce per encryption operation and
prepends it (with its size) to the ciphertext, providing authenticated
encryption with proper IV randomness.

Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
@shjala shjala force-pushed the fix.EncryptDecryptUsingTpm.IV branch from a755a00 to f32bad6 Compare January 29, 2026 10:26
@github-actions github-actions bot requested a review from eriknordmark January 29, 2026 10:26
- Move GetCertHash to evetpm package for better reusability.
- Migrate tpmmgr internal tests (testTpmEcdhSupport, testEcdhAES) to evetpm unit tests.
- Add unit tests in encryptdecrypt_test.go covering AES GCM and TPM encryption.
- Update TPM test setup scripts (tests/tpm/*.sh) to provision required keys and certificates.
- Allow overriding of TPM file paths in locationconsts.go for testing purposes.

Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
@shjala shjala force-pushed the fix.EncryptDecryptUsingTpm.IV branch from f32bad6 to 026f608 Compare January 29, 2026 10:31
Increase timeout durations for stability check and inventory collection
in evaluation cycle tests.

Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
@shjala
Member Author

shjala commented Jan 29, 2026

@rucoder I increased the timeout in 0c1b18c because it was a bit flaky in CI.

@eriknordmark
Contributor

@shjala seems like the two upgrade tests fail repeatedly for this PR (and we don't see any intermittent failures for those tests elsewhere).

Did you check how Eden/Adam saves the encrypted storage key?

@shjala
Member Author

shjala commented Jan 29, 2026

> @shjala seems like the two upgrade tests fail repeatedly for this PR (and we don't see any intermittent failures for those tests elsewhere).
>
> Did you check how Eden/Adam saves the encrypted storage key?

I'm working on something to make running Eden on a local branch a bit easier. I will test both PRs with it and fix any remaining issues.

@eriknordmark
Contributor

eriknordmark commented Jan 29, 2026

Running against the commercial controller, I see failures. The system updates fine to run the image in this PR, and as part of that it saves the encrypted key in the controller.
But the tests which try the EVE image update (by going to, e.g., the current master build) fail with
ERROR: device: baseos_updating attest: InternalEscrowWait error [ATTEST] No escrow data vault: ERROR error Vault key unavailable

Maybe this is a constraint, but it means there will be issues if a device updates to a newer version, which saves the key in the new format with an IV, and then the system crashes during its first 10 minutes and we fall back to the previous version.
That previous version does not know how to extract the IV from the new format, so it fails?

Do we need to introduce this using two steps?

  1. Make EVE version N be able to parse the new format and use the IV
  2. Wait for weeks or months
  3. Make EVE version N+k save the key with a non-empty IV

@shjala
Member Author

shjala commented Jan 30, 2026

@eriknordmark I checked the code: the EVE upgrade test in Eden has been an EVE downgrade for a while (or always? How could we upgrade from a PR to PR+1?). It always upgrades to version "12.1.0", and the test you ran with the commercial controller is technically a downgrade too.

Yes, the problem is that if we "upgrade" to an older version, it can't decrypt the new format, but the new version can decrypt the old format.

> Maybe this is a constraint but it means there will be issues if update to a newer version is done, and it saves the key in the new format with an IV, and then the system crashes during its first 10 minutes and we fallback to the previous version.
> That previous version does not know to extract the IV from the new format so it fails??

OK, I have to check in which order we do things, but I (foolishly) expect this to happen only if the previous version was in the vault-locked state. Otherwise, after reverting to the old version, it should be able to unlock the vault and submit its key to the controller during attestation (or will it get the key from the controller first?). Will the controller update the key every time it is received, or ignore it if it is already set (even with a different format)?

@shjala
Member Author

shjala commented Jan 30, 2026

> Do we need to introduce this using two steps?

Let me think about it a bit, but maybe we have to.

@eriknordmark
Contributor

Note that installing this PR on a device is toxic in that you can't later install anything else (like master).
You have to manually blow away at least /persist/vault if not all of /persist as part of the update.

@shjala
Member Author

shjala commented Feb 2, 2026

@eriknordmark What do you think about this strategy:

Boot 1: EVE is potentially unstable (PartitionState = inprogress), so encrypt using AES-ZeroIV (LEGACY) regardless of which EVE version is running. The Controller receives legacy data. If a rollback happens, the Controller has LEGACY data that every EVE version can understand.

Boot 2: EVE is stable (PartitionState = active), so encrypt using AEAD. The Controller receives the new data. A rollback is not going to happen and we are committed to the latest version.

This relies on no disk writes, so we don't have to worry about disk failures. We can indicate in the release notes that upgrading to this version makes downgrades not only officially but also technically impossible without losing vault data.
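
The selection rule in this strategy can be sketched as a small pure function. The type and constant names below are hypothetical and only mirror the two states named above; treating every state other than `active` as "use legacy" is an assumption chosen to be conservative.

```go
package main

import "fmt"

// partitionState mirrors the two states named in the strategy; the type and
// constant names are hypothetical.
type partitionState string

const (
	partitionInProgress partitionState = "inprogress"
	partitionActive     partitionState = "active"
)

// encVersion identifies the format used when encrypting the vault key.
type encVersion int

const (
	encLegacyZeroIV encVersion = iota // readable by every EVE version
	encAEAD                           // readable only by versions with this fix
)

// chooseEncVersion returns the format to use when sending the encrypted
// vault key to the Controller. Only a partition that is known to be stable
// switches to AEAD; any other state keeps the legacy format so that a
// rollback can still decrypt what the Controller holds.
func chooseEncVersion(state partitionState) encVersion {
	if state == partitionActive {
		return encAEAD
	}
	return encLegacyZeroIV
}

func main() {
	for _, s := range []partitionState{partitionInProgress, partitionActive} {
		fmt.Printf("%s -> %d\n", s, chooseEncVersion(s))
	}
}
```

Since the decision depends only on the current partition state, nothing extra has to be persisted to disk, which matches the "no disk write" property mentioned above.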

@eriknordmark
Contributor

> @eriknordmark What do you think about this strategy :
>
> Boot 1 : EVE is potentially unstable (PartitionState = inprogress), encrypt using AES-ZeroIV (LEGACY) regardless of what EVE version is running, Controller receives legacy data. If rollback happens, Controller has LEGACY data that every EVE version can understand.
>
> Boot 2 : EVE is stable (PartitionState = active), encrypt using AEAD, Controller receives new data. Rollback is not going to happen and we are committed to the latest version.
>
> This relies on no disk write and we don't have to worry about disk fails. We can indicate in the release notes that upgrading to this version makes rollback not only officially but also technically impossible without losing vault data.

That sounds workable and simple enough.

Please make sure you put the release note text in the PR description, and let me know when you have updated the code so we can run this through the tests.

We also need to check how we test things, in Eden and in the Zededa test setups, to make sure we only test upgrades (from LTS releases to the current PR or current master/release) and do not try to move back and forth across releases.
Can you check how Eden does the EVE update tests?

@eriknordmark
Contributor

Plus some conflicts to resolve when you evolve the fix.

Labels

  • bug: Something isn't working
  • security: Provides a security fix
  • stable: Should be backported to stable release(s)

3 participants