Support graceful script isolation fallback when filesystem does not support locking#1838
Support graceful script isolation fallback when filesystem does not support locking#1838
Conversation
dea3c2a to
9ce1b18
Compare
gb-8
left a comment
There was a problem hiding this comment.
I've had a scan of the PR and while the approach seems good, and the test coverage seems good, I did find the code hard to follow.
I had to really work to follow the logic through, and see exactly where the determination of locking support was performed.
The logic is all hidden inside things that don't make it clear what they are doing.
Once you are happy that the functionality is all correct, then I'd like to pair on seeing if we can refactor to make it more readable and understandable.
source/Calamari.Common/Features/Processes/ScriptIsolation/LockCapability.cs
Outdated
Show resolved
Hide resolved
|
|
||
| public enum LockCapability | ||
| { | ||
| Unknown, |
There was a problem hiding this comment.
I think it is good practice to set the default value explicitly to 0.
There was a problem hiding this comment.
Done (Unsupported is now the default). I was toying with the idea of using [Flags], but I didn't see much benefit to it.
source/Calamari.Common/Features/Processes/ScriptIsolation/TemporaryDirectoryFallbackProvider.cs
Show resolved
Hide resolved
1ccc39e to
d873503
Compare
66e4146 to
8480e7e
Compare
8480e7e to
ac6d515
Compare
|
Notes from our discussion: How we are actually solving the isolation: Possible support levels:
Possible requested isolation levels:
For phase two, we have a known support level, and a requested isolation level. Algorithm:
Expected implementation Get the requested locking data:
Thoughts on refactoring:
|
…quire delegate CachedDriveInfo is extracted from its nested position inside LockDirectory into its own file. DetectLockSupport gains an overload that accepts a Func<LockOptions, ILockHandle> delegate in place of the direct FileLock.Acquire call, making the four lock-probe tests (exclusive, shared, exclusive-blocks-shared, shared-blocks-exclusive) unit-testable without real filesystem operations. The existing public overload forwards to FileLock.Acquire.
MountedDrives is extracted from its nested position inside LockDirectory into its own file with no logic changes.
The public GetLockDirectory(string) now delegates to an internal overload that accepts a MountedDrives instance and an optional acquire delegate. This allows tests to supply controlled drive lists and lock-acquisition behaviour without real filesystem probing. The refactor also closes a design gap: GetAssociatedDrive calls that throw DirectoryNotFoundException (e.g. when MountedDrives is empty or a path does not sit under any known drive root) are now caught, and the affected path is treated as having no known capability rather than propagating the exception.
The fallback decision logic (routing between IsFullySupported, IsSupported, PromoteToExclusiveLock, and returning null) is separated into an internal static ResolveLockOptions(LockOptions, bool) method. PrepareLockOptions becomes a thin wrapper that builds the options, logs, then delegates. This makes all fallback branches unit-testable by constructing a LockOptions with a controlled LockCapability, without any filesystem access.
…metic tests GetLockDirectory tests against non-existent paths (e.g. /home/octopus/tentacle) were failing because DetectLockSupport called Directory.CreateDirectory unconditionally before probing, causing the outer catch to return Unsupported instead of running the fake lock service. Thread a createDirectory Action through GetLockDirectory and DetectLockSupport (defaulting to the real implementation), and pass a no-op in the two affected Group D fixture tests.
The two delegates (acquireDelegate and createDirectory) passed through GetLockDirectory and DetectLockSupport were always used together and represented a single concern: interacting with the filesystem during lock-support probing. Introducing IFileLockService makes that coupling explicit and gives tests a single seam to inject. - Add IFileLockService with AcquireLock and CreateDirectory - Add FileLockService (singleton) as the real implementation - Collapse the multi-overload chain in CachedDriveInfo.DetectLockSupport and LockDirectory.GetLockDirectory to a single IFileLockService parameter - FakeLockService in LockDirectoryFixture now implements IFileLockService (CreateDirectory is a no-op; Acquire renamed to AcquireLock)
…oryFallback Move GetTemporaryCandidates logic into a new ITemporaryDirectoryFallback interface with real implementation TemporaryDirectoryFallback (singleton). This allows the internal GetLockDirectory overload to accept an injectable fallback, making tests hermetic and independent of environment variables ($TMPDIR) and filesystem existence checks (/tmp, /dev/shm). Also fix bug: the /dev/shm candidate was incorrectly joined under /tmp instead of /dev/shm. Add FakeTemporaryDirectoryFallback to tests returning a fixed list of candidates. All Group D tests now inject this fake, removing any dependency on $TMPDIR, /tmp, and /dev/shm. Add 2 new Group D tests: - Verify temp candidates with no matching drive are silently skipped - Verify multiple temp candidates: the first Supported one is chosen All 89 tests pass.
Consolidate ITemporaryDirectoryFallback into IFileLockService by adding GetFallbackTemporaryDirectories. This removes the separate interface and its injectable seam, simplifying the design: callers only need to inject one service. TemporaryDirectoryFallback becomes a static helper used exclusively by FileLockService. Tests updated to use FakeLockService with temp directory params instead of the now-deleted FakeTemporaryDirectoryFallback.
190a711 to
9eeeb25
Compare
|
|
||
| namespace Calamari.Common.Features.Processes.ScriptIsolation; | ||
|
|
||
| public sealed class LockOptionsFactory( |
There was a problem hiding this comment.
This factory has a bunch of logic and intelligence in it, so maybe should be renamed to communicate that it is doing more than simple constructions.
E.g. LockOptionResolver or something.
There was a problem hiding this comment.
Renaming to LockOptionsResolver
| return UseExclusiveIfSharedIsNotSupported(lockOptions); | ||
| } | ||
|
|
||
| internal LockOptions? UseExclusiveIfSharedIsNotSupported(LockOptions lockOptions) |
There was a problem hiding this comment.
This can also "downgrade" rather than "escalate" the lock you actually get, so a better name would be:
DetermineActualKindOfLockWeWillUseBasedOnSupport (or something better).
There was a problem hiding this comment.
Renaming to DetermineActualLockTypeToUseBasedOnSupport and refactoring to make things clearer.
| .Build(); | ||
| } | ||
|
|
||
| LockOptions? PrepareLockOptions(CommonOptions.ScriptIsolationOptions scriptIsolationOptions) |
There was a problem hiding this comment.
Better (but not great) name:
DetermineWhatLockToUseBasedOnOptions
The key thing to communicate is that we are making decisions inside here.
There was a problem hiding this comment.
I ended up with DeterminLockOptionsToEnforceBasedOnIsolationOptions
| ILog log | ||
| ) | ||
| { | ||
| public RequestedLockOptions? CreateOrNull(CommonOptions.ScriptIsolationOptions options) |
There was a problem hiding this comment.
I think we have settled on:
CreateFromIsolationOptions()
There was a problem hiding this comment.
Will rename to CreateFromIsolationOptions
gb-8
left a comment
There was a problem hiding this comment.
✅ Pre-approval, with various renaming suggested, and some a suggested refactor of LockOptionsFactory.UseExclusiveIfSharedIsNotSupported.
We will likely want to write out service messages each time we escalate/deescalate level of locking do we can write warnings in server logs that we can monitor or alert on. That can go into this PR too or a follow up.
- Rename UseExclusiveIfSharedIsNotSupported to DetermineActualLockTypeToUseBasedOnSupport for better semantic clarity - Extract lock type support checks into explicit local variables (isExclusiveLockSupported, isSharedLockSupported) - Restructure conditionals to separate exclusive vs shared lock type handling paths - Add comments explaining the escalation behavior when shared locks are unavailable - Extract repeated warning message into LogUnableToSupportAnyScriptIsolation helper method - Update test references to use the renamed method This refactoring improves code readability without changing the functional behavior.
…nOptions Improves method naming to better express the semantic intent: the method creates requested lock options specifically from script isolation options. This clarifies the relationship between input and output compared to the generic CreateOrNull name.
…ationOptions for clarity
rhysparry
left a comment
There was a problem hiding this comment.
I just want to break out the service message details first because I'd like to enrich the message with details of what we tried.
|
|
||
| namespace Calamari.Common.Features.Processes.ScriptIsolation; | ||
|
|
||
| public sealed class LockOptionsFactory( |
There was a problem hiding this comment.
Renaming to LockOptionsResolver
| return UseExclusiveIfSharedIsNotSupported(lockOptions); | ||
| } | ||
|
|
||
| internal LockOptions? UseExclusiveIfSharedIsNotSupported(LockOptions lockOptions) |
There was a problem hiding this comment.
Renaming to DetermineActualLockTypeToUseBasedOnSupport and refactoring to make things clearer.
| ILog log | ||
| ) | ||
| { | ||
| public RequestedLockOptions? CreateOrNull(CommonOptions.ScriptIsolationOptions options) |
There was a problem hiding this comment.
Will rename to CreateFromIsolationOptions
| .Build(); | ||
| } | ||
|
|
||
| LockOptions? PrepareLockOptions(CommonOptions.ScriptIsolationOptions scriptIsolationOptions) |
There was a problem hiding this comment.
I ended up with DeterminLockOptionsToEnforceBasedOnIsolationOptions
Script isolation lock directory fallback for filesystems without full locking support
Motivation
Calamari's script isolation mechanism uses filesystem file locks to serialise or permit concurrent script execution. The lock file has historically been written into the Tentacle working directory (
TentacleHome).On some deployment targets the working directory resides on a filesystem that does not fully support file locking semantics — NFS mounts, certain FUSE-based filesystems, and some SMB shares being common examples. On these filesystems, lock acquisition either silently succeeds when it should block, or throws errors that are not meaningful to the caller. The result is that script isolation is silently broken: scripts that should be mutually exclusive may run concurrently, or scripts that should run in parallel are incorrectly serialised (or errored).
This PR introduces runtime filesystem lock capability detection and an automatic fallback to a temporary directory that does support locking, so that script isolation remains as strong as the host OS allows — without requiring any manual configuration from operators.
What changed
New abstractions
ScriptIsolation/LockCapability.csUnsupported,ExclusiveOnly,Supported; unknown capability represented asnull(LockCapability?)ScriptIsolation/LockFile.csLockDirectorywith a specificFileInfo; exposesOpen,Supports,Exists, andDelete, and owns the nestedLockHandleimplementationScriptIsolation/LockDirectory.csDirectoryInfo+ its detectedLockCapability; exposesGetLockFileandSupports(LockType)ScriptIsolation/LockDirectoryFactory.cs+ILockDirectoryFactory.csCreate) and the live 4-probeDetectLockSupportlogic; injected withIMountedDrivesProvider,IFileLockService, andITemporaryDirectoryFallbackProviderScriptIsolation/CachedDriveInfo.csLockSupportreturns a statically-knownLockCapability?based on format/drive-type heuristics, ornullwhen the drive is unrecognised (triggering live detection)ScriptIsolation/MountedDrives.cs+ICachedDriveInfoProvider.csICachedDriveInfoProviderScriptIsolation/SystemMountedDrivesProvider.cs+IMountedDrivesProvider.csDriveInfo.GetDrives()and constructs aMountedDrivesinstance — separates OS enumeration from path-matching logicScriptIsolation/TemporaryDirectoryFallbackProvider.cs+ITemporaryDirectoryFallbackProvider.csScriptIsolation/RequestedLockOptions.cs+RequestedLockOptionsFactory.csCommonOptions.ScriptIsolationOptionsScriptIsolation/LockOptionsFactory.csLockOptionsby resolving the lock directory viaILockDirectoryFactoryand applying the degradation policy (UseExclusiveIfSharedIsNotSupported) — always promotes a shared lock to exclusive when shared locking is unavailableScriptIsolation/ScriptIsolationEnforcer.cs+IScriptIsolationEnforcer.csIsolationclass) that orchestrates lock acquisition viaRequestedLockOptionsFactoryandLockOptionsFactoryScriptIsolation/ScriptIsolationModule.csModulethat registers all script-isolation services (ScriptIsolationEnforcer,LockDirectoryFactory,RequestedLockOptionsFactory,LockOptionsFactory, providers, etc.)ScriptIsolation/LockAcquisitionResiliencePipelineBuilder.csLockOptionsto keep the record type focusedScriptIsolation/IFileLockService.cs+FileLockService.csCreateDirectoryandAcquireLock— enables hermetic unit testsScriptIsolation/IPathResolutionService.cs+DefaultPathResolutionService.cs+PathResolutionServiceExtensions.csPath.GetFullPathand symlink resolution — necessary to correctly match paths on macOS where/tmp → /private/tmpModified files
ScriptIsolation/LockOptions.csIsFullySupported/IsSupportedrenamed toBothSharedAndExclusiveAreSupported/RequestedLockTypeIsSupported;FromScriptIsolationOptionsOrNullandBuildLockAcquisitionPipelineremoved (moved to dedicated factories/builder)ScriptIsolation/FileLock.csLockFilerecord shapePlumbing/Commands/CommonOptions.csscriptIsolationSharedFallbackCLI option andPromoteToExclusiveLockWhenSharedLockUnavailableproperty removed — lock promotion is now unconditionalRemoved files
ScriptIsolation/Isolation.csScriptIsolationEnforcer(injectable) +IScriptIsolationEnforcerDecision trees
Phase 1 — Lock directory selection (
LockDirectoryFactory.Create)flowchart TD Start([Candidate path]) --> MapDrive[Map candidate to its drive] MapDrive --> CheckHeuristic{drive?.LockSupport?} CheckHeuristic -- Supported --> ReturnCandidate([Return candidate\nLockCapability.Supported]) CheckHeuristic -- null no drive or unknown format --> ProbeCandidateDrive CheckHeuristic -- ExclusiveOnly or Unsupported --> IterateFallbacks ProbeCandidateDrive[DetectLockSupport on candidate path] ProbeCandidateDrive --> CandidateProbeResult{Candidate\ndetected capability?} CandidateProbeResult -- Supported --> ReturnCandidate2([Return candidate\nLockCapability.Supported]) CandidateProbeResult -- ExclusiveOnly or Unsupported or null --> IterateFallbacks IterateFallbacks[Iterate temp fallback directories] IterateFallbacks --> HasTemp{More temp\ncandidates?} HasTemp -- No --> AnyExclusiveTemp HasTemp -- Yes --> MapTempDrive[Map temp dir to its drive] MapTempDrive --> CheckTempHeuristic{tempDrive?.LockSupport?} CheckTempHeuristic -- Supported --> ReturnTemp([Return temp dir\nLockCapability.Supported]) CheckTempHeuristic -- null no drive or unknown format --> ProbeTemp[DetectLockSupport on temp path] CheckTempHeuristic -- ExclusiveOnly --> RecordFirstExclusive[Record as best-so-far\nif none recorded] ProbeTemp --> TempResult{Temp capability?} TempResult -- Supported --> ReturnTemp TempResult -- ExclusiveOnly --> RecordFirstExclusive TempResult -- Unsupported or null --> HasTemp RecordFirstExclusive --> HasTemp AnyExclusiveTemp{Any ExclusiveOnly\ntemp found?} AnyExclusiveTemp -- Yes, and candidate is ExclusiveOnly --> ReturnCandidate3([Return candidate\nLockCapability.ExclusiveOnly]) AnyExclusiveTemp -- Yes, and candidate is Unsupported/null --> ReturnBestTemp([Return best temp dir\nLockCapability.ExclusiveOnly]) AnyExclusiveTemp -- No --> ReturnUnsupported([Return candidate\nLockCapability.Unsupported])Phase 2 — Lock options resolution (
LockOptionsFactory.UseExclusiveIfSharedIsNotSupported)flowchart TD Start([LockOptions\nType + LockDirectory capability]) --> FullySupported{BothSharedAndExclusiveAreSupported?\nShared + Exclusive work\ncorrectly} FullySupported -- Yes --> ReturnOriginal([Use requested lock\nNo warning]) FullySupported -- No --> IsSupported{RequestedLockTypeIsSupported?\nRequested LockType\nis supported} IsSupported -- Yes, Exclusive requested --> ReturnExclusive([Use Exclusive lock\nNo warning]) IsSupported -- No, Shared requested\non ExclusiveOnly dir --> ExclusiveAvail{Supports\nExclusive?} ExclusiveAvail -- Yes --> PromoteToExclusive([Promote to Exclusive lock\nWarn: shared unavailable]) ExclusiveAvail -- No --> NoLockUnsupported([No lock acquired\nWarn: no isolation available])Areas especially relevant for review
LockDirectoryFactory.DetectLockSupport— live filesystem probingFour sequential probe operations are run against a temporary
.tmpfile in the lock directory to empirically verify lock semantics. The probes useTimeSpan.Zerotimeout so they fail fast if the filesystem rejects the operation. The test file is always cleaned up in afinallyblock. Any exception during probing is treated conservatively asUnsupported. Reviewers should check whether the probe-file cleanup is sufficient and whether theTimeSpan.Zerostrategy is reliable across all target filesystems.PathResolutionServiceExtensions.ResolvePath— ancestor-walk symlink resolutionThe ancestor-walk is necessary because a lock directory might not exist yet at the time of drive matching (e.g. first ever run). The algorithm peels off non-existent path segments from the end, resolves the nearest existing ancestor's symlink, then re-attaches the original tail. This is especially important on macOS where
Path.GetFullPath("/tmp/x")returns/tmp/xbut the drive is mounted at/private/tmp. All fiveGetFullPathexception types are caught and cause the input to be returned unchanged.MountedDrives.GetAssociatedDrive— longest-prefix mount matchingThe match is done on the symlink-resolved path, with platform-appropriate case sensitivity (
OrdinalIgnoreCaseon Windows/macOS,Ordinalon Linux). Reviewers should verify theStartsWithprefix check correctly handles the case where the drive root IS the path (no trailing separator needed sinceRootDirectory.FullNameends with a separator on all .NET platforms).TemporaryDirectoryFallbackProvider.GetFallbackCandidates— platform-specific candidate orderOn Linux,
/dev/shmis included as a candidate after/tmp.tmpfs-backed/dev/shmis one of the most reliably lock-capable paths on systems where the working directory is NFS-mounted. The candidates are namespace-scoped using the last segment of the preferred directory path to prevent cross-deployment lock file collisions.LockOptionsFactory.UseExclusiveIfSharedIsNotSupported— unconditional promotion policyWhen a shared lock is requested but only exclusive locking is supported, the lock is always promoted to exclusive and a warning is logged. There is no longer a user-configurable flag to skip this promotion. If no locking at all is supported,
nullis returned and a warning is logged.Test coverage
LockDirectoryFixture— 1,084 lines, all groups taggedRunOnceOnWindowsAndLinuxAll tests in this fixture use injected fakes (
FakeLockService,FakePathResolutionService,FakeMountedDrivesProvider,FakeTemporaryDirectoryFallbackProvider) and run entirely in-memory — no real filesystem access.LockDirectory.SupportsLockCapability×LockTypecombinations (5 cases)CachedDriveInfo.LockSupportstaticapfs,ntfs,ext4, etc.) →Supported; network drives →null;DetectedLockSupportoverrides (13 cases)LockDirectoryFactory.DetectLockSupportUnsupported, three variants ofExclusiveOnly(broken shared, broken exclusive-blocks-shared, broken shared-blocks-exclusive), andSupported(5 cases)LockDirectoryFactory.CreateSupported; preferrednull→ detectedSupportedbefore temps inspected (preferred over aSupportedtemp); preferredUnsupported→Supportedtemp used; preferrednull→ detectedSupportedbeforeExclusiveOnlytemps inspected; both preferred and temp areExclusiveOnly→ stay on preferred; tempExclusiveOnly+ preferredUnsupported→ use temp; nothing works →Unsupportedon preferred; empty drive table → live detection runs (succeeds with fully-supported lock service); temp with no associated drive skipped; firstSupportedtemp wins when multiple candidatesMountedDrives.GetAssociatedDriveDirectoryNotFoundExceptionwhen no match; ancestor symlink resolved when child doesn't existDefaultPathResolutionServicePathResolutionServiceExtensions.ResolvePathGetFullPath, allGetFullPathexception types caught (5 parameterised)IsolationFixture— integration tests via Autofac containerThe fixture builds a real
IContainerusingScriptIsolationModuleand exercisesIScriptIsolationEnforcerend-to-end against the local filesystem.Enforce/EnforceAsyncwith valid options acquires a lock file; releases on dispose; throwsLockRejectedExceptionon contention; allows concurrent shared locks; waits for lock release; enforces timeout and throws after it elapsesLockOptionsFixture— unit tests for factory classesRequestedLockOptionsFactorytestsnullfor missing/whitespace Level, MutexName, or TentacleHome; mapsFullIsolation→ExclusiveandNoIsolation→Shared; case-insensitive level parsing; defaults timeout toInfiniteon invalid or missing input; throws on invalid mutex name charsLockOptionsFactorytestsCreateLockOptionsOrNull_BuildsCorrectLockFileNameasserts onLockFile.File.Name;UseExclusiveIfSharedIsNotSupported— 5 cases:Supported→original returned;ExclusiveOnly+Exclusive→original returned, no warning;ExclusiveOnly+Shared→promoted to Exclusive + warning;Unsupported+Exclusive→null + warning;Unsupported+Shared→null + warningWhat is not covered by unit tests
LockDirectoryFactory.Createagainst a real NFS or FUSE mount. The live probing logic is verified against a real (local) filesystem viaIsolationFixtureand Group F inLockDirectoryFixture, but the specific scenario of a network filesystem detecting asUnsupportedand falling back to/tmpis exercised only through the fakes in Group D.%LOCALAPPDATA%\Calamari,%TEMP%) are not covered by integration tests.