Skip to content

nvmeof: Add mount cache and locking for safe nvme disconnect#6192

Open
gadididi wants to merge 3 commits intoceph:develfrom
gadididi:nvmeof/add_mnt_cache_nodeserver
Open

nvmeof: Add mount cache and locking for safe nvme disconnect#6192
gadididi wants to merge 3 commits intoceph:develfrom
gadididi:nvmeof/add_mnt_cache_nodeserver

Conversation

@gadididi
Copy link
Copy Markdown
Contributor

@gadididi gadididi commented Mar 22, 2026

Describe what this PR does

Adds an in-memory cache to track which nvme devices are currently mounted at staging paths. This eliminates expensive findmnt --list system calls on every unstage operation.

Structure:
The cache maintains bidirectional maps for fast lookups:

  • pathToDevice: staging path -> device path
  • deviceToPath: device path -> staging path (reverse)

Initialization:
Cache is populated at node server startup by scanning existing mounts, then kept up to date as volumes are staged and unstaged.

Thread Safety:
Protected by sync.Mutex. We use a simple mutex instead of sync.RWMutex because:

  • Operations are very fast ( map lookups or remove. just 1 call for get copy , which it is called once per delete pod.. )
  • Read/write ratio is - not read-heavy

Why copying the map is safe:
When GetAllDevices() returns a copy of the cache, callers get a consistent snapshot at that moment. The copy happens after cache updates (first, the pod unmounts, then deletes itself, and last fetch the copy of cache), so concurrent unstage operations each see current state. Even if two unstages run simultaneously, each removes its device from the cache first, then gets a fresh snapshot showing remaining devices.
So the last of them who call to GetAllDevices() will get the update list, and will know if disconnect or not. (NodeStage cannot run at this time !!)

Related PR##

after this pr: #6183 will be merged, group locking (between NodeStage\NodeUnStage) will added.

Future concerns

next step is to add group lock after #6183 will be merged.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi requested a review from nixpanic March 22, 2026 15:03
@gadididi gadididi self-assigned this Mar 22, 2026
@gadididi gadididi added the component/nvme-of Issues and PRs related to NVMe-oF. label Mar 22, 2026
@gadididi gadididi requested review from Rakshith-R and Copilot March 23, 2026 06:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an in-memory mount cache for NVMe-oF staging mounts so NodeUnstage can avoid repeated findmnt --list calls and decide when it’s safe to disconnect NVMe controllers.

Changes:

  • Changed GetAllNVMeMountedDevices() to return device -> stagingTarget mappings (instead of a bool set).
  • Introduced MountCache (bidirectional device/path cache) and initialized it at NodeServer startup.
  • Updated NodeUnstage disconnect decision flow to use the in-memory cache snapshot.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
internal/nvmeof/util/mounter.go Changes mounted-device discovery to return device→mountpoint mapping.
internal/nvmeof/util/mounter_cache.go Adds new mutex-protected mount cache with bidirectional maps.
internal/nvmeof/nvmeof_initiator.go Updates disconnect interfaces/helpers to accept the new mounted-devices map type.
internal/nvmeof/nodeserver/nodeserver.go Initializes cache on startup; updates stage/unstage flows to maintain and consult the cache for disconnect decisions.

Comment on lines 515 to +525
transaction.isMounted = true

// Resolve real device and update cache
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - try to continue
realDevice = devicePath
}

ns.mountCache.Add(realDevice, stagingTargetPath)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between the successful mount (transaction.isMounted = true) and the later ns.mountCache.Add call, there is a window where the namespace is mounted in the system but missing from mountCache. Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers. To avoid false negatives, consider updating mountCache immediately after the mount succeeds (using the best-available device identity) or adding a safety validation (e.g., re-check mounts via findmnt) before disconnect decisions.

Suggested change
transaction.isMounted = true
// Resolve real device and update cache
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - try to continue
realDevice = devicePath
}
ns.mountCache.Add(realDevice, stagingTargetPath)
// Immediately record the mount in the cache using the best-available device identity
ns.mountCache.Add(devicePath, stagingTargetPath)
// Resolve real device and refine cache entry if possible
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - keep using devicePath for cache
} else if realDevice != "" && realDevice != devicePath {
// Update cache to use the canonical device identifier for this mountpoint
ns.mountCache.RemoveByMountPoint(stagingTargetPath)
ns.mountCache.Add(realDevice, stagingTargetPath)
}
transaction.isMounted = true

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers

As I said in the PR description, in the next PR I will add group lock(which locks per group type stage\unstage) . Once this PR will be merged: #6183

@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 8673fba to 077fce8 Compare March 24, 2026 14:22
@gadididi gadididi requested a review from Copilot March 24, 2026 14:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 077fce8 to 9558efb Compare March 25, 2026 07:46
// MountCache is a thread-safe cache that maintains the mapping between staging paths and devices.
// Note: This map is 1:1, meaning each staging path corresponds to exactly one device.
// device can only be mounted to one staging path at a time, and each staging path can only have one device.
type MountCache struct {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a MountCache interface and mountCache struct that contains the actual implementation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// Initialize mounted devices cache on startup to ensure we have an accurate view
mountedDevices, err := getNVMeMountedDevices(context.Background())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make getNVMeMountedDevices() a function of NodeServer (call it initNVMeMountedDevices()?) and add the path/device directly in that function. This removes the for-loop a little below and makes this function a little smaller.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 5e46539 to 6d03f95 Compare March 30, 2026 09:31
@gadididi gadididi requested a review from nixpanic March 30, 2026 09:31
// so we should try to disconnect if this was the last mount
if transaction.devicePath != "" {
mountedDevices, err := getNVMeMountedDevices(ctx)
mountedDevices := ns.mountCache.GetCopyAllDevices()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now does not return err anymore, the check for that on the next line does not make sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, my bad

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 53bba11 to 2fe4143 Compare March 30, 2026 11:59
@gadididi gadididi requested a review from nixpanic March 30, 2026 12:23
Add MountCache to track nvme device mounts in memory, eliminating
repeated `findmnt` system calls during unstage operations.
The cache maintains a bidirectional map between
staging paths and device paths, allowing O(1)
lookups in both directions.

also the get device from staging path call is saved
from `findmnt` sys call.

Structure:
- pathToDevice: staging path -> device (/staging/vol-123 → /dev/nvme0n1)
- deviceToPath: device -> staging path (reverse lookup)
- Protected by sync.Mutex for thread-safe concurrent access

The cache will be initialized at node server startup by scanning existing
mounts, then updated on each stage/unstage operation.
This improves performance by replacing expensive findmnt --list calls
(scanning all mounts) with fast in-memory map lookups.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
change the map from (staging path -> true\false) map
to more informative map- staging path -> device.
this function will be in used at node server init.
scan the current stage of mounted nvme device.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
adding `mountCache`, which saves the findmnt calls.
aslo change in the code any place we call to `findmnt`
to call to cache get.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 2fe4143 to 0803810 Compare March 30, 2026 12:23
@gadididi gadididi requested a review from Copilot March 30, 2026 12:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +118 to +121
// Initialize mounted devices cache on startup to ensure we have an accurate view
if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
log.WarningLog(context.Background(),
"failed to initialize mounted devices cache on startup: %v (mounted devices cache will be empty)", err)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewNodeServer continues when initNVMeMountedDevices() fails, leaving mountCache empty. In that state, NodeUnstageVolume will make disconnect decisions using an incomplete snapshot and may disconnect shared controllers while other NVMe namespaces remain mounted (especially after node-plugin restart with pre-existing mounts). Consider either failing NodeServer startup on init error, or tracking an "initialized" flag and falling back to an authoritative findmnt scan for disconnect decisions until initialization succeeds (or retry initialization).

Suggested change
// Initialize mounted devices cache on startup to ensure we have an accurate view
if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
log.WarningLog(context.Background(),
"failed to initialize mounted devices cache on startup: %v (mounted devices cache will be empty)", err)
// Initialize mounted devices cache on startup to ensure we have an accurate view.
// Failing to initialize this cache can lead to incorrect disconnect decisions, so
// treat initialization failures as fatal and fail NodeServer construction.
if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
return nil, fmt.Errorf("failed to initialize mounted devices cache on startup: %w", err)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/nvme-of Issues and PRs related to NVMe-oF.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants