nvmeof: Add mount cache and locking for safe nvme disconnect #6192

gadididi wants to merge 3 commits into ceph:devel
Conversation
Pull request overview
Adds an in-memory mount cache for NVMe-oF staging mounts so NodeUnstage can avoid repeated findmnt --list calls and decide when it’s safe to disconnect NVMe controllers.
Changes:
- Changed `GetAllNVMeMountedDevices()` to return `device -> stagingTarget` mappings (instead of a `bool` set).
- Introduced `MountCache` (bidirectional device/path cache) and initialized it at NodeServer startup.
- Updated the NodeUnstage disconnect decision flow to use the in-memory cache snapshot.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| internal/nvmeof/util/mounter.go | Changes mounted-device discovery to return device→mountpoint mapping. |
| internal/nvmeof/util/mounter_cache.go | Adds new mutex-protected mount cache with bidirectional maps. |
| internal/nvmeof/nvmeof_initiator.go | Updates disconnect interfaces/helpers to accept the new mounted-devices map type. |
| internal/nvmeof/nodeserver/nodeserver.go | Initializes cache on startup; updates stage/unstage flows to maintain and consult the cache for disconnect decisions. |
```go
transaction.isMounted = true

// Resolve real device and update cache
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
	log.WarningLog(ctx, "failed to resolve device: %v", err)
	// Fallback - try to continue
	realDevice = devicePath
}

ns.mountCache.Add(realDevice, stagingTargetPath)
```
Between the successful mount (transaction.isMounted = true) and the later ns.mountCache.Add call, there is a window where the namespace is mounted in the system but missing from mountCache. Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers. To avoid false negatives, consider updating mountCache immediately after the mount succeeds (using the best-available device identity) or adding a safety validation (e.g., re-check mounts via findmnt) before disconnect decisions.
Suggested change:

```diff
-transaction.isMounted = true
-// Resolve real device and update cache
-realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
-if err != nil {
-	log.WarningLog(ctx, "failed to resolve device: %v", err)
-	// Fallback - try to continue
-	realDevice = devicePath
-}
-ns.mountCache.Add(realDevice, stagingTargetPath)
+// Immediately record the mount in the cache using the best-available device identity
+ns.mountCache.Add(devicePath, stagingTargetPath)
+// Resolve real device and refine cache entry if possible
+realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
+if err != nil {
+	log.WarningLog(ctx, "failed to resolve device: %v", err)
+	// Fallback - keep using devicePath for cache
+} else if realDevice != "" && realDevice != devicePath {
+	// Update cache to use the canonical device identifier for this mountpoint
+	ns.mountCache.RemoveByMountPoint(stagingTargetPath)
+	ns.mountCache.Add(realDevice, stagingTargetPath)
+}
+transaction.isMounted = true
```
Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers
As I said in the PR description, the next PR will add a group lock (which locks stage/unstage per group), once this PR is merged: #6183
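The group lock is planned for a follow-up PR, so it is not in this diff. A minimal sketch of what a keyed lock serializing stage/unstage per group could look like (all names here are illustrative, not the PR's actual implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// groupLocks serializes NodeStage/NodeUnstage calls that share a key
// (e.g. an NVMe subsystem NQN); different groups proceed in parallel.
type groupLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newGroupLocks() *groupLocks {
	return &groupLocks{locks: map[string]*sync.Mutex{}}
}

// Lock acquires the mutex for the given group key, creating it on first use.
func (g *groupLocks) Lock(key string) {
	g.mu.Lock()
	m, ok := g.locks[key]
	if !ok {
		m = &sync.Mutex{}
		g.locks[key] = m
	}
	g.mu.Unlock()
	m.Lock()
}

// Unlock releases the mutex for the given group key.
func (g *groupLocks) Unlock(key string) {
	g.mu.Lock()
	m := g.locks[key]
	g.mu.Unlock()
	m.Unlock()
}

func main() {
	gl := newGroupLocks()
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			gl.Lock("nqn.2024-01.example:subsys1")
			counter++ // safe: increments are serialized per group
			gl.Unlock("nqn.2024-01.example:subsys1")
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 4
}
```

With such a lock, a stage of one namespace and an unstage of another namespace in the same subsystem can no longer interleave, which closes the snapshot race described in the comment above.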
```go
// MountCache is a thread-safe cache that maintains the mapping between staging paths and devices.
// Note: this map is 1:1 - each staging path corresponds to exactly one device,
// and a device can only be mounted to one staging path at a time.
type MountCache struct {
```
Can you provide a MountCache interface and mountCache struct that contains the actual implementation?
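The split the reviewer asks for could look roughly like the sketch below. The method names `Add`, `RemoveByMountPoint`, and `GetCopyAllDevices` appear elsewhere in this PR; the interface/struct split, constructor, and field names are illustrative assumptions, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// MountCache describes the operations the node server needs; callers depend
// on this interface rather than on the concrete implementation.
type MountCache interface {
	Add(device, stagingPath string)
	RemoveByMountPoint(stagingPath string)
	GetCopyAllDevices() map[string]string
}

// mountCache is the unexported, mutex-protected implementation.
type mountCache struct {
	mu           sync.Mutex
	pathToDevice map[string]string // staging path -> device
	deviceToPath map[string]string // device -> staging path
}

// NewMountCache returns an empty cache behind the MountCache interface.
func NewMountCache() MountCache {
	return &mountCache{
		pathToDevice: map[string]string{},
		deviceToPath: map[string]string{},
	}
}

func (c *mountCache) Add(device, stagingPath string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pathToDevice[stagingPath] = device
	c.deviceToPath[device] = stagingPath
}

func (c *mountCache) RemoveByMountPoint(stagingPath string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if dev, ok := c.pathToDevice[stagingPath]; ok {
		delete(c.deviceToPath, dev)
		delete(c.pathToDevice, stagingPath)
	}
}

func (c *mountCache) GetCopyAllDevices() map[string]string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]string, len(c.deviceToPath))
	for dev, path := range c.deviceToPath {
		out[dev] = path
	}
	return out
}

func main() {
	c := NewMountCache()
	c.Add("/dev/nvme0n1", "/staging/vol-1")
	fmt.Println(c.GetCopyAllDevices()["/dev/nvme0n1"]) // /staging/vol-1
}
```

Hiding the struct behind an interface also makes it easy to substitute a fake cache in node-server unit tests.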
```go
}

// Initialize mounted devices cache on startup to ensure we have an accurate view
mountedDevices, err := getNVMeMountedDevices(context.Background())
```
Make getNVMeMountedDevices() a function of NodeServer (call it initNVMeMountedDevices()?) and add the path/device directly in that function. This removes the for-loop a little below and makes this function a little smaller.
```diff
 // so we should try to disconnect if this was the last mount
 if transaction.devicePath != "" {
-	mountedDevices, err := getNVMeMountedDevices(ctx)
+	mountedDevices := ns.mountCache.GetCopyAllDevices()
```
This no longer returns an err, so the error check on the next line no longer makes sense.
Add MountCache to track nvme device mounts in memory, eliminating repeated `findmnt` system calls during unstage operations. The cache maintains a bidirectional map between staging paths and device paths, allowing O(1) lookups in both directions; the get-device-from-staging-path lookup also avoids a `findmnt` syscall.

Structure:
- pathToDevice: staging path -> device (/staging/vol-123 → /dev/nvme0n1)
- deviceToPath: device -> staging path (reverse lookup)
- Protected by sync.Mutex for thread-safe concurrent access

The cache will be initialized at node server startup by scanning existing mounts, then updated on each stage/unstage operation. This improves performance by replacing expensive `findmnt --list` calls (which scan all mounts) with fast in-memory map lookups.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
Change the map from (staging path -> true/false) to a more informative map: staging path -> device. This function will be used at node server init to scan the current state of mounted nvme devices.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
Add `mountCache`, which saves the `findmnt` calls. Also change every place in the code that calls `findmnt` to read from the cache instead.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
```go
// Initialize mounted devices cache on startup to ensure we have an accurate view
if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
	log.WarningLog(context.Background(),
		"failed to initialize mounted devices cache on startup: %v (mounted devices cache will be empty)", err)
```
NewNodeServer continues when initNVMeMountedDevices() fails, leaving mountCache empty. In that state, NodeUnstageVolume will make disconnect decisions using an incomplete snapshot and may disconnect shared controllers while other NVMe namespaces remain mounted (especially after node-plugin restart with pre-existing mounts). Consider either failing NodeServer startup on init error, or tracking an "initialized" flag and falling back to an authoritative findmnt scan for disconnect decisions until initialization succeeds (or retry initialization).
Suggested change:

```diff
-// Initialize mounted devices cache on startup to ensure we have an accurate view
-if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
-	log.WarningLog(context.Background(),
-		"failed to initialize mounted devices cache on startup: %v (mounted devices cache will be empty)", err)
+// Initialize mounted devices cache on startup to ensure we have an accurate view.
+// Failing to initialize this cache can lead to incorrect disconnect decisions, so
+// treat initialization failures as fatal and fail NodeServer construction.
+if err := ns.initNVMeMountedDevices(context.Background()); err != nil {
+	return nil, fmt.Errorf("failed to initialize mounted devices cache on startup: %w", err)
```
Describe what this PR does
Adds an in-memory cache to track which nvme devices are currently mounted at staging paths. This eliminates expensive `findmnt --list` system calls on every unstage operation.

Structure:
The cache maintains bidirectional maps for fast lookups:
- pathToDevice: staging path -> device path
- deviceToPath: device path -> staging path (reverse)

Initialization:
Cache is populated at node server startup by scanning existing mounts, then kept up to date as volumes are staged and unstaged.

Thread Safety:
Protected by `sync.Mutex`. We use a simple mutex instead of `sync.RWMutex` because:

Why copying the map is safe:
When `GetAllDevices()` returns a copy of the cache, callers get a consistent snapshot at that moment. The copy happens after cache updates (first the pod unmounts, then deletes its entry, and last fetches the copy of the cache), so concurrent unstage operations each see the current state. Even if two unstages run simultaneously, each removes its device from the cache first, then gets a fresh snapshot showing the remaining devices. So the last of them to call `GetAllDevices()` will get the updated list and will know whether to disconnect or not. (NodeStage cannot run at this time!)

Related PRs
After this PR is merged, #6183 will add group locking (between NodeStage/NodeUnstage).
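The remove-then-snapshot ordering behind the "why copying the map is safe" argument can be sketched as follows. The trimmed `cache` type and method names here are illustrative stand-ins for the PR's MountCache, not its actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a trimmed stand-in: device -> staging path, guarded by a plain mutex.
type cache struct {
	mu           sync.Mutex
	deviceToPath map[string]string
}

// RemoveDevice drops this unstage's own entry before any snapshot is taken.
func (c *cache) RemoveDevice(dev string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.deviceToPath, dev)
}

// GetCopyAllDevices returns a snapshot; callers can inspect it without
// holding the lock, and later cache updates do not mutate the copy.
func (c *cache) GetCopyAllDevices() map[string]string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]string, len(c.deviceToPath))
	for k, v := range c.deviceToPath {
		out[k] = v
	}
	return out
}

func main() {
	c := &cache{deviceToPath: map[string]string{
		"/dev/nvme0n1": "/staging/vol-1",
		"/dev/nvme0n2": "/staging/vol-2",
	}}
	// Unstage of vol-1: remove its own entry first, then snapshot.
	c.RemoveDevice("/dev/nvme0n1")
	snapshot := c.GetCopyAllDevices()
	// Disconnect only if no other namespace of the subsystem remains mounted.
	fmt.Println(len(snapshot)) // 1 -> another mount remains, keep controllers connected
}
```

Because each unstage removes its own entry before snapshotting, the last unstage to run observes an empty set for the subsystem and is the one that disconnects.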
Future concerns
The next step is to add a group lock after #6183 is merged.
Checklist:
- Commit messages follow the guidelines in the developer guide.
- Reviewed the developer guide on Submitting a Pull Request.
- Pending release notes updated with breaking and/or notable changes for the next major release.
Show available bot commands

These commands are normally not required, but in case of issues, leave any of the following bot commands in an otherwise empty comment in this PR:

- `/retest ci/centos/<job-name>`: retest the `<job-name>` after unrelated failure (please report the failure too!)