[Stale] Compare identical environments by using a same_file::Handle#179
[Stale] Compare identical environments by using a same_file::Handle#179darnuria wants to merge 11 commits intomeilisearch:mainfrom
same_file::Handle#179Conversation
|
Thank you @darnuria, It looks like the |
0fba17a to
993fe55
Compare
same_file::Handle
| /// Mind that Handle currently open the file, so avoid writing through the fd held by [same_file::Handle], | ||
| /// since the file will also be opened by LMDB. | ||
| static OPENED_ENV: Lazy<RwLock<HashMap<Handle, EnvEntry>>> = Lazy::new(RwLock::default); |
There was a problem hiding this comment.
Leaved a mark since Handle open the file and hold a file descriptor https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10
There was a problem hiding this comment.
Ok, so can we add a comment about it? I mean a really documentation comment in the lib.rs file. Under an Important/Remark header, please? I don't know how LMDB behaves towards that.
I am not sure that we really care about this sentence as we keep it open the whole env-lifetime right?
Do not have open an LMDB database twice in the same process at the same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.
There was a problem hiding this comment.
(backported in description in case this PR got stuck/closed to help somebody)
Yeah my worries comes from that line, and my first idea to just check (dev_t, ino_t)
For sure it may break the flock()ing (not the best system api..), and the multi-process features of lmdb is really handy.
Checked-out lmdb sources:
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L4786-L4799
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L2998-L3010
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L5146-L5156
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/lmdb.h#L102-L105
About flocking in lmdb.h the sentense you pointed changed with:
commit 77845345ca9bf3854fd9da60a3e3b0527fa9c76a
Author: Hallvard Furuseth <hallvard@openldap.org>
Date: Tue Sep 27 07:03:42 2016 +0200
ITS#8505 Clarify fork() caveat, mdb_env_get_fd(), flock->fcntl.
diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index 30d5862..319fcf6 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -97,11 +97,12 @@
* transactions. Each transaction belongs to one thread. See below.
* The #MDB_NOTLS flag changes this for read-only transactions.
*
- * - Use an MDB_env* in the process which opened it, without fork()ing.
+ * - Use an MDB_env* in the process which opened it, not after fork().
*
* - Do not have open an LMDB database twice in the same process at
* the same time. Not even from a plain open() call - close()ing it
- * breaks flock() advisory locking.
+ * breaks fcntl() advisory locking. (It is OK to reopen it after
+ * fork() - exec*(), since the lockfile has FD_CLOEXEC set.)
*
* - Avoid long-lived transactions. Read transactions prevent
* reuse of pages freed by newer write transactions, thus theThere was a problem hiding this comment.
So, what do we conclude? Can we still use the same_file::Handle struct? If not, what do we plan? Is it even possible to get the dev_t and ino_t without opening the file?
There was a problem hiding this comment.
stat function family (except the fstatat) check metadata and return and don't maintain a file descriptor.
It's the kind of function used by metadata in rust std, sadly posix never standardized a "device / FSID" so ino_t/dev_t is left to implementation detail.
For windows it's more https://docs.rs/winapi-util/latest/winapi_util/file/fn.information.html
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle
|
I read somewhere that hard links on Windows require Administrator privileges... |
|
(not directly related to hardlink) Yeah windows CI had troble with theses two: https://github.com/meilisearch/heed/actions/runs/5508749200/jobs/10041079210?pr=179#step:4:140 |
|
Do you know if we can run a certain pipeline in the CI as admin or Windows admin? |
Samefile abstract away differences between Unixes and windows to check if two files lead to the same file on the underlying filesystem. Now it should follow through soft-links, hard-links and file moved.
Precise that Handle open the file.
129ba1d to
f840921
Compare
f840921 to
641d229
Compare
Don't know, also discovering that windows way of doing is really complicated. 🤡, like moving dir or getting file information. |
| assert_eq!(env_name, env.path()); | ||
|
|
||
| let env_renamed = dir.path().join("serafina.mdb"); | ||
| std::fs::rename(&env_name, &env_renamed).unwrap(); |
There was a problem hiding this comment.
for the windows part maybe the little UNIX api? https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=msvc-170
I would like to skip the tests that require Admin rights on Windows as I don't have time to make them work in the CI for now 😬 What do you think? |
same_file::Handlesame_file::Handle
This PR implements this suggestion #145 (comment) and try to fix #180.
Stalled because the current implementation with
same_fileopen files since windows needs it, but lmdb itself use posix locks withfnctlsyscall on unixes.We use the
same_file::Handlestruct that comparesdev_tandino_tinstead of raw paths internally to make the comparison more accurate. On Windows, you should read how it works 😅.EDITS: Edited to add the caveat found during implementing the idea.
Caveat of this PR
Tl;DR Keeping FileDescriptor open, may trouble
flock/fcntlworks inside lmdb and screw multi-process lock with lmdb.From comment: #179 (comment)
Unix ideal: just check
(dev_t, ino_t)Unix way
statkind of function clean nice no File Descriptor keeped.Note: that same file keep open a File Descriptor open!
https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10
Windows
An other
same-filerely on winapi32, involving Filehandles.Lmdb limitation
For sure it may break the
flock()ing (not the best system api..), and the multi-process features of lmdb is really handy.Checked-out lmdb sources:
About
flocking in lmdb.h the sentense you pointed changed with:commit 77845345ca9bf3854fd9da60a3e3b0527fa9c76a Author: Hallvard Furuseth <hallvard@openldap.org> Date: Tue Sep 27 07:03:42 2016 +0200 ITS#8505 Clarify fork() caveat, mdb_env_get_fd(), flock->fcntl. diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h index 30d5862..319fcf6 100644 --- a/libraries/liblmdb/lmdb.h +++ b/libraries/liblmdb/lmdb.h @@ -97,11 +97,12 @@ * transactions. Each transaction belongs to one thread. See below. * The #MDB_NOTLS flag changes this for read-only transactions. * - * - Use an MDB_env* in the process which opened it, without fork()ing. + * - Use an MDB_env* in the process which opened it, not after fork(). * * - Do not have open an LMDB database twice in the same process at * the same time. Not even from a plain open() call - close()ing it - * breaks flock() advisory locking. + * breaks fcntl() advisory locking. (It is OK to reopen it after + * fork() - exec*(), since the lockfile has FD_CLOEXEC set.) * * - Avoid long-lived transactions. Read transactions prevent * reuse of pages freed by newer write transactions, thus the