Speedup available() for minio backends #555
Draft
hagenw wants to merge 1 commit into
Reviewer's Guide
Optimize audb.available() for S3/MinIO and Artifactory backends by deriving available database versions from listed folders instead of probing for header file existence, significantly reducing calls and latency on remote storage.
Sequence diagram for the optimized audb.available() on MinIO backend
sequenceDiagram
actor User
participant AudbAPI
participant BackendInterface
participant Repository
participant MinioClient
User->>AudbAPI: call available()
AudbAPI->>BackendInterface: get_backend(repository)
BackendInterface-->>AudbAPI: backend
AudbAPI->>BackendInterface: is_s3_or_minio_backend()
BackendInterface-->>AudbAPI: True
AudbAPI->>MinioClient: list_objects(repository.name)
loop for each obj in top_level_objects
AudbAPI->>AudbAPI: name = obj.object_name without trailing slash
AudbAPI->>MinioClient: list_objects(repository.name, obj.object_name)
MinioClient-->>AudbAPI: sub_folders
loop for each sub_folder in sub_folders
AudbAPI->>AudbAPI: version = sub_folder.object_name.split("/")[1]
AudbAPI->>AudbAPI: check version not in [attachment, media, meta]
alt version is valid database version
AudbAPI->>AudbAPI: add_database(name, version, repository)
else version is attachment, media, or meta
AudbAPI-->>AudbAPI: skip
end
end
end
AudbAPI-->>User: return list of available databases and versions
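The listing loop in the sequence diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the PR: `FakeClient` and `FakeObject` are hypothetical in-memory stand-ins for a MinIO client and its listed objects, `list_databases` is a made-up name, and the example keys and bucket name are invented. Only `list_objects()`, `object_name`, and the skipped folder names come from the diagram.

```python
from dataclasses import dataclass


@dataclass
class FakeObject:
    """Hypothetical stand-in for a listed object; only object_name is used."""

    object_name: str


class FakeClient:
    """Hypothetical in-memory client imitating a non-recursive list_objects()."""

    def __init__(self, keys):
        self.keys = keys

    def list_objects(self, bucket_name, prefix=""):
        # Return the immediate children below prefix;
        # folders keep a trailing "/", as in an object-store listing.
        children = set()
        for key in self.keys:
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            head, sep, _ = rest.partition("/")
            children.add(prefix + head + sep)
        return [FakeObject(name) for name in sorted(children)]


# Folders next to the version folders that are not versions themselves
NON_VERSION_FOLDERS = {"attachment", "media", "meta"}


def list_databases(client, bucket_name):
    """Collect (name, version) pairs from folder names only."""
    databases = []
    for obj in client.list_objects(bucket_name):
        name = obj.object_name.rstrip("/")
        for sub_folder in client.list_objects(bucket_name, obj.object_name):
            version = sub_folder.object_name.split("/")[1]
            if version in NON_VERSION_FOLDERS:
                continue
            databases.append((name, version))
    return databases


client = FakeClient(
    [
        "emodb/1.0.0/db.yaml",
        "emodb/1.1.0/db.yaml",
        "emodb/media/1.0.0/wav.zip",
        "emodb/meta/1.0.0/files.parquet",
    ]
)
print(list_databases(client, "my-bucket"))
```

In this sketch the number of listing calls is one per repository plus one per database, independent of how many versions each database has.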
Flow diagram for backend-specific version discovery in audb.available()
flowchart TD
A["Start audb.available()"] --> B["Get backend for repository"]
B --> C{"Backend is S3 or MinIO?"}
C -- "Yes" --> D["Call client.list_objects(repository.name)
(top level objects)"]
D --> E["For each obj: derive name from obj.object_name"]
E --> F["Call client.list_objects(repository.name, obj.object_name)
(sub_folders)"]
F --> G["For each sub_folder: version = sub_folder.object_name.split('/')[1]"]
G --> H{"version in [attachment, media, meta]?"}
H -- "Yes" --> I["Skip version"]
H -- "No" --> J["add_database(name, version, repository)"]
I --> K["Next sub_folder/obj"]
J --> K
K --> L["All objects processed"]
C -- "No" --> M["backend_interface.ls('/')"]
M --> N["For each (path, version)"]
N --> O{"path endswith HEADER_FILE?"}
O -- "Yes" --> P["add_database(name from path, version, repository)"]
O -- "No" --> Q["Skip entry"]
P --> R["Next entry"]
Q --> R
R --> L
L --> S["Return list of available databases and versions"]
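The "No" branch of the flowchart, which filters a flat listing by header file, might look roughly like the following. It is a sketch under assumptions: the value `"db.yaml"` for `HEADER_FILE`, the path layout `/<name>/db.yaml`, and the function name `databases_from_listing` are illustrative and not confirmed by this page. The input imitates the `(path, version)` pairs the flowchart attributes to `backend_interface.ls('/')`.

```python
HEADER_FILE = "db.yaml"  # assumed header file name


def databases_from_listing(entries):
    """Keep only (path, version) entries that point at a header file."""
    databases = []
    for path, version in entries:
        if not path.endswith(HEADER_FILE):
            continue  # media, meta, and attachment files are skipped
        # Assumption: paths look like "/<name>/db.yaml"
        name = path.split("/")[1]
        databases.append((name, version))
    return databases


# Made-up listing, standing in for the result of backend_interface.ls("/")
entries = [
    ("/emodb/db.yaml", "1.0.0"),
    ("/emodb/db.yaml", "1.1.0"),
    ("/emodb/media/wav.zip", "1.0.0"),
]
print(databases_from_listing(entries))
```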
I checked, for the repository |
Speed up audb.available() for S3/MinIO backends by not checking if the header file exists. For Artifactory backends we also base the list of available versions on the available folders and don't check for the existence of the header file. This approach is riskier, since a version folder that lacks a header file would still be reported, but it brings a speedup.
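To make the trade-off concrete, here is a small sketch comparing the two discovery strategies on the same set of object keys. Everything in it is an assumption for illustration: the header file name `"db.yaml"`, the key layout `<name>/<version>/<file>`, the helper names, and the `upload.part` leftover are invented and not taken from the PR.

```python
HEADER_FILE = "db.yaml"  # assumed header file name
NON_VERSION_FOLDERS = {"attachment", "media", "meta"}


def versions_by_header(keys, name):
    """Versions whose folder contains the header file."""
    versions = set()
    for key in keys:
        parts = key.split("/")
        if len(parts) == 3 and parts[0] == name and parts[2] == HEADER_FILE:
            versions.add(parts[1])
    return sorted(versions)


def versions_by_folder(keys, name):
    """Versions derived from folder names alone."""
    versions = set()
    for key in keys:
        parts = key.split("/")
        if len(parts) >= 3 and parts[0] == name:
            versions.add(parts[1])
    return sorted(versions - NON_VERSION_FOLDERS)


keys = [
    "emodb/1.0.0/db.yaml",
    "emodb/2.0.0/upload.part",  # hypothetical leftover of an interrupted publish
    "emodb/media/1.0.0/wav.zip",
]
print(versions_by_header(keys, "emodb"))
print(versions_by_folder(keys, "emodb"))
```

With these keys the header-based variant reports only 1.0.0, while the folder-based variant also reports 2.0.0, even though that folder holds no header file.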
Execution time.
Summary by Sourcery
Optimize database discovery for object-storage backends when listing available databases and versions.
Enhancements: