-
Notifications
You must be signed in to change notification settings - Fork 9
feat(test): Add MariaDB docker tasks; Add storage-related tests in GitHub workflows. #267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughCI and local test flows now support environment-driven MariaDB: added start/stop container scripts and a programmatic DB initializer, tests and helpers read SPIDER_STORAGE_URL (fallbacks preserved), taskfiles/CI wired to run tests via a mariadb-storage executor, lint scope extended, and the old SQL file removed. Changes
Sequence Diagram(s)sequenceDiagram
participant TaskRunner as Task (taskfiles/test.yaml)
participant Executor as mariadb-storage-executor
participant StartScript as tools/scripts/mariadb/start.py
participant Docker as Docker Engine
participant InitDB as tools/scripts/mariadb/wolf/init_db.py
participant Tests as Test Runner
participant StopScript as tools/scripts/mariadb/stop.py
TaskRunner->>Executor: invoke executor for TEST_TASK
Executor->>StartScript: start container (name, port, creds)
StartScript->>Docker: docker pull & run container
StartScript->>Docker: docker exec healthcheck (poll until ready)
StartScript-->>Executor: return host:port and success
Executor->>InitDB: run init_db.py with DB params
InitDB-->>Executor: return success after creating schema
Executor->>Tests: run TEST_TASK with SPIDER_STORAGE_URL env
Executor->>StopScript: request stop container on completion
StopScript->>Docker: docker stop/remove container
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @tools/scripts/storage/init_db.py:
- Around line 15-231: The CREATE TABLE statements incorrectly use ON UPDATE
CURRENT_TIMESTAMP for columns that should be immutable after creation; remove
the ON UPDATE CURRENT_TIMESTAMP clause from `creation_time` in the jobs table
(jobs -> `creation_time`), from `start_time` in the task_instances table
(task_instances -> `start_time`), and from `lease_time` in the scheduler_leases
table (scheduler_leases -> `lease_time`) while keeping DEFAULT CURRENT_TIMESTAMP
so the timestamp is set at insertion; leave the `heartbeat` column (drivers ->
`heartbeat`) as-is with ON UPDATE CURRENT_TIMESTAMP.
In @tools/scripts/storage/start.py:
- Line 12: The _MARIADB_IMAGE constant is pinned to "mariadb:latest", which is
non-reproducible; change _MARIADB_IMAGE to a specific, supported MariaDB tag
(e.g., "mariadb:10.11" or another project-approved version) or make it
configurable (read from an environment variable or script argument with a stable
default) so builds are deterministic; update any documentation or defaults to
reflect the chosen pinned version.
In @tools/scripts/storage/stop.py:
- Line 25: The argparse parser description contains a typo: change the string
passed to the ArgumentParser constructor (the description argument used when
creating parser) from "Stop MairaDB Docker container." to "Stop MariaDB Docker
container." so the CLI help shows the correct database name; update the
description in the parser instantiation where parser =
argparse.ArgumentParser(...) is defined.
- Around line 44-48: The variable name localstack_stop_cmd is misleading because
it targets a MariaDB container; rename it to mariadb_stop_cmd wherever defined
and referenced (the list currently built with "docker", "stop", args.name) so
the variable name matches its purpose—update the declaration and any subsequent
use of localstack_stop_cmd to mariadb_stop_cmd in the stop.py script.
🧹 Nitpick comments (5)
.github/workflows/tests.yaml (1)
1-1: Consider updating the workflow name to reflect broader test scope.The workflow name "unit-tests" no longer accurately reflects its scope, as it now runs integration tests (line 60). Consider renaming to "tests" or "all-tests" for clarity.
📝 Suggested rename
-name: "unit-tests" +name: "tests"tools/scripts/storage/get_free_port.py (1)
24-25: Remove the unused noqa directive.The static analysis tool correctly identifies that the
noqa: T201directive is unnecessary because the T201 rule (print statement detection) is not enabled in your Ruff configuration.♻️ Proposed fix
- # ruff: noqa: T201 print(port)tools/scripts/storage/init_db.py (3)
97-97: Consider rationalizing VARBINARY and VARCHAR size limits.The value size limit of 999 bytes appears in multiple columns (
data.value,task_outputs.type,task_outputs.value,task_inputs.type,task_inputs.value,client_kv_data.value,task_kv_data.value). This specific limit seems arbitrary.Common choices are powers of 2 (1024, 2048) or clean decimal values (1000). Consider whether 999 was chosen for a specific reason or if it should be normalized.
Also applies to: 108-109, 123-123, 214-215, 224-224
271-284: Add error handling and success logging.The script lacks error handling and logging for the database initialization process. If table creation fails, the error won't be clearly communicated to the user.
♻️ Proposed improvement
+ logger.info("Connecting to MariaDB at 127.0.0.1:%d...", args.port) with ( mariadb.connect( host="127.0.0.1", port=args.port, user=args.username, password=args.password, database=args.database, ) as conn, conn.cursor() as cursor, ): + logger.info("Creating database tables...") for table_creator in _TABLE_CREATORS: cursor.execute(table_creator) conn.commit() + logger.info("Database initialization completed successfully.") return 0
273-273: Hardcoded host limits flexibility.The host is hardcoded to
"127.0.0.1". If MariaDB needs to run on a different host (e.g., Docker host networking scenarios), this won't be configurable.Consider adding a
--hostargument with"127.0.0.1"as the default, similar to the other connection parameters.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
.github/workflows/tests.yamlpython/spider-py/tests/client/test_driver.pypython/spider-py/tests/storage/test_mariadb.pytaskfiles/lint.yamltaskfiles/test.yamltests/integration/client.pytests/integration/test_client.pytests/integration/test_scheduler_worker.pytests/integration/test_signal.pytests/storage/StorageTestHelper.hpptools/scripts/storage/get_free_port.pytools/scripts/storage/init_db.pytools/scripts/storage/init_db.sqltools/scripts/storage/start.pytools/scripts/storage/stop.pytools/scripts/storage/wait_for_db.py
💤 Files with no reviewable changes (1)
- tools/scripts/storage/init_db.sql
🧰 Additional context used
🧬 Code graph analysis (5)
tools/scripts/storage/start.py (2)
tools/scripts/storage/init_db.py (1)
main(242-285)tools/scripts/storage/wait_for_db.py (1)
main(21-65)
tools/scripts/storage/wait_for_db.py (2)
tools/scripts/storage/init_db.py (1)
main(242-285)tools/scripts/storage/start.py (1)
main(22-114)
python/spider-py/tests/client/test_driver.py (1)
python/spider-py/src/spider_py/client/driver.py (1)
Driver(19-60)
python/spider-py/tests/storage/test_mariadb.py (1)
python/spider-py/src/spider_py/storage/jdbc_url.py (1)
parse_jdbc_url(22-55)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
get_storage_url(116-121)storage(125-134)
🪛 Ruff (0.14.10)
tools/scripts/storage/start.py
61-61: subprocess call: check for execution of untrusted input
(S603)
73-73: subprocess call: check for execution of untrusted input
(S603)
104-104: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/wait_for_db.py
45-45: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/get_free_port.py
24-24: Unused noqa directive (non-enabled: T201)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (27)
.github/workflows/tests.yaml (2)
20-20: LGTM!The job rename from
non-storage-unit-teststotestsappropriately reflects the expanded test scope that now includes storage-related tests.
58-62: No changes required. The test tasks (test:cpp-unit-tests,test:cpp-integration-tests,test:spider-py-unit-tests) are self-contained and properly handle the full MariaDB lifecycle:
- Storage setup: Each task calls the
start-storagesubtask, which starts the MariaDB container, waits for readiness, and initialises the database. TheSPIDER_STORAGE_URLenvironment variable is set correctly.- Cleanup on failure: The
defer:keyword guarantees cleanup runs even if tests fail, preventing resource leaks.- CI compatibility: Dynamic port allocation via
get_free_port.pyand unique container naming viauuidgenensure portability across environments including GitHub Actions.python/spider-py/tests/client/test_driver.py (1)
16-17: LGTM!The environment-driven storage URL configuration is correctly implemented with an appropriate fallback to the default test URL.
python/spider-py/tests/storage/test_mariadb.py (1)
19-20: LGTM!The fixture correctly retrieves the storage URL from the environment variable and parses it for MariaDB connection parameters.
tools/scripts/storage/wait_for_db.py (1)
21-66: LGTM!The wait logic correctly polls the MariaDB container health status with appropriate timeout handling and sleep intervals between attempts.
Note: The static analysis warning (S603) about untrusted subprocess input is a false positive in this context, as both
docker_executableandargs.nameare controlled inputs.tests/storage/StorageTestHelper.hpp (3)
17-17: Good refactoring: constexpr string_view for constant URL.Changing from
std::string consttoconstexpr std::string_viewis more efficient and appropriate for a compile-time constant.
24-32: LGTM!The
get_storage_url()function correctly reads theSPIDER_STORAGE_URLenvironment variable and falls back to the default value when not set.
36-37: LGTM!The factory creation now uses the environment-driven storage URL, aligning with the broader refactoring to support dynamic storage configuration.
taskfiles/lint.yaml (1)
139-146: LGTM!Extending linting to cover the newly introduced storage scripts is appropriate and maintains consistency with the existing linting configuration.
tests/integration/test_signal.py (2)
15-27: LGTM!The import changes correctly align with the refactored public API in
tests/integration/client.py, switching from direct constant access to the newget_storage_url()function.
87-87: LGTM!The fixture correctly uses
get_storage_url()to obtain the runtime-configured storage URL instead of the previous hardcoded constant.tests/integration/test_scheduler_worker.py (2)
13-30: LGTM!The import changes correctly reflect the refactored API, bringing in
get_storage_url()andstoragefrom the updatedintegration.clientmodule.
93-93: LGTM!The fixture correctly uses
get_storage_url()for runtime storage URL configuration, consistent with the environment-driven approach introduced in this PR.tools/scripts/storage/get_free_port.py (1)
11-18: Be aware of the inherent TOCTOU race condition.The approach of binding to port 0 to obtain a free port has a time-of-check-time-of-use (TOCTOU) race condition: between the time this script returns the port number and when the MariaDB container actually binds to it, another process could claim the port. This is an inherent limitation of this port allocation pattern and is generally acceptable for test scenarios where the window is small.
If port conflicts occur in CI/CD environments, consider implementing retry logic in the MariaDB startup script or using Docker's dynamic port allocation features instead.
tests/integration/client.py (3)
114-114: LGTM!Renaming to uppercase
G_STORAGE_URLfollows Python naming conventions for module-level constants. The hardcoded credentials in the default URL are acceptable for test scenarios.
116-121: LGTM!The new
get_storage_url()function provides a clean abstraction for environment-driven storage URL configuration, falling back to the default whenSPIDER_STORAGE_URLis not set.
132-132: LGTM!The fixture correctly uses
get_storage_url()to support runtime storage configuration via the environment variable.tools/scripts/storage/init_db.py (1)
1-14: LGTM! Clean script setup.The shebang, dependency declaration, and imports are well-structured. Using
uv run --scriptwith inline dependencies is an excellent choice for this utility script.tools/scripts/storage/start.py (3)
1-20: LGTM! Clean script structure.The imports, logging configuration, and image constant are well-organized.
61-69: Clarify the return code when container already exists.Returning
1when the container already exists (line 69) typically indicates an error, but this is more of a no-op situation. Consider whether this should return0for success or use a different approach.The current behavior might be intentional if calling scripts need to distinguish between "started fresh" and "already running". Please verify the expected behavior in the task orchestration context.
57-66: Static analysis warnings are false positives.The Ruff S603 warnings about untrusted input in
subprocesscalls are false positives. Thedocker_executableis a hardcoded string (line 59), and all other arguments are either literals or validated by argparse.Also applies to: 73-82, 104-114
tests/integration/test_client.py (1)
10-10: LGTM! Proper migration to environment-driven storage URL.The changes correctly replace the hardcoded
g_storage_urlglobal with theget_storage_url()accessor function. This enables runtime configuration via theSPIDER_STORAGE_URLenvironment variable, which aligns with the MariaDB Docker lifecycle management introduced in this PR.The
storagefixture import on line 10 provides proper lifecycle management through pytest's fixture dependency mechanism.Also applies to: 66-66, 87-87
taskfiles/test.yaml (5)
7-9: LGTM! Clean configuration variables.The MariaDB configuration is properly centralized in global variables, making it easy to adjust defaults across all test tasks.
24-26: Excellent cross-platform UUID normalization.The UUID generation and normalization logic handles the case difference between macOS (uppercase) and Linux (lowercase), ensuring consistent container naming across platforms. The
spider-mariadb-prefix provides clear identification.Also applies to: 48-50, 89-91, 145-147, 178-180
27-28: Be aware of potential port allocation race condition.Dynamic port allocation via
get_free_port.pyis a good approach, but there's a time-of-check-to-time-of-use (TOCTOU) race condition: between obtaining the free port and starting the container, another process could bind to that port.This is unlikely in typical CI environments but could occur in heavily loaded systems or when running multiple test suites concurrently.
Consider whether the test infrastructure needs additional retry logic or port conflict handling. In practice, this may be acceptable given the short time window and typical usage patterns.
Also applies to: 51-52, 92-93, 148-149, 181-182
196-220: LGTM! Well-structured storage lifecycle management.The
start-storageinternal task properly orchestrates the three-step process:
- Start the MariaDB container
- Wait for readiness
- Initialize the database schema
The sequential execution ensures each step completes before the next begins, and the
internal: trueflag appropriately restricts this task to internal use only.
37-39: No action needed. Thestop.pyscript already handles non-existent containers gracefully by checking container existence withdocker inspect(line 35) and exiting with code 0 while logging a warning message if the container doesn't exist or isn't running (lines 40-42). Thedeferblocks will safely execute even ifstart-storagefails, as the stop script is designed to handle this scenario without producing confusing error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/src/dev-docs/testing.md:
- Around line 26-27: The sentence referencing the GitHub workflow
"[tests.yaml][gh-workflow-tests]" contains a grammatical error—remove the
redundant "the" so it reads "runs all unit tests and integration tests"; update
the sentence in the docs text where the phrase "runs the all unit tests and
integration tests" appears to "runs all unit tests and integration tests" to
correct the grammar.
🧹 Nitpick comments (4)
tests/integration/test_signal.py (1)
21-21: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
- storage, # noqa: F401 + storage,tests/integration/test_scheduler_worker.py (1)
24-24: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
- storage, # noqa: F401 + storage,tests/integration/test_client.py (1)
10-10: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
-from integration.client import get_storage_url, SQLConnection, storage # noqa: F401 +from integration.client import get_storage_url, SQLConnection, storagetools/scripts/storage/init_db.py (1)
242-286: Consider adding explicit error handling for better diagnostics.The connection and execution logic is correct and uses proper context management. However, there's no explicit error handling—any connection failure or SQL execution error will raise an unhandled exception. While fail-fast behaviour may be intentional for a setup script, wrapping the connection block in a try/except with structured logging would improve diagnostics.
♻️ Optional: Add error handling
def main() -> int: """Main.""" parser = argparse.ArgumentParser(description="Initialize the database tables for Spider.") # ... argument parsing ... args = parser.parse_args() try: with ( mariadb.connect( host="127.0.0.1", port=args.port, user=args.username, password=args.password, database=args.database, ) as conn, conn.cursor() as cursor, ): for table_creator in _TABLE_CREATORS: cursor.execute(table_creator) conn.commit() logger.info("Database tables initialized successfully.") return 0 except mariadb.Error as e: logger.error("Failed to initialize database: %s", e) return 1
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.github/workflows/tests.yamldocs/src/dev-docs/testing.mdpython/spider-py/README.mdtests/integration/client.pytests/integration/test_client.pytests/integration/test_scheduler_worker.pytests/integration/test_signal.pytools/scripts/storage/init_db.pytools/scripts/storage/stop.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration/client.py
- .github/workflows/tests.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-09T17:15:24.552Z
Learnt from: davidlion
Repo: y-scope/spider PR: 100
File: src/spider/worker/worker.cpp:205-230
Timestamp: 2025-04-09T17:15:24.552Z
Learning: Documentation should be added to new functions in the spider codebase, as already discussed with the user.
Applied to files:
python/spider-py/README.md
🧬 Code graph analysis (4)
tools/scripts/storage/init_db.py (2)
tools/scripts/storage/start.py (1)
main(22-114)tools/scripts/storage/wait_for_db.py (1)
main(21-65)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
get_storage_url(118-123)storage(127-136)
tests/integration/test_scheduler_worker.py (1)
tests/integration/client.py (2)
get_storage_url(118-123)storage(127-136)
tests/integration/test_signal.py (1)
tests/integration/client.py (5)
get_storage_url(118-123)get_task_outputs(198-223)get_task_state(226-240)remove_job(243-253)storage(127-136)
🪛 Ruff (0.14.10)
tests/integration/test_client.py
10-10: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tests/integration/test_scheduler_worker.py
24-24: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tests/integration/test_signal.py
21-21: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tools/scripts/storage/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: tests (ubuntu-22.04)
🔇 Additional comments (10)
python/spider-py/README.md (1)
27-33: Verify that the simplified Testing section adequately documents the new storage-enabled test workflow.The Testing section has been significantly simplified to a single task command. However, the PR introduces a new
SPIDER_STORAGE_URLenvironment variable and MariaDB-backed storage integration. The README no longer documents:
- How developers should configure or set up storage for local testing (e.g., whether storage-creation task runs automatically or requires manual invocation)
- The role of the
SPIDER_STORAGE_URLenvironment variable in the test workflow- Whether developers need specific prerequisites (e.g., Docker, MariaDB) installed locally
- How the unified
test:spider-py-unit-teststask interacts with the storage setup introduced in this PRConsider whether the README should include additional context about storage configuration for developers setting up tests locally, or link to more detailed testing documentation if it exists elsewhere.
tests/integration/test_signal.py (2)
76-76: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 87-89
103-103: LGTM! Consistent storage fixture usage.The test methods correctly use the
storagefixture with appropriatenoqaannotations for parameter redefinition.Also applies to: 180-180
tests/integration/test_scheduler_worker.py (2)
82-82: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 93-93
284-284: LGTM! Consistent storage fixture usage.The test methods correctly use the
storagefixture with appropriatenoqaannotations for parameter redefinition.Also applies to: 314-314, 331-331, 351-351
tests/integration/test_client.py (2)
55-55: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 66-66
87-87: LGTM! Consistent storage URL usage.The test correctly passes the environment-driven storage URL to the
client_testprogram.tools/scripts/storage/stop.py (2)
44-55: Stop logic is correct.The container stop logic properly handles errors and logs appropriate messages. Past review comments about the variable name have been addressed (
mariadb_stop_cmdis correctly used).
20-42: Use thedocker_executablevariable consistently.The
docker_executablevariable is defined on line 23 with a comment stating it's to silence Ruff S607, but line 35 uses the hardcoded string"docker"instead of the variable, making the variable and comment misleading.🔧 Proposed fix
result = subprocess.run( - [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name], + [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name], capture_output=True, text=True, check=False, )Note: Line 35 already shows
docker_executablein the diff above, but verify line 45 also usesdocker_executableinstead of"docker".Likely an incorrect or invalid review comment.
tools/scripts/storage/init_db.py (1)
15-231: Schema looks solid; past timestamp issues resolved.The SQL schema is comprehensive and correctly configured. Past review comments regarding incorrect
ON UPDATE CURRENT_TIMESTAMPusage oncreation_time,start_time, andlease_timehave been addressed—these columns now correctly use onlyDEFAULT CURRENT_TIMESTAMP. Theheartbeatcolumn appropriately retainsON UPDATE CURRENT_TIMESTAMPfor auto-updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
taskfiles/test.yaml (3)
7-9: Consider documenting that these are test-only credentials.While the hardcoded credentials are acceptable for local testing, consider adding a comment clarifying that these are for development/testing purposes only and should not be used in production environments.
111-113: Simplify pytest invocation.Line 112 activates the virtual environment, but line 113 uses the full path to pytest, making the activation redundant.
♻️ Simplify to use pytest directly
- - |- - . ../test-venv/bin/activate - ../test-venv/bin/pytest tests/integration + - "../test-venv/bin/pytest tests/integration"
18-204: Consider reducing code duplication across storage-dependent tasks.The MariaDB setup pattern (container name generation, port allocation, storage URL construction, start-storage invocation, and defer cleanup) is duplicated across five tasks. This creates maintenance burden—changes to the storage lifecycle require updates in multiple places.
While YAML anchors have limitations with Taskfile's variable interpolation, consider whether a wrapper task or shared configuration could reduce this duplication in a future refactor.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
taskfiles/test.yaml (3)
44-68: Clarify the test task strategy.The presence of three tasks (cpp-unit-tests, cpp-storage-unit-tests, cpp-non-storage-unit-tests) creates ambiguity about which task developers should use. Consider documenting the intended workflow:
- Is cpp-unit-tests the primary entry point for running all tests?
- When should developers use the split versions?
206-230: Well-designed internal task for storage lifecycle.The start-storage task properly encapsulates the three-phase MariaDB setup (start → wait → init) with sensible defaults. Making it internal ensures it's only invoked by other tasks.
18-42: Well-structured storage test lifecycle.The task properly implements the MariaDB container lifecycle with dynamic port allocation, UUID-based container naming, and cleanup via defer. The cross-platform UUID normalization is a good defensive measure. All referenced scripts and variables are properly defined and in place.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the scripts. Can we restructure the tool scripts as the following:
tool/scripts/
get_free_port.py
mariadb/
start.py
stop.py
wolf/
init.py
huntsman/
init.py
get_free_port.pyseems to be a general helper- Using mariadb directly instead of storage, in case we need other storage backend in the future
- Move init scripts into the dedicated namespace
This should make the structure cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 25-31: The MARIADB_PORT var (and other similar vars using
tools/scripts/get_free_port.py) uses a relative path that breaks when tasks run
from non-root directories; update the sh value for MARIADB_PORT and the other
instances (the vars/commands at the other noted blocks) to invoke the script via
the repository root variable (i.e., prefix with {{.ROOT_DIR}}) so the path
resolves regardless of task dir:, ensuring you change every occurrence that
currently is "tools/scripts/get_free_port.py".
- Around line 21-25: The SPIDER_STORAGE_URL (and the same patterns at the other
occurrences) currently interpolates G_MARIADB_USERNAME and G_MARIADB_PASSWORD
directly into the JDBC URL which will break if credentials contain characters
like @ or &; update the template to percent-encode (URL-encode)
G_MARIADB_USERNAME and G_MARIADB_PASSWORD before embedding them into
SPIDER_STORAGE_URL (or, alternatively, move user and password out of the query
string and supply them as separate JDBC connection properties), and apply the
same fix to the other identical URL templates referenced (the blocks around
lines 47–51, 90–94, 149–152, 184–187) so all credential interpolations are
safely encoded or passed as properties.
In @tools/scripts/get_free_port.py:
- Around line 21-26: The inline Ruff suppression "# ruff: noqa: T201" in the
main function is unused and should be removed; edit the main() implementation
(which calls get_free_port() and prints the port) to delete that comment so Ruff
will report T201 as intended, or alternatively keep the comment but instead add
T201 to your project's Ruff config to explicitly allow it—pick one approach and
apply it consistently.
In @tools/scripts/mariadb/start.py:
- Around line 68-77: The code only treats a running container case but ignores
when docker inspect returns exit code 0 and stdout == "false" (a stopped
container with the same name), which will cause docker run --name ... to fail
with a name conflict; update the check in the block that runs subprocess.run
(using docker_executable and args.name and the result variable) to detect
result.returncode == 0 and result.stdout.rstrip("\n") == "false", then log a
clear error like "Container <name> exists but is stopped; remove or rename it
before running" and return a non-zero exit code (or raise/exit) instead of
proceeding; apply the same explicit stopped-container handling to the other
inspect/check sequence around the 91-121 region so both places consistently
detect and fail with a helpful message when a name conflict with a stopped
container exists.
In @tools/scripts/mariadb/stop.py:
- Around line 20-55: The code incorrectly hardcodes "docker" when building
mariadb_stop_cmd and emits a misleading warning; update mariadb_stop_cmd to use
the docker_executable variable instead of the literal "docker", and change the
logger.warning in main (the inspect check) to say the container "is not running
or does not exist" (or similar) since the inspect result covers both cases; keep
the existing return behavior and error handling for subprocess.run as-is.
In @tools/scripts/mariadb/wolf/init_db.py:
- Around line 225-269: The main() function currently lets mariadb.connect and
cursor.execute raise unstructured exceptions; add explicit try/except around the
connect/execute/commit block in main() to catch mariadb.Error (or Exception),
log a clear error message including exception details, and return a non-zero
exit code on failure; also add an optional --host argparse option (default
"127.0.0.1") and use args.host when calling mariadb.connect; refer to main(),
_TABLE_CREATORS, mariadb.connect, cursor.execute, conn.commit to locate the
changes.
🧹 Nitpick comments (3)
tools/scripts/mariadb/start.py (1)
68-73: Add subprocess timeouts to avoid hung CI jobs.
docker pull/docker execcan hang indefinitely; considertimeout=...onsubprocess.run(...)(and log the timeout distinctly).Also applies to: 80-85, 111-116, 124-136
taskfiles/test.yaml (2)
206-227: Consider using{{.ROOT_DIR}}for MariaDB scripts too (defensive against cwd changes).Even if Task runs internal tasks from a predictable cwd, using absolute-from-root paths here makes the lifecycle more robust.
Proposed diff
- tools/scripts/mariadb/start.py \ + {{.ROOT_DIR}}/tools/scripts/mariadb/start.py \ @@ - tools/scripts/mariadb/wolf/init_db.py \ + {{.ROOT_DIR}}/tools/scripts/mariadb/wolf/init_db.py \
18-43: Optional: reduce duplication by wrapping “with-mariadb” lifecycle once.The repeated blocks (env + vars + defer + start-mariadb) across test tasks are easy to drift; consider a single internal task that takes “test command” as a var, or a shared included Taskfile.
Also applies to: 44-68, 77-114, 144-170, 179-205
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
taskfiles/lint.yamltaskfiles/test.yamltools/scripts/get_free_port.pytools/scripts/mariadb/start.pytools/scripts/mariadb/stop.pytools/scripts/mariadb/wolf/init_db.py
🚧 Files skipped from review as they are similar to previous changes (1)
- taskfiles/lint.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tools/scripts/mariadb/stop.py (1)
tools/scripts/mariadb/start.py (1)
main(23-144)
tools/scripts/mariadb/start.py (3)
tools/scripts/get_free_port.py (1)
main(21-26)tools/scripts/mariadb/stop.py (1)
main(20-55)tools/scripts/mariadb/wolf/init_db.py (1)
main(225-268)
🪛 Ruff (0.14.10)
tools/scripts/get_free_port.py
24-24: Unused noqa directive (non-enabled: T201)
Remove unused noqa directive
(RUF100)
tools/scripts/mariadb/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/mariadb/start.py
68-68: subprocess call: check for execution of untrusted input
(S603)
80-80: subprocess call: check for execution of untrusted input
(S603)
111-111: subprocess call: check for execution of untrusted input
(S603)
124-124: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
tools/scripts/get_free_port.py (1)
11-18: Be aware: “free port” selection is TOCTOU and can flake under concurrency.This can race (port is freed immediately after Line 18). If you see sporadic “address already in use” from docker, consider adding a retry strategy around container start (or move to docker-assigned random host ports and then query the mapped port).
tools/scripts/mariadb/stop.py (1)
34-39: Subprocess S603: looks fine (no shell), but confirm container name input is trusted in your usage.Since
args.nameis CLI-provided, confirm your CI/task usage doesn’t pass attacker-controlled values (even though this is a list-form call, not shell).Also applies to: 50-50
tools/scripts/mariadb/start.py (1)
68-68: Subprocess S603 warnings: likely acceptable (no shell), but confirm this is intended.If you want to silence these, add targeted
# noqa: S603on the specific calls (or document why this is safe in-repo).Also applies to: 80-80, 111-111, 124-124
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tools/scripts/mariadb/stop.py (2)
25-31: Consider clarifying the help text.The help text refers to "the name of the started MariaDB container", which is slightly awkward phrasing for a stop script. Consider simplifying to "the name of the MariaDB container".
♻️ Suggested refinement
parser.add_argument( "--name", type=str, default="mariadb-spider-dev", - help="The name of the started MariaDB container (default: %(default)s)", + help="The name of the MariaDB container to stop (default: %(default)s)", )
40-42: Improve the warning message accuracy.The warning message "Container '%s' doesn't exist" is misleading when
returncode == 0but the container is stopped (stdout != "true"). In that case, the container exists but is not running. Consider making the message more accurate to cover both scenarios.♻️ Suggested refinement
if result.returncode != 0 or result.stdout.rstrip("\n") != "true": - logger.warning("Container '%s' doesn't exist. Exit peacefully.", args.name) + logger.warning("Container '%s' is not running or doesn't exist. Nothing to stop.", args.name) return 0
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/scripts/mariadb/stop.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/scripts/mariadb/stop.py (2)
tools/scripts/mariadb/wolf/init_db.py (1)
main(225-268)tools/scripts/mariadb/start.py (1)
main(23-144)
🪛 Ruff (0.14.10)
tools/scripts/mariadb/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (5)
tools/scripts/mariadb/stop.py (5)
1-5: LGTM!The shebang and script metadata follow the project conventions and are consistent with the companion
start.pyscript.
7-17: LGTM!The imports and logging configuration are appropriate for this script's functionality.
44-55: LGTM!The container stopping logic is well-implemented with proper error handling and clear logging.
58-59: LGTM!The entry point follows standard Python conventions.
34-55: Static analysis warnings are false positives.The Ruff S603 warnings about subprocess calls are false positives in this context. The
docker_executableis hardcoded (line 23), and commands are constructed as lists rather than shell strings. Theargs.nameparameter is expected user input for specifying which container to stop, which is the intended functionality of this script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
taskfiles/test.yaml (2)
144-169: Potential path resolution issue withget_free_port.py.This task sets
dir: "{{.G_SRC_PYTHON_DIR}}". Ifsh:commands invarsexecute from that directory context, the relative pathtools/scripts/get_free_port.pyon line 157 would fail. Consider using an absolute path.Suggested fix using absolute path
vars: MARIADB_CONTAINER_NAME: # Normalize UUID casing: macOS generates uppercase while Linux generates lowercase. sh: "uuidgen | tr '[:upper:]' '[:lower:]' | sed 's/^/spider-mariadb-/'" MARIADB_PORT: - sh: "tools/scripts/get_free_port.py" + sh: "{{.ROOT_DIR}}/tools/scripts/get_free_port.py"
179-204: Same path resolution issue asspider-py-unit-tests.The relative path to
get_free_port.pyon line 192 should also be made absolute for consistency with the suggested fix above.Suggested fix
MARIADB_PORT: - sh: "tools/scripts/get_free_port.py" + sh: "{{.ROOT_DIR}}/tools/scripts/get_free_port.py"
🧹 Nitpick comments (1)
tools/scripts/mariadb/start.py (1)
13-13: Consider pinning the MariaDB image to a specific version.Using
mariadb:latestcan lead to non-reproducible test environments. If MariaDB releases an update with breaking changes or different default behaviour, tests could fail unexpectedly.Suggested change
-_MARIADB_IMAGE = "mariadb:latest" +_MARIADB_IMAGE = "mariadb:11.4"
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
taskfiles/test.yamltools/scripts/mariadb/start.py
🧰 Additional context used
🪛 Ruff (0.14.10)
tools/scripts/mariadb/start.py
68-68: subprocess call: check for execution of untrusted input
(S603)
80-80: subprocess call: check for execution of untrusted input
(S603)
111-111: subprocess call: check for execution of untrusted input
(S603)
124-124: subprocess call: check for execution of untrusted input
(S603)
153-153: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (9)
tools/scripts/mariadb/start.py (4)
23-66: LGTM!The argument parsing is well-structured with sensible defaults and helpful descriptions.
68-89: LGTM!The container existence check and image pull logic are correct. The subprocess calls use controlled inputs (hardcoded docker executable and CLI-provided arguments), so the S603 warnings are false positives for this dev/test script context.
91-120: LGTM!The container start command is well-constructed with appropriate flags (
--rmfor auto-cleanup,-dfor detached mode) and proper error handling.
122-159: LGTM!The readiness polling with timeout and cleanup logic is well-implemented. Using
healthcheck.sh --connect --innodb_initializedensures the database is truly ready for connections before returning success.taskfiles/test.yaml (5)
7-9: LGTM!Centralising the MariaDB credentials as global variables is a good practice for maintainability.
18-42: LGTM!Good implementation with UUID-based container naming and dynamic port allocation, enabling parallel test execution. The
deferensures cleanup even on test failure.
44-68: LGTM!Consistent pattern with the storage-specific tests. The implementation is correct.
206-227: LGTM!Good abstraction of the MariaDB lifecycle into an internal task. The script paths work correctly here since there's no
dir:directive.
77-113: No issue with script path resolution.The
sh:commands in task variables are evaluated from the repository root in Taskfile v3, regardless of thedir:setting. Thecpp-integrationtask usesdir: "{{.G_BUILD_SPIDER_DIR}}"with relative paths liketools/scripts/get_free_port.pyin variable evaluation, and this pattern is consistent across other tasks (spider-py-unit-tests,spider-py-storage-unit-tests) that successfully run in CI. The task executes correctly in the automated test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 128-129: The YAML key is misspelled: change the top-level key
"required:" to "requires:" in the test.yaml block that declares vars (the entry
containing required: vars: ["SPIDER_STORAGE_URL"]) so the task runtime
recognizes the variable requirement; keep the existing vars array unchanged and
ensure indentation remains the same.
- Around line 116-117: Fix the typo where the Taskfile key is written as
required: — update each executor task entry in taskfiles/test.yaml to use the
correct go-task key requires: instead of required:, e.g. change the block that
currently reads required: vars: ["SPIDER_STORAGE_URL"] (and the two other
executor tasks with the same pattern) to requires: vars: ["SPIDER_STORAGE_URL"];
verify and replace all occurrences so variable validation is actually applied.
- Around line 142-143: The YAML key is misspelled as "required:"; update it to
"requires:" so the task definition recognizes the dependency, i.e., replace the
line containing required: vars: ["SPIDER_STORAGE_URL"] with requires: vars:
["SPIDER_STORAGE_URL"] (ensure spacing/indentation matches surrounding task
YAML).
🧹 Nitpick comments (1)
docs/src/dev-docs/testing.md (1)
20-24: LGTM! Clear description of the unified test workflow.The new section accurately describes the consolidated tests.yaml workflow that runs both unit and integration tests. The description is concise and aligns with the PR's objective of moving to a unified test workflow.
Optional: Add context about workflow transition
If helpful for users familiar with the old workflow structure, consider mentioning that this replaces the previous
non-storage-unit-testsworkflow (as noted in the PR objectives). However, this is optional since forward-looking documentation may not need historical context.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/tests.yamldocs/src/dev-docs/testing.mdtaskfiles/test.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-03T01:49:27.764Z
Learnt from: sitaowang1998
Repo: y-scope/spider PR: 172
File: build-tasks.yaml:21-45
Timestamp: 2025-08-03T01:49:27.764Z
Learning: In the spider project, external tools like `uv`, Python, and Task are documented as requirements in README.md and are expected to be pre-installed by users rather than bootstrapped by the build system.
Applied to files:
.github/workflows/tests.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (7)
.github/workflows/tests.yaml (2)
1-1: LGTM! Workflow and job renamed appropriately.The renaming from "unit-tests"/"non-storage-unit-tests" to "tests" accurately reflects the expanded scope now that storage-backed and integration tests are included.
Also applies to: 20-20
58-62: LGTM! Test steps updated to invoke the new unified tasks.The workflow now runs all tests (C++ unit tests, spider-py unit tests, and C++ integration tests) through the MariaDB-backed storage executor, aligning with the PR objectives.
taskfiles/test.yaml (3)
8-11: LGTM! MariaDB configuration variables for local testing.Hardcoded credentials are acceptable for local Docker-based testing. These are only used for ephemeral containers spun up during test runs.
68-105: LGTM! Well-structured storage executor pattern.The
mariadb-storage-executortask properly:
- Generates unique container names with cross-platform UUID normalization
- Dynamically allocates a free port to avoid conflicts
- Uses
deferfor guaranteed cleanup on success or failure- Passes the constructed
SPIDER_STORAGE_URLto the test task
14-30: LGTM! Test tasks properly delegate to the storage executor.All three public test tasks (
cpp-unit-tests,cpp-integration,spider-py-unit-tests) correctly invokemariadb-storage-executorwith their respective executor tasks.docs/src/dev-docs/testing.md (2)
26-26: LGTM! Link reference is correctly formatted.The markdown reference link to the workflow file is properly structured and points to the expected location.
7-11: Correct the task name fromtest:spider-py-storage-unit-teststotest:spider-py-unit-tests.The task in the documentation should match the actual task name in
taskfiles/test.yaml, which isspider-py-unit-tests(notspider-py-storage-unit-tests). The description is accurate—it does run spider-py unit tests using a storage backend—but the task identifier is incorrect.Likely an incorrect or invalid review comment.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a large refactoring on testing tasks. Please check.
| uv --version | ||
|
|
||
| - name: "Install project dependencies " | ||
| - name: "Install project dependencies" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove space.
| - run: "task test:spider-py-non-storage-unit-tests" | ||
| - run: "task test:spider-py-unit-tests" | ||
|
|
||
| - run: "task test:cpp-integration" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group unit tests before integration tests.
| - |- | ||
| tools/scripts/mariadb/start.py \ | ||
| --name "{{.MARIADB_CONTAINER_NAME}}" \ | ||
| --port "{{.MARIADB_PORT}}" \ | ||
| --database "{{.G_MARIADB_DATABASE}}" \ | ||
| --username "{{.G_MARIADB_USERNAME}}" \ | ||
| --password "{{.G_MARIADB_PASSWORD}}" | ||
| - defer: |- | ||
| {{.ROOT_DIR}}/tools/scripts/mariadb/stop.py \ | ||
| --name "{{.MARIADB_CONTAINER_NAME}}" | ||
| - |- | ||
| tools/scripts/mariadb/wolf/init_db.py \ | ||
| --port "{{.MARIADB_PORT}}" \ | ||
| --database "{{.G_MARIADB_DATABASE}}" \ | ||
| --username "{{.G_MARIADB_USERNAME}}" \ | ||
| --password "{{.G_MARIADB_PASSWORD}}" | ||
| - task: "{{.TEST_TASK}}" | ||
| vars: | ||
| SPIDER_STORAGE_URL: | ||
| "jdbc:mariadb://127.0.0.1:{{.MARIADB_PORT}}/{{.G_MARIADB_DATABASE}}?\ | ||
| user={{.G_MARIADB_USERNAME}}&password={{.G_MARIADB_PASSWORD}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed start task. In this way, the defer can actually work even if init_db fails.
| cpp-non-storage-unit-tests: | ||
| deps: | ||
| - "build-unit-test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove xxx-storage-unit-tests and xxx-non-storage-unit-tests.
| # | ||
| # @param {string} STORAGE_TASK The task to execute. The task must accept no parameters other than | ||
| # `SPIDER_STORAGE_URL`, which is set to the MariaDB instance URL. | ||
| mariadb-storage-task-executor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a wrapper for storage task exeuction so we don't need to repeat the save set up everywhere.
The storage URL is passed as a variable.
| # A generic wrapper that runs the given task with a MariaDB storage backend. | ||
| # | ||
| # @param {string} STORAGE_TASK The task to execute. The task must accept no parameters other than | ||
| # `SPIDER_STORAGE_URL`, which is set to the MariaDB instance URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add docstrings for unit tests that contains variables.
| # Internal task that runs all Spider Wolf's unit tests. | ||
| # | ||
| # @param {string} SPIDER_STORAGE_URL An URL pointing to the MariaDB instance. | ||
| spider-wolf-unit-tests-executor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document the versioning. But since this is an internal task, I'm gonna refer to wolf directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 115-116: The Taskfile key is misspelled: replace the top-level
"required:" mapping with "requires:" so Taskfile v3 enforces the variable check;
specifically change the block that currently reads `required: vars:
["SPIDER_STORAGE_URL"]` to `requires: vars: ["SPIDER_STORAGE_URL"]` so the
SPIDER_STORAGE_URL check is not silently ignored.
- Around line 141-142: There's a typo in the task spec: the key "required"
should be "requires"; update the task block that currently reads with vars:
["SPIDER_STORAGE_URL"] to use the "requires" key instead of "required" so the
dependency is recognized (i.e., replace the "required:" key with "requires:"
keeping the existing vars list).
- Around line 127-128: In taskfiles/test.yaml the task key uses the wrong
property name "required" for declaring environment variables; replace the
"required" key with the correct "requires" key so the vars mapping
("SPIDER_STORAGE_URL") is recognized (i.e., change the "required: vars: [...]"
entry to "requires: vars: [...]" in the same block).
- Around line 14-30: Calls to mariadb-storage-task-executor set TEST_TASK but
the executor expects STORAGE_TASK, causing missing-variable failures; update the
callers (the cpp-unit-tests, cpp-integration, and spider-py-unit-tests task
entries) to pass STORAGE_TASK instead of TEST_TASK so the variable name matches
mariadb-storage-task-executor, or alternatively change
mariadb-storage-task-executor to accept TEST_TASK (rename expected variable
STORAGE_TASK → TEST_TASK) so both sides use the same identifier.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: lint
🔇 Additional comments (3)
taskfiles/test.yaml (3)
8-11: LGTM!Hardcoded credentials are acceptable here since they're used for ephemeral test containers only.
68-104: Well-structured executor with proper cleanup.Good use of:
- UUID-based container naming for parallel test isolation
deferto ensure MariaDB container cleanup on failure- Cross-platform UUID normalization
144-159: LGTM!Build dependencies and test execution commands look correct. The venv activation followed by pytest invocation is properly structured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 14-30: Public task entries set TEST_TASK but the
mariadb-storage-task-executor expects STORAGE_TASK, causing a missing-variable
failure; update the affected task definitions (cpp-unit-tests, cpp-integration,
spider-py-unit-tests) to pass STORAGE_TASK instead of TEST_TASK, or
alternatively adapt mariadb-storage-task-executor to accept TEST_TASK and map it
to STORAGE_TASK, making sure the symbol STORAGE_TASK used in the executor
(references around lines where STORAGE_TASK is read) is provided.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (5)
taskfiles/test.yaml (5)
8-11: LGTM!Hardcoded credentials for ephemeral local Docker test containers are acceptable. The values are clearly test-specific and self-documenting.
68-104: LGTM!Well-structured executor with good practices:
- UUID case normalization handles cross-platform differences.
- Dynamic port allocation avoids conflicts.
- Deferred stop command ensures container cleanup even on test failure.
- JDBC URL construction is correct for MariaDB.
106-118: LGTM!Clean implementation with
PYTHONDONTWRITEBYTECODEto avoid polluting the source tree with bytecode files.
120-132: LGTM!Correct dependency on
build-unit-testand proper environment variable propagation.
134-159: LGTM!Comprehensive build targets for integration tests and proper venv activation before running pytest.
Co-authored-by: Lin Zhihao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 100-105: The SPIDER_STORAGE_URL currently embeds raw credentials
which risks leakage and breaks on special chars; update the task vars so the
username/password are URL-encoded before composing SPIDER_STORAGE_URL (use a
URL-encoding routine, e.g., quote/quote_plus) and compose the JDBC string from
the encoded components for the SPIDER_STORAGE_URL variable used by
spider-py-unit-tests-executor, spider-wolf-unit-tests-executor and
spider-wolf-integration-tests-executor; additionally ensure the full JDBC string
is not emitted to test logs (avoid printing it in tests, mark the env as
secret/masked in the CI runner or pass credentials via masked/secret env vars
and/or pass only non-sensitive components to pytest to prevent accidental
logging).
- Around line 75-79: The MARIADB_CONTAINER_NAME sh command uses unix-only tools
(uuidgen, tr, sed) which will fail on Windows; replace the shell pipeline with a
cross-platform Python-based generator and update the MARIADB_CONTAINER_NAME
entry to call it (e.g., point sh to a new or existing script similar to
tools/scripts/get_free_port.py), or invoke python -c to generate a lowercase
UUID prefixed with "spider-mariadb-"; ensure the replacement script/function
exports the same string format so MARIADB_CONTAINER_NAME continues to work
across platforms.
In @tools/scripts/mariadb/start.py:
- Around line 91-109: Add a new CLI option and use it for the MariaDB root
account instead of reusing args.password: introduce an argparse
add_argument("--root-password", default="<distinct-default>", help="password for
MariaDB root user") to populate args.root_password, update the mariadb_start_cmd
list to set "-e", f"MARIADB_ROOT_PASSWORD={args.root_password}" (replacing the
existing args.password usage), and update any docs/help text or tests that
assume a single password.
- Line 13: Replace the floating image tag in _MARIADB_IMAGE (currently
"mariadb:latest") with a pinned tag or digest (e.g., "mariadb:11.4" or a
specific sha256 digest) to ensure CI reproducibility, and update the two log
messages that mention "latest" (the messages at the locations that currently
print "latest") to reflect the new pinned tag or show the value of
_MARIADB_IMAGE so they remain accurate.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
taskfiles/test.yamltools/scripts/mariadb/start.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/scripts/mariadb/start.py (1)
tools/scripts/mariadb/stop.py (1)
main(20-55)
🪛 Ruff (0.14.10)
tools/scripts/mariadb/start.py
68-68: subprocess call: check for execution of untrusted input
(S603)
80-80: subprocess call: check for execution of untrusted input
(S603)
111-111: subprocess call: check for execution of untrusted input
(S603)
124-124: subprocess call: check for execution of untrusted input
(S603)
154-154: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (2)
taskfiles/test.yaml (1)
14-31: Nice test isolation via executor wrapper (start/init/run/stop).
The lifecycle ordering (includingdeferstop) looks clean and should reduce CI duplication.Also applies to: 68-105
tools/scripts/mariadb/start.py (1)
122-160: Thehealthcheck.shscript is included in the official MariaDB Docker image and supports the--connect --innodb_initializedflags being used here, so the concern about it potentially missing is unfounded. However, addingdocker logsoutput before cleanup would still improve debugging for cases where the healthcheck fails for other reasons. Consider capturing and logging the container logs on timeout to aid diagnosis of startup failures in CI environments.Likely an incorrect or invalid review comment.
Description
This PR:
SPIDER_STORAGE_URLinstead of hardcoded urlspider-pyREADME and test doc.Note
Need to change the required workflow for PR from
non-storage-unit-teststotests.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Enhancements
Documentation
Removed
✏️ Tip: You can customize this high-level summary in your review settings.