@sitaowang1998
Collaborator

@sitaowang1998 sitaowang1998 commented Jan 8, 2026

Description

This PR:

  • Adds scripts to start, wait for, initialize, and stop a MariaDB Docker container.
  • Adds a storage-creation task that uses the above scripts.
  • Updates storage-related test tasks to run the storage-creation task and storage cleanup.
  • Updates the following tests to use SPIDER_STORAGE_URL instead of a hardcoded URL:
    • C++ unit tests
    • C++ integration tests
    • Python unit tests
  • Updates the GitHub workflow to run all tests.
  • Updates the spider-py README and the testing doc.

Note

The required workflow for PRs needs to be changed from non-storage-unit-tests to tests.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • All test tasks pass.
  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features

    • MariaDB test tooling: start/stop/init scripts and a free-port helper; added C++ integration test task.
  • Enhancements

    • Unified CI test workflow and test tasks routed through a MariaDB-backed executor.
    • Tests and helpers now read storage URL at runtime via SPIDER_STORAGE_URL environment variable.
  • Documentation

    • Testing guide simplified and updated to reference the consolidated test workflow.
  • Removed

    • Legacy SQL initialization script deleted.



@sitaowang1998 sitaowang1998 requested a review from a team as a code owner January 8, 2026 04:40
@coderabbitai
Contributor

coderabbitai bot commented Jan 8, 2026

Walkthrough

CI and local test flows now support an environment-driven MariaDB setup. The PR adds container start/stop scripts and a programmatic DB initializer; tests and helpers read SPIDER_STORAGE_URL (with the previous defaults preserved as fallbacks); taskfiles and CI run tests through a mariadb-storage executor; the lint scope is extended; and the old SQL file is removed.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GitHub Workflow | .github/workflows/tests.yaml | Workflow renamed unit-tests → tests; job id updated; step names trimmed; swapped non-storage test steps for storage-aware steps and added test:cpp-integration. |
| Task Configuration | taskfiles/test.yaml | Added G_MARIADB_* globals; introduced mariadb-storage-task-executor plus per-test executors (cpp-unit-tests, cpp-integration, spider-py-unit-tests) that manage the lifecycle: start MariaDB, init DB, inject SPIDER_STORAGE_URL, run tests, stop MariaDB. |
| Linting Configuration | taskfiles/lint.yaml | Included tools/scripts/mariadb in py lint loop (mypy/ruff). |
| Python Test Fixtures | python/spider-py/tests/client/test_driver.py, python/spider-py/tests/storage/test_mariadb.py | Fixtures now read SPIDER_STORAGE_URL env var (falling back to previous defaults) when constructing driver/storage. |
| Integration Tests & Helpers | tests/integration/client.py, tests/integration/test_*.py | Renamed g_storage_url → G_STORAGE_URL; added get_storage_url() (returns env SPIDER_STORAGE_URL or default); tests updated to call get_storage_url() and import storage. |
| C++ Storage Helper | tests/storage/StorageTestHelper.hpp | cMySqlStorageUrl → constexpr std::string_view; added get_storage_url() (reads SPIDER_STORAGE_URL via Boost.Process); create_storage_factory() uses get_storage_url(). |
| MariaDB Container Management | tools/scripts/mariadb/start.py, tools/scripts/mariadb/stop.py | New CLI scripts to start (pull/run, healthcheck polling, timeout) and stop MariaDB containers; include args, logging, exit codes, and cleanup. |
| MariaDB DB Init (programmatic) | tools/scripts/mariadb/wolf/init_db.py | New Python initializer executing CREATE TABLE statements for Spider schema (replaces deleted SQL file). |
| Utility | tools/scripts/get_free_port.py | New helper to find an available TCP port for dynamic container binding. |
| Removed SQL Schema | tools/scripts/storage/init_db.sql | Deleted SQL initialization script; schema creation moved to Python initializer. |
| Docs / README | docs/src/dev-docs/testing.md, python/spider-py/README.md | Removed manual storage setup and storage-specific test instructions; updated workflow references and simplified testing docs. |

Sequence Diagram(s)

sequenceDiagram
  participant TaskRunner as Task (taskfiles/test.yaml)
  participant Executor as mariadb-storage-executor
  participant StartScript as tools/scripts/mariadb/start.py
  participant Docker as Docker Engine
  participant InitDB as tools/scripts/mariadb/wolf/init_db.py
  participant Tests as Test Runner
  participant StopScript as tools/scripts/mariadb/stop.py

  TaskRunner->>Executor: invoke executor for TEST_TASK
  Executor->>StartScript: start container (name, port, creds)
  StartScript->>Docker: docker pull & run container
  StartScript->>Docker: docker exec healthcheck (poll until ready)
  StartScript-->>Executor: return host:port and success
  Executor->>InitDB: run init_db.py with DB params
  InitDB-->>Executor: return success after creating schema
  Executor->>Tests: run TEST_TASK with SPIDER_STORAGE_URL env
  Executor->>StopScript: request stop container on completion
  StopScript->>Docker: docker stop/remove container
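
The flow in the diagram can be sketched in Python as a start/init/run/stop wrapper with guaranteed cleanup. This is only an illustration of the ordering, not the Taskfile executor itself: `run_with_storage` and its four callables are hypothetical names, and having `start` return the full storage URL is an assumption made to keep the sketch short.

```python
from typing import Callable


def run_with_storage(
    start: Callable[[], str],
    init_db: Callable[[str], None],
    run_tests: Callable[[dict[str, str]], int],
    stop: Callable[[], None],
) -> int:
    """Run tests against a freshly started storage backend and always clean up.

    All four callables are hypothetical stand-ins for the start script, the DB
    initializer, the test task, and the stop script.
    """
    try:
        storage_url = start()  # start the container and obtain its URL
        init_db(storage_url)  # create the schema before any test runs
        env = {"SPIDER_STORAGE_URL": storage_url}  # injected into the test task
        return run_tests(env)  # propagate the test task's exit code
    finally:
        stop()  # runs even if start, init, or the tests raise
```

Because `stop` sits in a `finally` block, it is also reached when `start` itself fails, so the stop step has to tolerate a container that was never created.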

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main changes: adding MariaDB docker tasks and storage-related tests to GitHub workflows, which aligns with the core objectives of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.30% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @tools/scripts/storage/init_db.py:
- Around line 15-231: The CREATE TABLE statements incorrectly use ON UPDATE
CURRENT_TIMESTAMP for columns that should be immutable after creation; remove
the ON UPDATE CURRENT_TIMESTAMP clause from `creation_time` in the jobs table
(jobs -> `creation_time`), from `start_time` in the task_instances table
(task_instances -> `start_time`), and from `lease_time` in the scheduler_leases
table (scheduler_leases -> `lease_time`) while keeping DEFAULT CURRENT_TIMESTAMP
so the timestamp is set at insertion; leave the `heartbeat` column (drivers ->
`heartbeat`) as-is with ON UPDATE CURRENT_TIMESTAMP.

In @tools/scripts/storage/start.py:
- Line 12: The _MARIADB_IMAGE constant is pinned to "mariadb:latest", which is
non-reproducible; change _MARIADB_IMAGE to a specific, supported MariaDB tag
(e.g., "mariadb:10.11" or another project-approved version) or make it
configurable (read from an environment variable or script argument with a stable
default) so builds are deterministic; update any documentation or defaults to
reflect the chosen pinned version.

In @tools/scripts/storage/stop.py:
- Line 25: The argparse parser description contains a typo: change the string
passed to the ArgumentParser constructor (the description argument used when
creating parser) from "Stop MairaDB Docker container." to "Stop MariaDB Docker
container." so the CLI help shows the correct database name; update the
description in the parser instantiation where parser =
argparse.ArgumentParser(...) is defined.
- Around line 44-48: The variable name localstack_stop_cmd is misleading because
it targets a MariaDB container; rename it to mariadb_stop_cmd wherever defined
and referenced (the list currently built with "docker", "stop", args.name) so
the variable name matches its purpose—update the declaration and any subsequent
use of localstack_stop_cmd to mariadb_stop_cmd in the stop.py script.
🧹 Nitpick comments (5)
.github/workflows/tests.yaml (1)

1-1: Consider updating the workflow name to reflect broader test scope.

The workflow name "unit-tests" no longer accurately reflects its scope, as it now runs integration tests (line 60). Consider renaming to "tests" or "all-tests" for clarity.

📝 Suggested rename
-name: "unit-tests"
+name: "tests"
tools/scripts/storage/get_free_port.py (1)

24-25: Remove the unused noqa directive.

The static analysis tool correctly identifies that the noqa: T201 directive is unnecessary because the T201 rule (print statement detection) is not enabled in your Ruff configuration.

♻️ Proposed fix
-    # ruff: noqa: T201
     print(port)
tools/scripts/storage/init_db.py (3)

97-97: Consider rationalizing VARBINARY and VARCHAR size limits.

The value size limit of 999 bytes appears in multiple columns (data.value, task_outputs.type, task_outputs.value, task_inputs.type, task_inputs.value, client_kv_data.value, task_kv_data.value). This specific limit seems arbitrary.

Common choices are powers of 2 (1024, 2048) or clean decimal values (1000). Consider whether 999 was chosen for a specific reason or if it should be normalized.

Also applies to: 108-109, 123-123, 214-215, 224-224


271-284: Add error handling and success logging.

The script lacks error handling and logging for the database initialization process. If table creation fails, the error won't be clearly communicated to the user.

♻️ Proposed improvement
+    logger.info("Connecting to MariaDB at 127.0.0.1:%d...", args.port)
     with (
         mariadb.connect(
             host="127.0.0.1",
             port=args.port,
             user=args.username,
             password=args.password,
             database=args.database,
         ) as conn,
         conn.cursor() as cursor,
     ):
+        logger.info("Creating database tables...")
         for table_creator in _TABLE_CREATORS:
             cursor.execute(table_creator)
         conn.commit()
+        logger.info("Database initialization completed successfully.")
 
     return 0

273-273: Hardcoded host limits flexibility.

The host is hardcoded to "127.0.0.1". If MariaDB needs to run on a different host (e.g., Docker host networking scenarios), this won't be configurable.

Consider adding a --host argument with "127.0.0.1" as the default, similar to the other connection parameters.
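
A minimal sketch of what such an argument could look like, assuming the script builds its parser with argparse as the other snippets in this review suggest. `build_parser`, the `--port` flag, and the 3306 default are assumptions for illustration; only the "127.0.0.1" default comes from the comment above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a configurable host (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Initialize the database tables for Spider."
    )
    # Keeps today's behaviour by default, but lets callers point elsewhere.
    parser.add_argument(
        "--host",
        type=str,
        default="127.0.0.1",
        help="Host of the MariaDB server (default: %(default)s).",
    )
    # Assumed flag and default; the real script's options are not shown here.
    parser.add_argument(
        "--port",
        type=int,
        default=3306,
        help="Port of the MariaDB server (default: %(default)s).",
    )
    return parser
```

The parsed value would then replace the literal in the connect call, for example `host=args.host`.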

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1865df and 62d9add.

📒 Files selected for processing (16)
  • .github/workflows/tests.yaml
  • python/spider-py/tests/client/test_driver.py
  • python/spider-py/tests/storage/test_mariadb.py
  • taskfiles/lint.yaml
  • taskfiles/test.yaml
  • tests/integration/client.py
  • tests/integration/test_client.py
  • tests/integration/test_scheduler_worker.py
  • tests/integration/test_signal.py
  • tests/storage/StorageTestHelper.hpp
  • tools/scripts/storage/get_free_port.py
  • tools/scripts/storage/init_db.py
  • tools/scripts/storage/init_db.sql
  • tools/scripts/storage/start.py
  • tools/scripts/storage/stop.py
  • tools/scripts/storage/wait_for_db.py
💤 Files with no reviewable changes (1)
  • tools/scripts/storage/init_db.sql
🧰 Additional context used
🧬 Code graph analysis (5)
tools/scripts/storage/start.py (2)
tools/scripts/storage/init_db.py (1)
  • main (242-285)
tools/scripts/storage/wait_for_db.py (1)
  • main (21-65)
tools/scripts/storage/wait_for_db.py (2)
tools/scripts/storage/init_db.py (1)
  • main (242-285)
tools/scripts/storage/start.py (1)
  • main (22-114)
python/spider-py/tests/client/test_driver.py (1)
python/spider-py/src/spider_py/client/driver.py (1)
  • Driver (19-60)
python/spider-py/tests/storage/test_mariadb.py (1)
python/spider-py/src/spider_py/storage/jdbc_url.py (1)
  • parse_jdbc_url (22-55)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
  • get_storage_url (116-121)
  • storage (125-134)
🪛 Ruff (0.14.10)
tools/scripts/storage/start.py

61-61: subprocess call: check for execution of untrusted input

(S603)


73-73: subprocess call: check for execution of untrusted input

(S603)


104-104: subprocess call: check for execution of untrusted input

(S603)

tools/scripts/storage/wait_for_db.py

45-45: subprocess call: check for execution of untrusted input

(S603)

tools/scripts/storage/stop.py

34-34: subprocess call: check for execution of untrusted input

(S603)


50-50: subprocess call: check for execution of untrusted input

(S603)

tools/scripts/storage/get_free_port.py

24-24: Unused noqa directive (non-enabled: T201)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (27)
.github/workflows/tests.yaml (2)

20-20: LGTM!

The job rename from non-storage-unit-tests to tests appropriately reflects the expanded test scope that now includes storage-related tests.


58-62: No changes required. The test tasks (test:cpp-unit-tests, test:cpp-integration-tests, test:spider-py-unit-tests) are self-contained and properly handle the full MariaDB lifecycle:

  • Storage setup: Each task calls the start-storage subtask, which starts the MariaDB container, waits for readiness, and initialises the database. The SPIDER_STORAGE_URL environment variable is set correctly.
  • Cleanup on failure: The defer: keyword guarantees cleanup runs even if tests fail, preventing resource leaks.
  • CI compatibility: Dynamic port allocation via get_free_port.py and unique container naming via uuidgen ensure portability across environments including GitHub Actions.
python/spider-py/tests/client/test_driver.py (1)

16-17: LGTM!

The environment-driven storage URL configuration is correctly implemented with an appropriate fallback to the default test URL.

python/spider-py/tests/storage/test_mariadb.py (1)

19-20: LGTM!

The fixture correctly retrieves the storage URL from the environment variable and parses it for MariaDB connection parameters.

tools/scripts/storage/wait_for_db.py (1)

21-66: LGTM!

The wait logic correctly polls the MariaDB container health status with appropriate timeout handling and sleep intervals between attempts.

Note: The static analysis warning (S603) about untrusted subprocess input is a false positive in this context, as both docker_executable and args.name are controlled inputs.
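
For readers unfamiliar with the pattern, polling with a timeout and a sleep interval can be sketched as below. This is not the code in wait_for_db.py: `wait_until_ready` and its parameters are hypothetical, and the readiness check is passed in as a callable so the sketch stays independent of Docker.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # timed out without ever seeing a ready state
        sleep(interval_s)  # back off before the next attempt
```

Using a monotonic clock avoids surprises if the wall-clock time is adjusted while the loop is waiting.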

tests/storage/StorageTestHelper.hpp (3)

17-17: Good refactoring: constexpr string_view for constant URL.

Changing from std::string const to constexpr std::string_view is more efficient and appropriate for a compile-time constant.


24-32: LGTM!

The get_storage_url() function correctly reads the SPIDER_STORAGE_URL environment variable and falls back to the default value when not set.


36-37: LGTM!

The factory creation now uses the environment-driven storage URL, aligning with the broader refactoring to support dynamic storage configuration.

taskfiles/lint.yaml (1)

139-146: LGTM!

Extending linting to cover the newly introduced storage scripts is appropriate and maintains consistency with the existing linting configuration.

tests/integration/test_signal.py (2)

15-27: LGTM!

The import changes correctly align with the refactored public API in tests/integration/client.py, switching from direct constant access to the new get_storage_url() function.


87-87: LGTM!

The fixture correctly uses get_storage_url() to obtain the runtime-configured storage URL instead of the previous hardcoded constant.

tests/integration/test_scheduler_worker.py (2)

13-30: LGTM!

The import changes correctly reflect the refactored API, bringing in get_storage_url() and storage from the updated integration.client module.


93-93: LGTM!

The fixture correctly uses get_storage_url() for runtime storage URL configuration, consistent with the environment-driven approach introduced in this PR.

tools/scripts/storage/get_free_port.py (1)

11-18: Be aware of the inherent TOCTOU race condition.

The approach of binding to port 0 to obtain a free port has a time-of-check-time-of-use (TOCTOU) race condition: between the time this script returns the port number and when the MariaDB container actually binds to it, another process could claim the port. This is an inherent limitation of this port allocation pattern and is generally acceptable for test scenarios where the window is small.

If port conflicts occur in CI/CD environments, consider implementing retry logic in the MariaDB startup script or using Docker's dynamic port allocation features instead.
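
The "bind to port 0" pattern referred to above looks roughly like the following sketch. It is an assumption about the general technique, not a copy of get_free_port.py, and `get_free_port` is named here only for illustration.

```python
import socket


def get_free_port() -> int:
    """Ask the OS for an unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any currently free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
    # The socket is closed on return, so the port is only *probably* still
    # free when the caller uses it. That gap is the TOCTOU window.
```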

tests/integration/client.py (3)

114-114: LGTM!

Renaming to uppercase G_STORAGE_URL follows Python naming conventions for module-level constants. The hardcoded credentials in the default URL are acceptable for test scenarios.


116-121: LGTM!

The new get_storage_url() function provides a clean abstraction for environment-driven storage URL configuration, falling back to the default when SPIDER_STORAGE_URL is not set.
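
A minimal sketch of the environment-with-fallback pattern described here. The value assigned to `G_STORAGE_URL` below is a placeholder, not the repository's actual default, and the optional `default` parameter is an addition for illustration.

```python
import os

# Placeholder default; the real constant in tests/integration/client.py differs.
G_STORAGE_URL = "jdbc:mariadb://localhost:3306/example_db"


def get_storage_url(default: str = G_STORAGE_URL) -> str:
    """Return SPIDER_STORAGE_URL if it is set, otherwise the default URL."""
    return os.environ.get("SPIDER_STORAGE_URL", default)
```

Note that with `os.environ.get`, a variable that is set to an empty string is returned as-is rather than replaced by the default.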


132-132: LGTM!

The fixture correctly uses get_storage_url() to support runtime storage configuration via the environment variable.

tools/scripts/storage/init_db.py (1)

1-14: LGTM! Clean script setup.

The shebang, dependency declaration, and imports are well-structured. Using uv run --script with inline dependencies is an excellent choice for this utility script.

tools/scripts/storage/start.py (3)

1-20: LGTM! Clean script structure.

The imports, logging configuration, and image constant are well-organized.


61-69: Clarify the return code when container already exists.

Returning 1 when the container already exists (line 69) typically indicates an error, but this is more of a no-op situation. Consider whether this should return 0 for success or use a different approach.

The current behavior might be intentional if calling scripts need to distinguish between "started fresh" and "already running". Please verify the expected behavior in the task orchestration context.


57-66: Static analysis warnings are false positives.

The Ruff S603 warnings about untrusted input in subprocess calls are false positives. The docker_executable is a hardcoded string (line 59), and all other arguments are either literals or validated by argparse.

Also applies to: 73-82, 104-114

tests/integration/test_client.py (1)

10-10: LGTM! Proper migration to environment-driven storage URL.

The changes correctly replace the hardcoded g_storage_url global with the get_storage_url() accessor function. This enables runtime configuration via the SPIDER_STORAGE_URL environment variable, which aligns with the MariaDB Docker lifecycle management introduced in this PR.

The storage fixture import on line 10 provides proper lifecycle management through pytest's fixture dependency mechanism.

Also applies to: 66-66, 87-87

taskfiles/test.yaml (5)

7-9: LGTM! Clean configuration variables.

The MariaDB configuration is properly centralized in global variables, making it easy to adjust defaults across all test tasks.


24-26: Excellent cross-platform UUID normalization.

The UUID generation and normalization logic handles the case difference between macOS (uppercase) and Linux (lowercase), ensuring consistent container naming across platforms. The spider-mariadb- prefix provides clear identification.

Also applies to: 48-50, 89-91, 145-147, 178-180
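
For comparison, the same naming scheme can be sketched in Python. The taskfile uses uuidgen in the shell; `make_container_name` is a hypothetical equivalent, with only the spider-mariadb- prefix taken from the comment above.

```python
import uuid


def make_container_name(prefix: str = "spider-mariadb-") -> str:
    """Return a unique, lowercase container name."""
    # str(uuid.uuid4()) is already lowercase in Python; lower() mirrors the
    # normalization the task applies to uuidgen output on macOS.
    return f"{prefix}{uuid.uuid4()}".lower()
```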


27-28: Be aware of potential port allocation race condition.

Dynamic port allocation via get_free_port.py is a good approach, but there's a time-of-check-to-time-of-use (TOCTOU) race condition: between obtaining the free port and starting the container, another process could bind to that port.

This is unlikely in typical CI environments but could occur in heavily loaded systems or when running multiple test suites concurrently.

Consider whether the test infrastructure needs additional retry logic or port conflict handling. In practice, this may be acceptable given the short time window and typical usage patterns.

Also applies to: 51-52, 92-93, 148-149, 181-182


196-220: LGTM! Well-structured storage lifecycle management.

The start-storage internal task properly orchestrates the three-step process:

  1. Start the MariaDB container
  2. Wait for readiness
  3. Initialize the database schema

The sequential execution ensures each step completes before the next begins, and the internal: true flag appropriately restricts this task to internal use only.


37-39: No action needed. The stop.py script already handles non-existent containers gracefully by checking container existence with docker inspect (line 35) and exiting with code 0 while logging a warning message if the container doesn't exist or isn't running (lines 40-42). The defer blocks will safely execute even if start-storage fails, as the stop script is designed to handle this scenario without producing confusing error output.
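
The existence check described above can be sketched as follows. The `docker inspect -f "{{.State.Running}}"` invocation is the one quoted elsewhere in this review; `is_container_running` and the injectable `run` parameter are hypothetical and exist only to make the sketch testable without Docker.

```python
import subprocess
from typing import Callable


def is_container_running(
    name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True only if the named container exists and is running."""
    result = run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True,
        text=True,
        check=False,  # a missing container is not treated as an exception
    )
    # A non-zero exit code means the container does not exist.
    return result.returncode == 0 and result.stdout.strip() == "true"
```

A stop script built this way can log a warning and exit with code 0 when the function returns False, which is what makes it safe to call from a defer block.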

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/src/dev-docs/testing.md:
- Around line 26-27: The sentence referencing the GitHub workflow
"[tests.yaml][gh-workflow-tests]" contains a grammatical error—remove the
redundant "the" so it reads "runs all unit tests and integration tests"; update
the sentence in the docs text where the phrase "runs the all unit tests and
integration tests" appears to "runs all unit tests and integration tests" to
correct the grammar.
🧹 Nitpick comments (4)
tests/integration/test_signal.py (1)

21-21: Remove unnecessary noqa directive.

The # noqa: F401 directive is unnecessary here. Ruff understands that storage is a pytest fixture and won't flag it as an unused import.

♻️ Proposed fix
-    storage,  # noqa: F401
+    storage,
tests/integration/test_scheduler_worker.py (1)

24-24: Remove unnecessary noqa directive.

The # noqa: F401 directive is unnecessary here. Ruff understands that storage is a pytest fixture and won't flag it as an unused import.

♻️ Proposed fix
-    storage,  # noqa: F401
+    storage,
tests/integration/test_client.py (1)

10-10: Remove unnecessary noqa directive.

The # noqa: F401 directive is unnecessary here. Ruff understands that storage is a pytest fixture and won't flag it as an unused import.

♻️ Proposed fix
-from integration.client import get_storage_url, SQLConnection, storage  # noqa: F401
+from integration.client import get_storage_url, SQLConnection, storage
tools/scripts/storage/init_db.py (1)

242-286: Consider adding explicit error handling for better diagnostics.

The connection and execution logic is correct and uses proper context management. However, there's no explicit error handling—any connection failure or SQL execution error will raise an unhandled exception. While fail-fast behaviour may be intentional for a setup script, wrapping the connection block in a try/except with structured logging would improve diagnostics.

♻️ Optional: Add error handling
def main() -> int:
    """Main."""
    parser = argparse.ArgumentParser(description="Initialize the database tables for Spider.")
    # ... argument parsing ...
    args = parser.parse_args()

    try:
        with (
            mariadb.connect(
                host="127.0.0.1",
                port=args.port,
                user=args.username,
                password=args.password,
                database=args.database,
            ) as conn,
            conn.cursor() as cursor,
        ):
            for table_creator in _TABLE_CREATORS:
                cursor.execute(table_creator)
            conn.commit()
        logger.info("Database tables initialized successfully.")
        return 0
    except mariadb.Error as e:
        logger.error("Failed to initialize database: %s", e)
        return 1
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 62d9add and 55f7300.

📒 Files selected for processing (9)
  • .github/workflows/tests.yaml
  • docs/src/dev-docs/testing.md
  • python/spider-py/README.md
  • tests/integration/client.py
  • tests/integration/test_client.py
  • tests/integration/test_scheduler_worker.py
  • tests/integration/test_signal.py
  • tools/scripts/storage/init_db.py
  • tools/scripts/storage/stop.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/client.py
  • .github/workflows/tests.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-09T17:15:24.552Z
Learnt from: davidlion
Repo: y-scope/spider PR: 100
File: src/spider/worker/worker.cpp:205-230
Timestamp: 2025-04-09T17:15:24.552Z
Learning: Documentation should be added to new functions in the spider codebase, as already discussed with the user.

Applied to files:

  • python/spider-py/README.md
🧬 Code graph analysis (4)
tools/scripts/storage/init_db.py (2)
tools/scripts/storage/start.py (1)
  • main (22-114)
tools/scripts/storage/wait_for_db.py (1)
  • main (21-65)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
  • get_storage_url (118-123)
  • storage (127-136)
tests/integration/test_scheduler_worker.py (1)
tests/integration/client.py (2)
  • get_storage_url (118-123)
  • storage (127-136)
tests/integration/test_signal.py (1)
tests/integration/client.py (5)
  • get_storage_url (118-123)
  • get_task_outputs (198-223)
  • get_task_state (226-240)
  • remove_job (243-253)
  • storage (127-136)
🪛 Ruff (0.14.10)
tests/integration/test_client.py

10-10: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tests/integration/test_scheduler_worker.py

24-24: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tests/integration/test_signal.py

21-21: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tools/scripts/storage/stop.py

34-34: subprocess call: check for execution of untrusted input

(S603)


50-50: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: tests (ubuntu-24.04)
  • GitHub Check: tests (ubuntu-22.04)
🔇 Additional comments (10)
python/spider-py/README.md (1)

27-33: Verify that the simplified Testing section adequately documents the new storage-enabled test workflow.

The Testing section has been significantly simplified to a single task command. However, the PR introduces a new SPIDER_STORAGE_URL environment variable and MariaDB-backed storage integration. The README no longer documents:

  • How developers should configure or set up storage for local testing (e.g., whether storage-creation task runs automatically or requires manual invocation)
  • The role of the SPIDER_STORAGE_URL environment variable in the test workflow
  • Whether developers need specific prerequisites (e.g., Docker, MariaDB) installed locally
  • How the unified test:spider-py-unit-tests task interacts with the storage setup introduced in this PR

Consider whether the README should include additional context about storage configuration for developers setting up tests locally, or link to more detailed testing documentation if it exists elsewhere.

tests/integration/test_signal.py (2)

76-76: LGTM! Environment-driven storage configuration.

The fixture correctly uses get_storage_url() to retrieve the storage URL from the environment variable, enabling flexible test configuration.

Also applies to: 87-89


103-103: LGTM! Consistent storage fixture usage.

The test methods correctly use the storage fixture with appropriate noqa annotations for parameter redefinition.

Also applies to: 180-180

tests/integration/test_scheduler_worker.py (2)

82-82: LGTM! Environment-driven storage configuration.

The fixture correctly uses get_storage_url() to retrieve the storage URL from the environment variable, enabling flexible test configuration.

Also applies to: 93-93


284-284: LGTM! Consistent storage fixture usage.

The test methods correctly use the storage fixture with appropriate noqa annotations for parameter redefinition.

Also applies to: 314-314, 331-331, 351-351

tests/integration/test_client.py (2)

55-55: LGTM! Environment-driven storage configuration.

The fixture correctly uses get_storage_url() to retrieve the storage URL from the environment variable, enabling flexible test configuration.

Also applies to: 66-66


87-87: LGTM! Consistent storage URL usage.

The test correctly passes the environment-driven storage URL to the client_test program.

tools/scripts/storage/stop.py (2)

44-55: Stop logic is correct.

The container stop logic properly handles errors and logs appropriate messages. Past review comments about the variable name have been addressed (mariadb_stop_cmd is correctly used).


20-42: Use the docker_executable variable consistently.

The docker_executable variable is defined on line 23 with a comment stating it's to silence Ruff S607, but line 35 uses the hardcoded string "docker" instead of the variable, making the variable and comment misleading.

🔧 Proposed fix
     result = subprocess.run(
-        [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name],
+        [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name],
         capture_output=True,
         text=True,
         check=False,
     )

Note: Line 35 already shows docker_executable in the diff above, but verify line 45 also uses docker_executable instead of "docker".

Likely an incorrect or invalid review comment.

tools/scripts/storage/init_db.py (1)

15-231: Schema looks solid; past timestamp issues resolved.

The SQL schema is comprehensive and correctly configured. Past review comments regarding incorrect ON UPDATE CURRENT_TIMESTAMP usage on creation_time, start_time, and lease_time have been addressed—these columns now correctly use only DEFAULT CURRENT_TIMESTAMP. The heartbeat column appropriately retains ON UPDATE CURRENT_TIMESTAMP for auto-updates.
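
To make the distinction concrete, here is a simplified sketch of the two timestamp styles as Python DDL strings. The table and column names come from the review comments; the `id` columns, their types, and the rest of each definition are assumptions and do not reproduce the real schema.

```python
# Immutable after insert: set once by DEFAULT, never touched by later updates.
JOBS_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id BINARY(16) NOT NULL,
    creation_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
)
"""

# Auto-refreshed: ON UPDATE bumps the value whenever the row is modified.
DRIVERS_DDL = """
CREATE TABLE IF NOT EXISTS drivers (
    id BINARY(16) NOT NULL,
    heartbeat TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
)
"""
```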

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
taskfiles/test.yaml (3)

7-9: Consider documenting that these are test-only credentials.

While the hardcoded credentials are acceptable for local testing, consider adding a comment clarifying that these are for development/testing purposes only and should not be used in production environments.


111-113: Simplify pytest invocation.

Line 112 activates the virtual environment, but line 113 uses the full path to pytest, making the activation redundant.

♻️ Simplify to use pytest directly
-      - |-
-        . ../test-venv/bin/activate
-        ../test-venv/bin/pytest tests/integration
+      - "../test-venv/bin/pytest tests/integration"

18-204: Consider reducing code duplication across storage-dependent tasks.

The MariaDB setup pattern (container name generation, port allocation, storage URL construction, start-storage invocation, and defer cleanup) is duplicated across five tasks. This creates maintenance burden—changes to the storage lifecycle require updates in multiple places.

While YAML anchors have limitations with Taskfile's variable interpolation, consider whether a wrapper task or shared configuration could reduce this duplication in a future refactor.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b3fa91 and f03981e.

📒 Files selected for processing (1)
  • taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
taskfiles/test.yaml (3)

44-68: Clarify the test task strategy.

The presence of three tasks (cpp-unit-tests, cpp-storage-unit-tests, cpp-non-storage-unit-tests) creates ambiguity about which task developers should use. Consider documenting the intended workflow:

  • Is cpp-unit-tests the primary entry point for running all tests?
  • When should developers use the split versions?

206-230: Well-designed internal task for storage lifecycle.

The start-storage task properly encapsulates the three-phase MariaDB setup (start → wait → init) with sensible defaults. Making it internal ensures it's only invoked by other tasks.


18-42: Well-structured storage test lifecycle.

The task properly implements the MariaDB container lifecycle with dynamic port allocation, UUID-based container naming, and cleanup via defer. The cross-platform UUID normalization is a good defensive measure. All referenced scripts and variables are properly defined and in place.

Member

@LinZhihao-723 LinZhihao-723 left a comment

Reviewed the scripts. Can we restructure the tool scripts as follows:

tools/scripts/
  get_free_port.py
  mariadb/
    start.py
    stop.py
    wolf/
      init.py
    huntsman/
      init.py
  • get_free_port.py seems to be a general helper
  • Using mariadb directly instead of storage, in case we need other storage backends in the future
  • Move init scripts into the dedicated namespace

This should make the structure cleaner.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 25-31: The MARIADB_PORT var (and other similar vars using
tools/scripts/get_free_port.py) uses a relative path that breaks when tasks run
from non-root directories; update the sh value for MARIADB_PORT and the other
instances (the vars/commands at the other noted blocks) to invoke the script via
the repository root variable (i.e., prefix with {{.ROOT_DIR}}) so the path
resolves regardless of task dir:, ensuring you change every occurrence that
currently is "tools/scripts/get_free_port.py".
- Around line 21-25: The SPIDER_STORAGE_URL (and the same patterns at the other
occurrences) currently interpolates G_MARIADB_USERNAME and G_MARIADB_PASSWORD
directly into the JDBC URL which will break if credentials contain characters
like @ or &; update the template to percent-encode (URL-encode)
G_MARIADB_USERNAME and G_MARIADB_PASSWORD before embedding them into
SPIDER_STORAGE_URL (or, alternatively, move user and password out of the query
string and supply them as separate JDBC connection properties), and apply the
same fix to the other identical URL templates referenced (the blocks around
lines 47–51, 90–94, 149–152, 184–187) so all credential interpolations are
safely encoded or passed as properties.

In @tools/scripts/get_free_port.py:
- Around line 21-26: The inline Ruff suppression "# ruff: noqa: T201" in the
main function is unused and should be removed; edit the main() implementation
(which calls get_free_port() and prints the port) to delete that comment so Ruff
will report T201 as intended, or alternatively keep the comment but instead add
T201 to your project's Ruff config to explicitly allow it—pick one approach and
apply it consistently.

In @tools/scripts/mariadb/start.py:
- Around line 68-77: The code only treats a running container case but ignores
when docker inspect returns exit code 0 and stdout == "false" (a stopped
container with the same name), which will cause docker run --name ... to fail
with a name conflict; update the check in the block that runs subprocess.run
(using docker_executable and args.name and the result variable) to detect
result.returncode == 0 and result.stdout.rstrip("\n") == "false", then log a
clear error like "Container <name> exists but is stopped; remove or rename it
before running" and return a non-zero exit code (or raise/exit) instead of
proceeding; apply the same explicit stopped-container handling to the other
inspect/check sequence around the 91-121 region so both places consistently
detect and fail with a helpful message when a name conflict with a stopped
container exists.

In @tools/scripts/mariadb/stop.py:
- Around line 20-55: The code incorrectly hardcodes "docker" when building
mariadb_stop_cmd and emits a misleading warning; update mariadb_stop_cmd to use
the docker_executable variable instead of the literal "docker", and change the
logger.warning in main (the inspect check) to say the container "is not running
or does not exist" (or similar) since the inspect result covers both cases; keep
the existing return behavior and error handling for subprocess.run as-is.

In @tools/scripts/mariadb/wolf/init_db.py:
- Around line 225-269: The main() function currently lets mariadb.connect and
cursor.execute raise unstructured exceptions; add explicit try/except around the
connect/execute/commit block in main() to catch mariadb.Error (or Exception),
log a clear error message including exception details, and return a non-zero
exit code on failure; also add an optional --host argparse option (default
"127.0.0.1") and use args.host when calling mariadb.connect; refer to main(),
_TABLE_CREATORS, mariadb.connect, cursor.execute, conn.commit to locate the
changes.
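
Illustrative note on the SPIDER_STORAGE_URL item above: a minimal Python sketch of percent-encoding the credentials before they are placed in the query string. The helper name build_storage_url and the sample values are hypothetical, and whether Spider's URL parser decodes percent-encoded values is an assumption that would need to be checked against the storage code.

```python
from urllib.parse import quote


def build_storage_url(host: str, port: int, database: str, username: str, password: str) -> str:
    """Build a JDBC-style MariaDB URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/", "@" and "&".
    encoded_user = quote(username, safe="")
    encoded_password = quote(password, safe="")
    return (
        f"jdbc:mariadb://{host}:{port}/{database}"
        f"?user={encoded_user}&password={encoded_password}"
    )


if __name__ == "__main__":
    # Sample values only; they are not the credentials used by the Taskfile.
    print(build_storage_url("127.0.0.1", 3306, "example-db", "example-user", "p@ss&word"))
```

Encoding in one helper keeps the URL template in the Taskfile free of quoting logic, at the cost of an extra script call.
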
🧹 Nitpick comments (3)
tools/scripts/mariadb/start.py (1)

68-73: Add subprocess timeouts to avoid hung CI jobs.

docker pull / docker exec can hang indefinitely; consider timeout=... on subprocess.run(...) (and log the timeout distinctly).

Also applies to: 80-85, 111-116, 124-136
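
A rough sketch of what the timeout suggestion could look like, assuming the calls go through subprocess.run as the comment describes. The helper name run_with_timeout and the 124 return code (borrowed from the GNU timeout convention) are illustrative choices, not taken from start.py.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_seconds: float) -> int:
    """Run cmd and return its exit code, or 124 if it exceeds the timeout (hypothetical helper)."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising TimeoutExpired.
        print(f"Timed out after {timeout_seconds}s: {cmd}", file=sys.stderr)
        return 124
    return result.returncode
```

Returning a distinct code for the timeout case lets a CI log tell a hung docker command apart from one that failed quickly.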

taskfiles/test.yaml (2)

206-227: Consider using {{.ROOT_DIR}} for MariaDB scripts too (defensive against cwd changes).

Even if Task runs internal tasks from a predictable cwd, using absolute-from-root paths here makes the lifecycle more robust.

Proposed diff
-        tools/scripts/mariadb/start.py \
+        {{.ROOT_DIR}}/tools/scripts/mariadb/start.py \
@@
-        tools/scripts/mariadb/wolf/init_db.py \
+        {{.ROOT_DIR}}/tools/scripts/mariadb/wolf/init_db.py \

18-43: Optional: reduce duplication by wrapping “with-mariadb” lifecycle once.

The repeated blocks (env + vars + defer + start-mariadb) across test tasks are easy to drift; consider a single internal task that takes “test command” as a var, or a shared included Taskfile.

Also applies to: 44-68, 77-114, 144-170, 179-205

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f03981e and b21e361.

📒 Files selected for processing (6)
  • taskfiles/lint.yaml
  • taskfiles/test.yaml
  • tools/scripts/get_free_port.py
  • tools/scripts/mariadb/start.py
  • tools/scripts/mariadb/stop.py
  • tools/scripts/mariadb/wolf/init_db.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • taskfiles/lint.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tools/scripts/mariadb/stop.py (1)
tools/scripts/mariadb/start.py (1)
  • main (23-144)
tools/scripts/mariadb/start.py (3)
tools/scripts/get_free_port.py (1)
  • main (21-26)
tools/scripts/mariadb/stop.py (1)
  • main (20-55)
tools/scripts/mariadb/wolf/init_db.py (1)
  • main (225-268)
🪛 Ruff (0.14.10)
tools/scripts/get_free_port.py

24-24: Unused noqa directive (non-enabled: T201)

Remove unused noqa directive

(RUF100)

tools/scripts/mariadb/stop.py

34-34: subprocess call: check for execution of untrusted input

(S603)


50-50: subprocess call: check for execution of untrusted input

(S603)

tools/scripts/mariadb/start.py

68-68: subprocess call: check for execution of untrusted input

(S603)


80-80: subprocess call: check for execution of untrusted input

(S603)


111-111: subprocess call: check for execution of untrusted input

(S603)


124-124: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
tools/scripts/get_free_port.py (1)

11-18: Be aware: “free port” selection is TOCTOU and can flake under concurrency.

This can race (port is freed immediately after Line 18). If you see sporadic “address already in use” from docker, consider adding a retry strategy around container start (or move to docker-assigned random host ports and then query the mapped port).
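
To make the race concrete: the contents of get_free_port.py are not shown in this thread, so the sketch below assumes the common bind-to-port-0 approach and adds the kind of retry wrapper the comment hints at. The name start_with_port_retry and the start callback are hypothetical.

```python
import socket
from typing import Callable


def get_free_port() -> int:
    """Ask the OS for a TCP port that is unused at the moment of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        # The port is released when the socket closes, so another process may grab it.
        return sock.getsockname()[1]


def start_with_port_retry(start: Callable[[int], bool], attempts: int = 3) -> int:
    """Call start(port) with a fresh port until it succeeds; return the port that worked."""
    for _ in range(attempts):
        port = get_free_port()
        if start(port):
            return port
    raise RuntimeError(f"Could not start on a free port after {attempts} attempts")
```

Here start would wrap the container launch and return False on an "address already in use" failure, so a lost race costs one retry instead of a failed test run.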

tools/scripts/mariadb/stop.py (1)

34-39: Subprocess S603: looks fine (no shell), but confirm container name input is trusted in your usage.

Since args.name is CLI-provided, confirm your CI/task usage doesn’t pass attacker-controlled values (even though this is a list-form call, not shell).

Also applies to: 50-50

tools/scripts/mariadb/start.py (1)

68-68: Subprocess S603 warnings: likely acceptable (no shell), but confirm this is intended.

If you want to silence these, add targeted # noqa: S603 on the specific calls (or document why this is safe in-repo).

Also applies to: 80-80, 111-111, 124-124

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tools/scripts/mariadb/stop.py (2)

25-31: Consider clarifying the help text.

The help text refers to "the name of the started MariaDB container", which is slightly awkward phrasing for a stop script. Consider simplifying to "the name of the MariaDB container".

♻️ Suggested refinement
     parser.add_argument(
         "--name",
         type=str,
         default="mariadb-spider-dev",
-        help="The name of the started MariaDB container (default: %(default)s)",
+        help="The name of the MariaDB container to stop (default: %(default)s)",
     )

40-42: Improve the warning message accuracy.

The warning message "Container '%s' doesn't exist" is misleading when returncode == 0 but the container is stopped (stdout != "true"). In that case, the container exists but is not running. Consider making the message more accurate to cover both scenarios.

♻️ Suggested refinement
     if result.returncode != 0 or result.stdout.rstrip("\n") != "true":
-        logger.warning("Container '%s' doesn't exist. Exit peacefully.", args.name)
+        logger.warning("Container '%s' is not running or doesn't exist. Nothing to stop.", args.name)
         return 0
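
The three cases behind this warning can be separated explicitly. The sketch below assumes the inspect call prints the container's running flag as "true" or "false" (for example via docker inspect with a State.Running format string); the function name classify_inspect_result is hypothetical.

```python
def classify_inspect_result(returncode: int, stdout: str) -> str:
    """Map a `docker inspect` result to "missing", "running", or "stopped" (hypothetical helper)."""
    if returncode != 0:
        # A non-zero exit code is treated here as "no container with that name".
        return "missing"
    if stdout.strip() == "true":
        return "running"
    # Exit code 0 with any other output means the container exists but is not running.
    return "stopped"
```

With the state named, stop.py could log a precise message per case, and start.py could fail early on "stopped" instead of hitting a name conflict.
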
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b21e361 and d0b32cc.

📒 Files selected for processing (1)
  • tools/scripts/mariadb/stop.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/scripts/mariadb/stop.py (2)
tools/scripts/mariadb/wolf/init_db.py (1)
  • main (225-268)
tools/scripts/mariadb/start.py (1)
  • main (23-144)
🪛 Ruff (0.14.10)
tools/scripts/mariadb/stop.py

34-34: subprocess call: check for execution of untrusted input

(S603)


50-50: subprocess call: check for execution of untrusted input

(S603)

🔇 Additional comments (5)
tools/scripts/mariadb/stop.py (5)

1-5: LGTM!

The shebang and script metadata follow the project conventions and are consistent with the companion start.py script.


7-17: LGTM!

The imports and logging configuration are appropriate for this script's functionality.


44-55: LGTM!

The container stopping logic is well-implemented with proper error handling and clear logging.


58-59: LGTM!

The entry point follows standard Python conventions.


34-55: Static analysis warnings are false positives.

The Ruff S603 warnings about subprocess calls are false positives in this context. The docker_executable is hardcoded (line 23), and commands are constructed as lists rather than shell strings. The args.name parameter is expected user input for specifying which container to stop, which is the intended functionality of this script.
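
A small demonstration of why list-form calls are the safer shape: each list element reaches the child as a single argv entry, so shell metacharacters in a name are not interpreted. The helper echo_argument is hypothetical and uses a child Python process in place of docker.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass value as one argv entry to a child process and return what the child received."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # No shell is involved, so the ";" does not start a second command.
    print(echo_argument("some-name; echo not-executed"))
```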

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
taskfiles/test.yaml (2)

144-169: Potential path resolution issue with get_free_port.py.

This task sets dir: "{{.G_SRC_PYTHON_DIR}}". If sh: commands in vars execute from that directory context, the relative path tools/scripts/get_free_port.py on line 157 would fail. Consider using an absolute path.

Suggested fix using absolute path
     vars:
       MARIADB_CONTAINER_NAME:
         # Normalize UUID casing: macOS generates uppercase while Linux generates lowercase.
         sh: "uuidgen | tr '[:upper:]' '[:lower:]' | sed 's/^/spider-mariadb-/'"
       MARIADB_PORT:
-        sh: "tools/scripts/get_free_port.py"
+        sh: "{{.ROOT_DIR}}/tools/scripts/get_free_port.py"

179-204: Same path resolution issue as spider-py-unit-tests.

The relative path to get_free_port.py on line 192 should also be made absolute for consistency with the suggested fix above.

Suggested fix
       MARIADB_PORT:
-        sh: "tools/scripts/get_free_port.py"
+        sh: "{{.ROOT_DIR}}/tools/scripts/get_free_port.py"
🧹 Nitpick comments (1)
tools/scripts/mariadb/start.py (1)

13-13: Consider pinning the MariaDB image to a specific version.

Using mariadb:latest can lead to non-reproducible test environments. If MariaDB releases an update with breaking changes or different default behaviour, tests could fail unexpectedly.

Suggested change
-_MARIADB_IMAGE = "mariadb:latest"
+_MARIADB_IMAGE = "mariadb:11.4"
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0b32cc and 0678ea1.

📒 Files selected for processing (2)
  • taskfiles/test.yaml
  • tools/scripts/mariadb/start.py
🧰 Additional context used
🪛 Ruff (0.14.10)
tools/scripts/mariadb/start.py

68-68: subprocess call: check for execution of untrusted input

(S603)


80-80: subprocess call: check for execution of untrusted input

(S603)


111-111: subprocess call: check for execution of untrusted input

(S603)


124-124: subprocess call: check for execution of untrusted input

(S603)


153-153: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: lint
  • GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (9)
tools/scripts/mariadb/start.py (4)

23-66: LGTM!

The argument parsing is well-structured with sensible defaults and helpful descriptions.


68-89: LGTM!

The container existence check and image pull logic are correct. The subprocess calls use controlled inputs (hardcoded docker executable and CLI-provided arguments), so the S603 warnings are false positives for this dev/test script context.


91-120: LGTM!

The container start command is well-constructed with appropriate flags (--rm for auto-cleanup, -d for detached mode) and proper error handling.


122-159: LGTM!

The readiness polling with timeout and cleanup logic is well-implemented. Using healthcheck.sh --connect --innodb_initialized ensures the database is truly ready for connections before returning success.
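
For readers who have not opened start.py, the polling pattern described here can be sketched roughly as follows. This is not the script's actual code: wait_until_ready is a hypothetical helper, and is_ready stands in for whatever check the script runs (per the comment, a healthcheck.sh call inside the container).

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_seconds: float,
    interval_seconds: float = 1.0,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    # time.monotonic() is unaffected by wall-clock adjustments during the wait.
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

On a False result the caller would stop the container before exiting, which matches the cleanup behaviour the review describes.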

taskfiles/test.yaml (5)

7-9: LGTM!

Centralising the MariaDB credentials as global variables is a good practice for maintainability.


18-42: LGTM!

Good implementation with UUID-based container naming and dynamic port allocation, enabling parallel test execution. The defer ensures cleanup even on test failure.


44-68: LGTM!

Consistent pattern with the storage-specific tests. The implementation is correct.


206-227: LGTM!

Good abstraction of the MariaDB lifecycle into an internal task. The script paths work correctly here since there's no dir: directive.


77-113: No issue with script path resolution.

The sh: commands in task variables are evaluated from the repository root in Taskfile v3, regardless of the dir: setting. The cpp-integration task uses dir: "{{.G_BUILD_SPIDER_DIR}}" with relative paths like tools/scripts/get_free_port.py in variable evaluation, and this pattern is consistent across other tasks (spider-py-unit-tests, spider-py-storage-unit-tests) that successfully run in CI. The task executes correctly in the automated test suite.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 128-129: The YAML key is misspelled: change the top-level key
"required:" to "requires:" in the test.yaml block that declares vars (the entry
containing required: vars: ["SPIDER_STORAGE_URL"]) so the task runtime
recognizes the variable requirement; keep the existing vars array unchanged and
ensure indentation remains the same.
- Around line 116-117: Fix the typo where the Taskfile key is written as
required: — update each executor task entry in taskfiles/test.yaml to use the
correct go-task key requires: instead of required:, e.g. change the block that
currently reads required: vars: ["SPIDER_STORAGE_URL"] (and the two other
executor tasks with the same pattern) to requires: vars: ["SPIDER_STORAGE_URL"];
verify and replace all occurrences so variable validation is actually applied.
- Around line 142-143: The YAML key is misspelled as "required:"; update it to
"requires:" so the task definition recognizes the dependency, i.e., replace the
line containing required: vars: ["SPIDER_STORAGE_URL"] with requires: vars:
["SPIDER_STORAGE_URL"] (ensure spacing/indentation matches surrounding task
YAML).
🧹 Nitpick comments (1)
docs/src/dev-docs/testing.md (1)

20-24: LGTM! Clear description of the unified test workflow.

The new section accurately describes the consolidated tests.yaml workflow that runs both unit and integration tests. The description is concise and aligns with the PR's objective of moving to a unified test workflow.

Optional: Add context about workflow transition

If helpful for users familiar with the old workflow structure, consider mentioning that this replaces the previous non-storage-unit-tests workflow (as noted in the PR objectives). However, this is optional since forward-looking documentation may not need historical context.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0678ea1 and 26b8cc6.

📒 Files selected for processing (3)
  • .github/workflows/tests.yaml
  • docs/src/dev-docs/testing.md
  • taskfiles/test.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-03T01:49:27.764Z
Learnt from: sitaowang1998
Repo: y-scope/spider PR: 172
File: build-tasks.yaml:21-45
Timestamp: 2025-08-03T01:49:27.764Z
Learning: In the spider project, external tools like `uv`, Python, and Task are documented as requirements in README.md and are expected to be pre-installed by users rather than bootstrapped by the build system.

Applied to files:

  • .github/workflows/tests.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (7)
.github/workflows/tests.yaml (2)

1-1: LGTM! Workflow and job renamed appropriately.

The renaming from "unit-tests"/"non-storage-unit-tests" to "tests" accurately reflects the expanded scope now that storage-backed and integration tests are included.

Also applies to: 20-20


58-62: LGTM! Test steps updated to invoke the new unified tasks.

The workflow now runs all tests (C++ unit tests, spider-py unit tests, and C++ integration tests) through the MariaDB-backed storage executor, aligning with the PR objectives.

taskfiles/test.yaml (3)

8-11: LGTM! MariaDB configuration variables for local testing.

Hardcoded credentials are acceptable for local Docker-based testing. These are only used for ephemeral containers spun up during test runs.


68-105: LGTM! Well-structured storage executor pattern.

The mariadb-storage-executor task properly:

  • Generates unique container names with cross-platform UUID normalization
  • Dynamically allocates a free port to avoid conflicts
  • Uses defer for guaranteed cleanup on success or failure
  • Passes the constructed SPIDER_STORAGE_URL to the test task

14-30: LGTM! Test tasks properly delegate to the storage executor.

All three public test tasks (cpp-unit-tests, cpp-integration, spider-py-unit-tests) correctly invoke mariadb-storage-executor with their respective executor tasks.

docs/src/dev-docs/testing.md (2)

26-26: LGTM! Link reference is correctly formatted.

The markdown reference link to the workflow file is properly structured and points to the expected location.


7-11: Correct the task name from test:spider-py-storage-unit-tests to test:spider-py-unit-tests.

The task in the documentation should match the actual task name in taskfiles/test.yaml, which is spider-py-unit-tests (not spider-py-storage-unit-tests). The description is accurate—it does run spider-py unit tests using a storage backend—but the task identifier is incorrect.

Likely an incorrect or invalid review comment.

Member

@LinZhihao-723 LinZhihao-723 left a comment

Did a large refactoring on testing tasks. Please check.

uv --version

- - name: "Install project dependencies "
+ - name: "Install project dependencies"
Member

Remove space.

- run: "task test:spider-py-non-storage-unit-tests"
- run: "task test:spider-py-unit-tests"

- run: "task test:cpp-integration"
Member

Group unit tests before integration tests.

Comment on lines 85 to 105
- |-
  tools/scripts/mariadb/start.py \
    --name "{{.MARIADB_CONTAINER_NAME}}" \
    --port "{{.MARIADB_PORT}}" \
    --database "{{.G_MARIADB_DATABASE}}" \
    --username "{{.G_MARIADB_USERNAME}}" \
    --password "{{.G_MARIADB_PASSWORD}}"
- defer: |-
    {{.ROOT_DIR}}/tools/scripts/mariadb/stop.py \
      --name "{{.MARIADB_CONTAINER_NAME}}"
- |-
  tools/scripts/mariadb/wolf/init_db.py \
    --port "{{.MARIADB_PORT}}" \
    --database "{{.G_MARIADB_DATABASE}}" \
    --username "{{.G_MARIADB_USERNAME}}" \
    --password "{{.G_MARIADB_PASSWORD}}"
- task: "{{.TEST_TASK}}"
  vars:
    SPIDER_STORAGE_URL:
      "jdbc:mariadb://127.0.0.1:{{.MARIADB_PORT}}/{{.G_MARIADB_DATABASE}}?\
      user={{.G_MARIADB_USERNAME}}&password={{.G_MARIADB_PASSWORD}}"
Member

Removed start task. In this way, the defer can actually work even if init_db fails.

Comment on lines -9 to -11
cpp-non-storage-unit-tests:
  deps:
    - "build-unit-test"
Member

Remove xxx-storage-unit-tests and xxx-non-storage-unit-tests.

#
# @param {string} STORAGE_TASK The task to execute. The task must accept no parameters other than
# `SPIDER_STORAGE_URL`, which is set to the MariaDB instance URL.
mariadb-storage-task-executor:
Member

Add a wrapper for storage task execution so we don't need to repeat the same setup everywhere.
The storage URL is passed as a variable.

Comment on lines +68 to +71
# A generic wrapper that runs the given task with a MariaDB storage backend.
#
# @param {string} STORAGE_TASK The task to execute. The task must accept no parameters other than
# `SPIDER_STORAGE_URL`, which is set to the MariaDB instance URL.
Member

Don't forget to add docstrings for unit tests that contain variables.

# Internal task that runs all Spider Wolf's unit tests.
#
# @param {string} SPIDER_STORAGE_URL An URL pointing to the MariaDB instance.
spider-wolf-unit-tests-executor:
Member

We should document the versioning. But since this is an internal task, I'm gonna refer to wolf directly.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 115-116: The Taskfile key is misspelled: replace the top-level
"required:" mapping with "requires:" so Taskfile v3 enforces the variable check;
specifically change the block that currently reads `required: vars:
["SPIDER_STORAGE_URL"]` to `requires: vars: ["SPIDER_STORAGE_URL"]` so the
SPIDER_STORAGE_URL check is not silently ignored.
- Around line 141-142: There's a typo in the task spec: the key "required"
should be "requires"; update the task block that currently reads with vars:
["SPIDER_STORAGE_URL"] to use the "requires" key instead of "required" so the
dependency is recognized (i.e., replace the "required:" key with "requires:"
keeping the existing vars list).
- Around line 127-128: In taskfiles/test.yaml the task key uses the wrong
property name "required" for declaring environment variables; replace the
"required" key with the correct "requires" key so the vars mapping
("SPIDER_STORAGE_URL") is recognized (i.e., change the "required: vars: [...]"
entry to "requires: vars: [...]" in the same block).
- Around line 14-30: Calls to mariadb-storage-task-executor set TEST_TASK but
the executor expects STORAGE_TASK, causing missing-variable failures; update the
callers (the cpp-unit-tests, cpp-integration, and spider-py-unit-tests task
entries) to pass STORAGE_TASK instead of TEST_TASK so the variable name matches
mariadb-storage-task-executor, or alternatively change
mariadb-storage-task-executor to accept TEST_TASK (rename expected variable
STORAGE_TASK → TEST_TASK) so both sides use the same identifier.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26b8cc6 and e25e09b.

📒 Files selected for processing (1)
  • taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (ubuntu-24.04)
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: lint
🔇 Additional comments (3)
taskfiles/test.yaml (3)

8-11: LGTM!

Hardcoded credentials are acceptable here since they're used for ephemeral test containers only.


68-104: Well-structured executor with proper cleanup.

Good use of:

  • UUID-based container naming for parallel test isolation
  • defer to ensure MariaDB container cleanup on failure
  • Cross-platform UUID normalization

144-159: LGTM!

Build dependencies and test execution commands look correct. The venv activation followed by pytest invocation is properly structured.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 14-30: Public task entries set TEST_TASK but the
mariadb-storage-task-executor expects STORAGE_TASK, causing a missing-variable
failure; update the affected task definitions (cpp-unit-tests, cpp-integration,
spider-py-unit-tests) to pass STORAGE_TASK instead of TEST_TASK, or
alternatively adapt mariadb-storage-task-executor to accept TEST_TASK and map it
to STORAGE_TASK, making sure the symbol STORAGE_TASK used in the executor
(references around lines where STORAGE_TASK is read) is provided.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e25e09b and 068f454.

📒 Files selected for processing (1)
  • taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (5)
taskfiles/test.yaml (5)

8-11: LGTM!

Hardcoded credentials for ephemeral local Docker test containers are acceptable. The values are clearly test-specific and self-documenting.


68-104: LGTM!

Well-structured executor with good practices:

  • UUID case normalization handles cross-platform differences.
  • Dynamic port allocation avoids conflicts.
  • Deferred stop command ensures container cleanup even on test failure.
  • JDBC URL construction is correct for MariaDB.

106-118: LGTM!

Clean implementation with PYTHONDONTWRITEBYTECODE to avoid polluting the source tree with bytecode files.


120-132: LGTM!

Correct dependency on build-unit-test and proper environment variable propagation.


134-159: LGTM!

Comprehensive build targets for integration tests and proper venv activation before running pytest.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 100-105: The SPIDER_STORAGE_URL currently embeds raw credentials
which risks leakage and breaks on special chars; update the task vars so the
username/password are URL-encoded before composing SPIDER_STORAGE_URL (use a
URL-encoding routine, e.g., quote/quote_plus) and compose the JDBC string from
the encoded components for the SPIDER_STORAGE_URL variable used by
spider-py-unit-tests-executor, spider-wolf-unit-tests-executor and
spider-wolf-integration-tests-executor; additionally ensure the full JDBC string
is not emitted to test logs (avoid printing it in tests, mark the env as
secret/masked in the CI runner or pass credentials via masked/secret env vars
and/or pass only non-sensitive components to pytest to prevent accidental
logging).
- Around line 75-79: The MARIADB_CONTAINER_NAME sh command uses unix-only tools
(uuidgen, tr, sed) which will fail on Windows; replace the shell pipeline with a
cross-platform Python-based generator and update the MARIADB_CONTAINER_NAME
entry to call it (e.g., point sh to a new or existing script similar to
tools/scripts/get_free_port.py), or invoke python -c to generate a lowercase
UUID prefixed with "spider-mariadb-"; ensure the replacement script/function
exports the same string format so MARIADB_CONTAINER_NAME continues to work
across platforms.
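
The two taskfiles/test.yaml suggestions above could be sketched in Python roughly as follows. This is a minimal illustration, not the PR's code: `make_container_name` and `build_storage_url` are hypothetical helpers, the `jdbc:mariadb://host:port/db?user=...&password=...` layout is an assumption about the storage URL format, and it further assumes the consumer of `SPIDER_STORAGE_URL` percent-decodes the query values.

```python
import uuid
from urllib.parse import quote


def make_container_name(prefix: str = "spider-mariadb-") -> str:
    """Return a lowercase container name without relying on uuidgen, tr, or sed.

    str(uuid.uuid4()) is already lowercase hex, so no case normalization is needed.
    """
    return f"{prefix}{uuid.uuid4()}"


def build_storage_url(host: str, port: int, database: str, user: str, password: str) -> str:
    """Compose a JDBC-style URL with percent-encoded credentials.

    ASSUMPTION: the URL layout below is illustrative and may differ from the
    format the Spider storage layer actually expects.
    """
    # safe="" makes quote() encode every reserved character, including "/" and "&".
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return (
        f"jdbc:mariadb://{host}:{port}/{database}"
        f"?user={encoded_user}&password={encoded_password}"
    )


if __name__ == "__main__":
    # Print only the non-sensitive container name, never the full URL.
    print(make_container_name())
```

A task could call such a script through `sh: python3 ...` in the same way the review points at tools/scripts/get_free_port.py, which keeps the variable's string format unchanged on every platform.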

In @tools/scripts/mariadb/start.py:
- Around line 91-109: Add a new CLI option and use it for the MariaDB root
account instead of reusing args.password: introduce an argparse
add_argument("--root-password", default="<distinct-default>", help="password for
MariaDB root user") to populate args.root_password, update the mariadb_start_cmd
list to set "-e", f"MARIADB_ROOT_PASSWORD={args.root_password}" (replacing the
existing args.password usage), and update any docs/help text or tests that
assume a single password.
- Line 13: Replace the floating image tag in _MARIADB_IMAGE (currently
"mariadb:latest") with a pinned tag or digest (e.g., "mariadb:11.4" or a
specific sha256 digest) to ensure CI reproducibility, and update the two log
messages that mention "latest" (the messages at the locations that currently
print "latest") to reflect the new pinned tag or show the value of
_MARIADB_IMAGE so they remain accurate.
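
A rough sketch of what the two start.py suggestions above might look like together. Only `--root-password`, `_MARIADB_IMAGE`, `MARIADB_ROOT_PASSWORD`, and the `mariadb:11.4` example tag come from the review text; the other option names, all default values, and the `build_start_cmd` helper are assumptions made for illustration and may not match the real script.

```python
import argparse

# Pinned tag instead of "mariadb:latest"; a sha256 digest would be stricter still.
_MARIADB_IMAGE = "mariadb:11.4"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Start a MariaDB test container.")
    # Hypothetical options and defaults, for illustration only.
    parser.add_argument("--name", required=True, help="container name")
    parser.add_argument("--port", type=int, default=3306, help="host port to publish")
    parser.add_argument("--database", default="spider-db", help="database to create")
    parser.add_argument("--username", default="spider-user", help="application user")
    parser.add_argument("--password", default="spider-password", help="application user password")
    parser.add_argument(
        "--root-password",
        default="spider-root-password",
        help="password for MariaDB root user",
    )
    return parser.parse_args(argv)


def build_start_cmd(args: argparse.Namespace) -> list:
    """Build the `docker run` argument list without executing it."""
    return [
        "docker", "run", "--rm", "--detach",
        "--name", args.name,
        "-e", f"MARIADB_USER={args.username}",
        "-e", f"MARIADB_PASSWORD={args.password}",
        # The root account gets its own password instead of reusing args.password.
        "-e", f"MARIADB_ROOT_PASSWORD={args.root_password}",
        "-e", f"MARIADB_DATABASE={args.database}",
        "-p", f"{args.port}:3306",
        _MARIADB_IMAGE,
    ]
```

Log messages can then interpolate `_MARIADB_IMAGE` rather than spelling out a tag, so they stay accurate whenever the pin is bumped.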
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 068f454 and 1ae488e.

📒 Files selected for processing (2)
  • taskfiles/test.yaml
  • tools/scripts/mariadb/start.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/scripts/mariadb/start.py (1)
tools/scripts/mariadb/stop.py (1)
  • main (20-55)
🪛 Ruff (0.14.10)
tools/scripts/mariadb/start.py

68-68: subprocess call: check for execution of untrusted input (S603)

80-80: subprocess call: check for execution of untrusted input (S603)

111-111: subprocess call: check for execution of untrusted input (S603)

124-124: subprocess call: check for execution of untrusted input (S603)

154-154: subprocess call: check for execution of untrusted input (S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (ubuntu-22.04)
  • GitHub Check: tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (2)
taskfiles/test.yaml (1)

14-31: Nice test isolation via executor wrapper (start/init/run/stop).
The lifecycle ordering (including defer stop) looks clean and should reduce CI duplication.

Also applies to: 68-105

tools/scripts/mariadb/start.py (1)

122-160: The healthcheck.sh script is included in the official MariaDB Docker image and supports the --connect --innodb_initialized flags being used here, so the concern about it potentially missing is unfounded. However, adding docker logs output before cleanup would still improve debugging for cases where the healthcheck fails for other reasons. Consider capturing and logging the container logs on timeout to aid diagnosis of startup failures in CI environments.

Likely an incorrect or invalid review comment.
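
For reference, the log-capture idea from the start.py comment above could look roughly like this. It is a sketch under assumptions: `wait_until_healthy`, its parameters, and the 100-line tail are hypothetical, and the actual wait loop in start.py may be structured differently. The `healthcheck.sh --connect --innodb_initialized` invocation is the one named in the comment.

```python
import logging
import subprocess
import time

logger = logging.getLogger(__name__)


def wait_until_healthy(name, timeout_sec=30.0, interval_sec=1.0, run=subprocess.run):
    """Poll the container's healthcheck; on timeout, log recent container output.

    `run` is injectable so the function can be exercised without Docker.
    """
    check_cmd = [
        "docker", "exec", name,
        "healthcheck.sh", "--connect", "--innodb_initialized",
    ]
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        result = run(check_cmd, capture_output=True, text=True, check=False)
        if result.returncode == 0:
            return True
        time.sleep(interval_sec)

    # Timed out: capture the container logs before the caller cleans up.
    logs = run(
        ["docker", "logs", "--tail", "100", name],
        capture_output=True, text=True, check=False,
    )
    logger.error(
        "MariaDB container %s did not become healthy:\n%s%s",
        name, logs.stdout, logs.stderr,
    )
    return False
```

The logs have to be fetched before the container is stopped, since a container started with `--rm` takes its logs with it on removal.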

@sitaowang1998 sitaowang1998 merged commit 6c76b12 into y-scope:main Jan 13, 2026
6 checks passed
@sitaowang1998 sitaowang1998 deleted the storage-test branch January 13, 2026 01:53