Conversation
This introduces a beautiful vertical timeline layout for the pipeline stages, stripping out the fragile horizontal flow sequence that wrapped poorly on narrow terminals. All components received a modern, clean visual polish (via rich boxes and explicit hex color theming) and the documentation assets have been entirely rebuilt using high-DPI vector SVGs.
Includes major rework to Ghidra decompile block generation, PE ingest validation, core stage engine typing, resilient runtime state persistence, unit tests coverage, and completely rebuilt TUI engine telemetry.
There was a problem hiding this comment.
Pull request overview
Refactors DeepZero’s core execution/UX by introducing a richer terminal dashboard, adding progress/shutdown plumbing through processor contexts, and adjusting CLI/CI to support resumable runs (and optional clean starts) alongside updated processors.
Changes:
- Added a Rich-based
PipelineDashboardand wired UI progress reporting intoPipelineRunnervia a progress adapter and shutdown event propagation. - Updated CLI behavior to make
deepzero runauto-resume by default and added--cleanwith async trash GC. - Updated shipped processors and tooling (PE ingest now uses
lief; Ghidra runner switched to synchronoussubprocess), plus new tests and CI setup to install Ghidra.
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/deepzero/engine/ui.py |
New/expanded Rich dashboard implementation for stage flow + stats rendering. |
src/deepzero/engine/stage.py |
Adds progress hook + shutdown_event on ProcessorContext. |
src/deepzero/engine/runner.py |
Integrates dashboard lifecycle, per-stage UI updates, and interruption-aware execution changes. |
src/deepzero/engine/state.py |
Adds mark_interrupted() for run status transitions. |
src/deepzero/cli.py |
Updates run command (auto-resume, --clean, trash GC) and wires dashboard into runner creation. |
processors/pe_ingest/pe_ingest.py |
Switches PE parsing implementation to lief and adds concurrent IO/progress reporting. |
processors/ghidra_decompile/ghidra_decompile.py |
Replaces async subprocess handling with synchronous subprocess.Popen + timeout logic. |
processors/ghidra_decompile/scripts/extract_dispatch.py |
Refactors output-dir handling and wraps execution under __main__. |
pyproject.toml |
Swaps optional pe extra dependency from pefile to lief. |
.github/workflows/ci.yml |
Installs Java + downloads/unzips Ghidra and exports GHIDRA_INSTALL_DIR for CI. |
tests/test_engine_ui.py |
Updates tests for new dashboard API/state model. |
tests/test_runner.py |
Adds tests for historical resumption stats and shutdown behavior. |
tests/test_cli.py |
Adds tests for trash GC and --clean semantics. |
tests/test_ghidra_e2e.py |
Adds a fully physical Ghidra end-to-end test. |
README.md |
Major documentation refresh (architecture, CLI, YAML, state persistence, visuals). |
.github/terminal.svg, .github/banner.svg |
Adds new README assets for branding/terminal preview. |
Comments suppressed due to low confidence (1)
processors/pe_ingest/pe_ingest.py:5
loggingis imported but never used in this module, which will fail CI linting (ruff F401). Remove the unused import, or use it if you intended to add module-level logging.
import hashlib
from pathlib import Path
from typing import Any
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # logger scoped to this processor instance | ||
| log: logging.Logger = field(default_factory=lambda: logging.getLogger("deepzero.processor")) | ||
| # custom progress reporting hook for external UI display | ||
| progress: ProgressReporter = field(default_factory=_NullProgressReporter) | ||
| # optional event to monitor for graceful or forced interruptions natively |
There was a problem hiding this comment.
ProcessorContext.shutdown_event is annotated as threading.Event | None, but threading is not imported in this module. This will fail linting (ruff F821) and can confuse type checkers. Import threading (or from threading import Event and update the annotation) to make the type reference valid.
| try: | ||
| output_dir = _get_output_dir() | ||
| except Exception: | ||
| output_dir = os.environ.get("DEEPZERO_SAMPLES_DIR", ".") |
There was a problem hiding this comment.
On script crash, the fallback output directory uses DEEPZERO_SAMPLES_DIR, but the engine sets DEEPZERO_OUTPUT_DIR for Ghidra runs. Since DEEPZERO_SAMPLES_DIR isn’t used anywhere else, this can cause the error JSON to be written to an unexpected location (often .), making failures hard to diagnose. Use DEEPZERO_OUTPUT_DIR (or reuse _get_output_dir()’s logic) for the crash path as well.
| output_dir = os.environ.get("DEEPZERO_SAMPLES_DIR", ".") | |
| output_dir = os.environ.get("DEEPZERO_OUTPUT_DIR", ".") |
|
|
||
|
|
||
| def test_physical_ghidra_pipeline_run(tmp_path): | ||
| """ | ||
| Absolutely unmocked physical integration test interacting natively with the | ||
| Ghidra Java Virtual Machine installed on the local system matching user specifications. | ||
| """ | ||
| ghidra_dir = os.environ.get("GHIDRA_INSTALL_DIR") | ||
| if not ghidra_dir or not Path(ghidra_dir).exists(): |
There was a problem hiding this comment.
This test will run in CI now that the workflow sets GHIDRA_INSTALL_DIR, making the unit test suite depend on downloading/running a full Ghidra + JVM analysis. That’s likely to be slow/flaky and can fail for reasons unrelated to code changes. Consider gating it behind an additional env var (e.g. DEEPZERO_RUN_GHIDRA_E2E=1), marking it as @pytest.mark.integration and excluding it from default CI runs, or moving it to a separate workflow/job.
| wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.0.3_build/ghidra_11.0.3_PUBLIC_20240410.zip -O ghidra.zip | ||
| unzip -q ghidra.zip -d /opt/ | ||
| echo "GHIDRA_INSTALL_DIR=/opt/ghidra_11.0.3_PUBLIC" >> $GITHUB_ENV |
There was a problem hiding this comment.
CI downloads and unzips a large third-party binary (Ghidra) via wget without any checksum/signature verification. This is a supply-chain risk and can also introduce nondeterministic failures if the download is corrupted. Consider pinning by checksum (e.g., verify SHA256) or using a trusted action/cache/artifact source for the Ghidra distribution.
| wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.0.3_build/ghidra_11.0.3_PUBLIC_20240410.zip -O ghidra.zip | |
| unzip -q ghidra.zip -d /opt/ | |
| echo "GHIDRA_INSTALL_DIR=/opt/ghidra_11.0.3_PUBLIC" >> $GITHUB_ENV | |
| GHIDRA_VERSION="11.0.3" | |
| GHIDRA_ZIP="ghidra_${GHIDRA_VERSION}_PUBLIC_20240410.zip" | |
| GHIDRA_URL="https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_${GHIDRA_VERSION}_build/${GHIDRA_ZIP}" | |
| GHIDRA_SHA256="<REPLACE_WITH_OFFICIAL_SHA256>" | |
| wget "${GHIDRA_URL}" -O ghidra.zip | |
| echo "${GHIDRA_SHA256} ghidra.zip" | sha256sum -c - | |
| unzip -q ghidra.zip -d /opt/ | |
| echo "GHIDRA_INSTALL_DIR=/opt/ghidra_${GHIDRA_VERSION}_PUBLIC" >> $GITHUB_ENV |
| proc.wait(timeout=timeout) | ||
| except subprocess.TimeoutExpired: | ||
| proc.kill() | ||
| proc.wait(timeout=10) |
There was a problem hiding this comment.
In the Ghidra timeout path, proc.wait(timeout=10) after proc.kill() can itself raise subprocess.TimeoutExpired if the process doesn’t terminate promptly. That exception isn’t caught, which can crash the processor instead of returning a clean {success: False, error: ...} result. Catch TimeoutExpired for the post-kill wait and return a failure (and/or escalate to a stronger termination strategy).
| proc.wait(timeout=10) | |
| try: | |
| proc.wait(timeout=10) | |
| except subprocess.TimeoutExpired: | |
| elapsed = time.monotonic() - start_time | |
| log.warning( | |
| "ghidra timed out for %s after %.1fs and did not terminate after kill()", | |
| binary_path.name, | |
| elapsed, | |
| ) | |
| return { | |
| "success": False, | |
| "error": f"ghidra timed out after {timeout}s and did not terminate after kill()", | |
| } |
| if not samples or self._shutdown_event.is_set(): | ||
| if self.dashboard: | ||
| self.dashboard.set_transient_status(None) | ||
| self.dashboard.stage_done( | ||
| ingest_name, | ||
| len(samples) if not self._shutdown_event.is_set() else 0, | ||
| 0, | ||
| 0, | ||
| elapsed, | ||
| ) | ||
|
|
||
| if not samples: |
There was a problem hiding this comment.
In _resume_or_ingest, the early-return path when self._shutdown_event is set returns sample_states, but sample_states hasn’t been defined yet in that branch. This will raise an UnboundLocalError when an interrupt happens during ingest. Define sample_states before this return (e.g., build/return a dict from samples), or return None consistently for the interrupted-ingest case after persisting the interrupted run state.
|
@copilot apply changes based on the comments in this thread |
…ling, UnboundLocalError, E2E test gate, CI supply-chain risk Agent-Logs-Url: https://github.com/416rehman/DeepZero/sessions/23d8bfab-81d3-4935-b8be-1b59f28219e0 Co-authored-by: 416rehman <15690606+416rehman@users.noreply.github.com>
… CI with SHA256 verification Agent-Logs-Url: https://github.com/416rehman/DeepZero/sessions/23d8bfab-81d3-4935-b8be-1b59f28219e0 Co-authored-by: 416rehman <15690606+416rehman@users.noreply.github.com>
Applied all review comments in commit 936d6e8, with follow-up fixes in d7090d7:
|
No description provided.