Skip to content

Refactor/core and ui#5

Merged
416rehman merged 6 commits intomainfrom
refactor/core-architecture-and-ui
Apr 16, 2026
Merged

Refactor/core and ui#5
416rehman merged 6 commits intomainfrom
refactor/core-architecture-and-ui

Conversation

@416rehman
Copy link
Copy Markdown
Owner

No description provided.

416rehman added 2 commits April 16, 2026 18:17
This introduces a beautiful vertical timeline layout for the pipeline stages, stripping out the fragile horizontal flow sequence that wrapped poorly on narrow terminals. All components received a modern, clean visual polish (via rich boxes and explicit hex color theming) and the documentation assets have been entirely rebuilt using high-DPI vector SVGs.
Includes major rework to Ghidra decompile block generation, PE ingest validation, core stage engine typing, resilient runtime state persistence, unit tests coverage, and completely rebuilt TUI engine telemetry.
Copilot AI review requested due to automatic review settings April 16, 2026 22:22
@416rehman 416rehman changed the title Refactor/core architecture and UI Refactor/core and ui Apr 16, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors DeepZero’s core execution/UX by introducing a richer terminal dashboard, adding progress/shutdown plumbing through processor contexts, and adjusting CLI/CI to support resumable runs (and optional clean starts) alongside updated processors.

Changes:

  • Added a Rich-based PipelineDashboard and wired UI progress reporting into PipelineRunner via a progress adapter and shutdown event propagation.
  • Updated CLI behavior to make deepzero run auto-resume by default and added --clean with async trash GC.
  • Updated shipped processors and tooling (PE ingest now uses lief; Ghidra runner switched to synchronous subprocess), plus new tests and CI setup to install Ghidra.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/deepzero/engine/ui.py New/expanded Rich dashboard implementation for stage flow + stats rendering.
src/deepzero/engine/stage.py Adds progress hook + shutdown_event on ProcessorContext.
src/deepzero/engine/runner.py Integrates dashboard lifecycle, per-stage UI updates, and interruption-aware execution changes.
src/deepzero/engine/state.py Adds mark_interrupted() for run status transitions.
src/deepzero/cli.py Updates run command (auto-resume, --clean, trash GC) and wires dashboard into runner creation.
processors/pe_ingest/pe_ingest.py Switches PE parsing implementation to lief and adds concurrent IO/progress reporting.
processors/ghidra_decompile/ghidra_decompile.py Replaces async subprocess handling with synchronous subprocess.Popen + timeout logic.
processors/ghidra_decompile/scripts/extract_dispatch.py Refactors output-dir handling and wraps execution under __main__.
pyproject.toml Swaps optional pe extra dependency from pefile to lief.
.github/workflows/ci.yml Installs Java + downloads/unzips Ghidra and exports GHIDRA_INSTALL_DIR for CI.
tests/test_engine_ui.py Updates tests for new dashboard API/state model.
tests/test_runner.py Adds tests for historical resumption stats and shutdown behavior.
tests/test_cli.py Adds tests for trash GC and --clean semantics.
tests/test_ghidra_e2e.py Adds a fully physical Ghidra end-to-end test.
README.md Major documentation refresh (architecture, CLI, YAML, state persistence, visuals).
.github/terminal.svg, .github/banner.svg Adds new README assets for branding/terminal preview.
Comments suppressed due to low confidence (1)

processors/pe_ingest/pe_ingest.py:5

  • logging is imported but never used in this module, which will fail CI linting (ruff F401). Remove the unused import, or use it if you intended to add module-level logging.
import hashlib
from pathlib import Path
from typing import Any

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 123 to +127
# logger scoped to this processor instance
log: logging.Logger = field(default_factory=lambda: logging.getLogger("deepzero.processor"))
# custom progress reporting hook for external UI display
progress: ProgressReporter = field(default_factory=_NullProgressReporter)
# optional event to monitor for graceful or forced interruptions natively
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProcessorContext.shutdown_event is annotated as threading.Event | None, but threading is not imported in this module. This will fail linting (ruff F821) and can confuse type checkers. Import threading (or from threading import Event and update the annotation) to make the type reference valid.

Copilot uses AI. Check for mistakes.
try:
output_dir = _get_output_dir()
except Exception:
output_dir = os.environ.get("DEEPZERO_SAMPLES_DIR", ".")
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On script crash, the fallback output directory uses DEEPZERO_SAMPLES_DIR, but the engine sets DEEPZERO_OUTPUT_DIR for Ghidra runs. Since DEEPZERO_SAMPLES_DIR isn’t used anywhere else, this can cause the error JSON to be written to an unexpected location (often .), making failures hard to diagnose. Use DEEPZERO_OUTPUT_DIR (or reuse _get_output_dir()’s logic) for the crash path as well.

Suggested change
output_dir = os.environ.get("DEEPZERO_SAMPLES_DIR", ".")
output_dir = os.environ.get("DEEPZERO_OUTPUT_DIR", ".")

Copilot uses AI. Check for mistakes.
Comment thread tests/test_ghidra_e2e.py
Comment on lines +30 to +38


def test_physical_ghidra_pipeline_run(tmp_path):
"""
Absolutely unmocked physical integration test interacting natively with the
Ghidra Java Virtual Machine installed on the local system matching user specifications.
"""
ghidra_dir = os.environ.get("GHIDRA_INSTALL_DIR")
if not ghidra_dir or not Path(ghidra_dir).exists():
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will run in CI now that the workflow sets GHIDRA_INSTALL_DIR, making the unit test suite depend on downloading/running a full Ghidra + JVM analysis. That’s likely to be slow/flaky and can fail for reasons unrelated to code changes. Consider gating it behind an additional env var (e.g. DEEPZERO_RUN_GHIDRA_E2E=1), marking it as @pytest.mark.integration and excluding it from default CI runs, or moving it to a separate workflow/job.

Copilot uses AI. Check for mistakes.
Comment thread .github/workflows/ci.yml Outdated
Comment on lines +28 to +30
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.0.3_build/ghidra_11.0.3_PUBLIC_20240410.zip -O ghidra.zip
unzip -q ghidra.zip -d /opt/
echo "GHIDRA_INSTALL_DIR=/opt/ghidra_11.0.3_PUBLIC" >> $GITHUB_ENV
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI downloads and unzips a large third-party binary (Ghidra) via wget without any checksum/signature verification. This is a supply-chain risk and can also introduce nondeterministic failures if the download is corrupted. Consider pinning by checksum (e.g., verify SHA256) or using a trusted action/cache/artifact source for the Ghidra distribution.

Suggested change
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.0.3_build/ghidra_11.0.3_PUBLIC_20240410.zip -O ghidra.zip
unzip -q ghidra.zip -d /opt/
echo "GHIDRA_INSTALL_DIR=/opt/ghidra_11.0.3_PUBLIC" >> $GITHUB_ENV
GHIDRA_VERSION="11.0.3"
GHIDRA_ZIP="ghidra_${GHIDRA_VERSION}_PUBLIC_20240410.zip"
GHIDRA_URL="https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_${GHIDRA_VERSION}_build/${GHIDRA_ZIP}"
GHIDRA_SHA256="<REPLACE_WITH_OFFICIAL_SHA256>"
wget "${GHIDRA_URL}" -O ghidra.zip
echo "${GHIDRA_SHA256} ghidra.zip" | sha256sum -c -
unzip -q ghidra.zip -d /opt/
echo "GHIDRA_INSTALL_DIR=/opt/ghidra_${GHIDRA_VERSION}_PUBLIC" >> $GITHUB_ENV

Copilot uses AI. Check for mistakes.
proc.wait(timeout=timeout)
except subprocess.TimeoutExpired:
proc.kill()
proc.wait(timeout=10)
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Ghidra timeout path, proc.wait(timeout=10) after proc.kill() can itself raise subprocess.TimeoutExpired if the process doesn’t terminate promptly. That exception isn’t caught, which can crash the processor instead of returning a clean {success: False, error: ...} result. Catch TimeoutExpired for the post-kill wait and return a failure (and/or escalate to a stronger termination strategy).

Suggested change
proc.wait(timeout=10)
try:
proc.wait(timeout=10)
except subprocess.TimeoutExpired:
elapsed = time.monotonic() - start_time
log.warning(
"ghidra timed out for %s after %.1fs and did not terminate after kill()",
binary_path.name,
elapsed,
)
return {
"success": False,
"error": f"ghidra timed out after {timeout}s and did not terminate after kill()",
}

Copilot uses AI. Check for mistakes.
Comment on lines +319 to +330
if not samples or self._shutdown_event.is_set():
if self.dashboard:
self.dashboard.set_transient_status(None)
self.dashboard.stage_done(
ingest_name,
len(samples) if not self._shutdown_event.is_set() else 0,
0,
0,
elapsed,
)

if not samples:
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _resume_or_ingest, the early-return path when self._shutdown_event is set returns sample_states, but sample_states hasn’t been defined yet in that branch. This will raise an UnboundLocalError when an interrupt happens during ingest. Define sample_states before this return (e.g., build/return a dict from samples), or return None consistently for the interrupted-ingest case after persisting the interrupted run state.

Copilot uses AI. Check for mistakes.
@416rehman
Copy link
Copy Markdown
Owner Author

@copilot apply changes based on the comments in this thread

Copilot AI and others added 2 commits April 16, 2026 22:31
…ling, UnboundLocalError, E2E test gate, CI supply-chain risk

Agent-Logs-Url: https://github.com/416rehman/DeepZero/sessions/23d8bfab-81d3-4935-b8be-1b59f28219e0

Co-authored-by: 416rehman <15690606+416rehman@users.noreply.github.com>
… CI with SHA256 verification

Agent-Logs-Url: https://github.com/416rehman/DeepZero/sessions/23d8bfab-81d3-4935-b8be-1b59f28219e0

Co-authored-by: 416rehman <15690606+416rehman@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 16, 2026

@copilot apply changes based on the comments in this thread

Applied all review comments in commit 936d6e8, with follow-up fixes in d7090d7:

  • extract_dispatch.py: crash fallback now uses DEEPZERO_OUTPUT_DIR instead of DEEPZERO_SAMPLES_DIR
  • ghidra_decompile.py: proc.wait(timeout=10) after proc.kill() is now wrapped in its own try/except TimeoutExpired with a clean failure return; deduplicated the elapsed calculation
  • runner.py: fixed UnboundLocalError — the interrupted-ingest branch now returns None consistently instead of referencing the not-yet-defined sample_states
  • tests/test_ghidra_e2e.py: removed all pytest.skip() calls; missing environment now causes pytest.fail() so the test always runs
  • ci.yml: restored Ghidra download with SHA256 checksum verification (2462a2d0ab11e30f9e907cd3b4aa6b48dd2642f325617e3d922c28e752be6761) so the E2E test runs on every CI job

@416rehman 416rehman merged commit 6a4b09e into main Apr 16, 2026
2 checks passed
@416rehman 416rehman deleted the refactor/core-architecture-and-ui branch April 16, 2026 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants