Choice: Build on top of the pytm library rather than writing STRIDE rules from scratch.
Why it matters: pytm provides a mature, tested rule set for STRIDE applied to Actor/Server/Dataflow topologies. It handles threat enumeration automatically once the model is built. This means the project gets rule coverage for free and can focus on enrichment (MITRE mapping, AI, reporting).
Trade-off accepted: pytm's data model is rigid (Boundary, Actor, Server, Dataflow, Data), and its threat targeting is class-level rather than instance-level, which required a custom _expand_class_targets() step to resolve class references to concrete model elements. The Boundary class also lacks several attributes needed by the project, requiring runtime monkey-patching.
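A minimal sketch of the class-to-instance expansion, assuming pytm threats carry their targets as element classes; the real _expand_class_targets() signature may differ:

```python
def _expand_class_targets(threat, all_elements):
    """Expand a pytm threat targeting a class (e.g. Server) into one
    (threat, element) pair per concrete instance in the model."""
    targets = threat.target if isinstance(threat.target, tuple) else (threat.target,)
    return [(threat, elem) for elem in all_elements if isinstance(elem, targets)]
```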
Choice: Users define their threat models in a custom Markdown format (## Actors, ## Servers,
## Dataflows, etc.) rather than writing Python code against the pytm API.
Why it matters: Makes threat models accessible to security architects who are not Python
developers. The DSL files are human-readable, version-controllable, and can be edited in the
Monaco editor in the web UI. Templates in threatModel_Template/ lower the barrier to entry.
Trade-off accepted: The ModelParser is custom-built and fragile: section headers are hard-coded strings, and the two-pass design adds complexity. Any change to the DSL grammar invalidates all existing model files.
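An illustrative fragment of the DSL (the field syntax shown here is hypothetical; the authoritative grammar is whatever ModelParser accepts):

```markdown
## Actors
- Name: Operator

## Servers
- Name: Web Server | boundary = DMZ

## Dataflows
- From: Operator | To: Web Server | Protocol: HTTPS | Data: credentials
```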
Choice: All LLM calls go through LiteLLM (litellm.acompletion) rather than vendor-specific
SDKs (e.g., openai, google-generativeai).
Why it matters: Providers can be swapped by changing one line in ai_config.yaml (enabled: true/false). Gemini, OpenAI, Mistral, and Ollama are supported through a unified interface. Enterprise users can point api_base at an internal OpenAI-compatible proxy without code changes.
What was rejected: Direct SDK calls per provider would require separate code paths and would not support the "first enabled provider wins" selection logic cleanly.
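A sketch of the unified call path. litellm.acompletion is the real API; the shape of the providers list from ai_config.yaml is an assumption:

```python
import litellm

async def complete(providers: list[dict], prompt: str) -> str:
    # "First enabled provider wins": take the first entry with enabled: true.
    provider = next(p for p in providers if p.get("enabled"))
    response = await litellm.acompletion(
        model=provider["model"],            # e.g. "gemini/gemini-1.5-pro"
        api_base=provider.get("api_base"),  # optional OpenAI-compatible proxy
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```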
Choice: Retrieval-Augmented Generation using a local ChromaDB vector store pre-built from
the external_data/ knowledge base (CAPEC, CVE, ATT&CK, D3FEND, CIS, NIST).
Why it matters: Enables the LLM to ground its threat generation in real security knowledge
(CVE precedents, CAPEC attack patterns) without fine-tuning. The vector store is built once
offline (tooling/build_vector_store.py) and ships with the project.
What was rejected: Fine-tuning (Phase 3 in the roadmap) was explicitly deferred — too costly and less flexible than RAG for a knowledge base that updates frequently.
Choice: Local sentence-transformer model run on CPU as the default for vector similarity search.
Why it matters: Zero network dependency and zero API cost for embeddings. Supports fully air-gapped/sovereign deployments. The model is small (80 MB) and fast enough for the query volumes involved (single-user tool, not a SaaS).
What was rejected: Cloud embedding APIs (Google, OpenAI) are supported via embedding_factory.py
but not the default, to avoid mandatory API key setup for the RAG feature.
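A retrieval sketch using the chromadb and sentence_transformers packages named elsewhere in this document; the collection name, store path, and embedding model are assumptions:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # small, CPU-only
client = chromadb.PersistentClient(path="vector_store")
collection = client.get_collection("security_knowledge")  # collection name assumed

# Ground the LLM prompt in CAPEC/CVE/ATT&CK passages similar to the threat.
query = "SQL injection against an internet-facing database"
results = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=5)
context = "\n\n".join(results["documents"][0])  # fed into the generation prompt
```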
Choice: Flask with the Flask[async] extra.
Why it matters: The project started without AI streaming and Flask was the natural choice for
a simple single-user tool. Flask[async] was added to support SSE streaming of AI generation
output without a full framework migration.
Trade-off accepted: The sync/async boundary is awkward — generate_markdown_from_prompt_sync
uses loop.run_until_complete(gen.__anext__()) per chunk, which is not safe under concurrent load.
FastAPI would handle this more cleanly but would require significant route refactoring.
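A simplified sketch of that awkward bridge; the real generate_markdown_from_prompt_sync shares loop state across calls, which is exactly why it is unsafe under concurrent load:

```python
import asyncio

def stream_sync(async_gen):
    """Drain an async generator chunk-by-chunk from synchronous Flask code."""
    loop = asyncio.new_event_loop()
    try:
        while True:
            try:
                # One loop.run_until_complete() call per chunk.
                yield loop.run_until_complete(async_gen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        loop.close()
```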
Choice: Use a global queue.Queue (ai_status_event_queue) and SSE endpoint
(/api/ai_status_stream) to push real-time AI processing progress to the browser.
Why it matters: AI enrichment (per-component LLM calls) can take 30–120 seconds. Without live feedback the UI appears frozen. SSE is simpler than WebSockets for unidirectional server→client streams and requires no additional dependencies.
What was rejected: Polling (too many requests) and WebSockets (overkill for unidirectional flow).
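A minimal sketch of the queue-plus-SSE pattern, assuming a Flask app object and dict event payloads:

```python
import json
import queue

from flask import Flask, Response

app = Flask(__name__)
ai_status_event_queue: "queue.Queue[dict]" = queue.Queue()

@app.route("/api/ai_status_stream")
def ai_status_stream():
    def event_stream():
        while True:
            event = ai_status_event_queue.get()     # blocks until progress arrives
            yield f"data: {json.dumps(event)}\n\n"  # SSE wire format
    return Response(event_stream(), mimetype="text/event-stream")
```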
Choice: Heavy imports (litellm, langchain, chromadb, sentence_transformers) are done
inside methods or __init__, not at module top level.
Why it matters: The Flask server starts fast even when AI features are disabled or dependencies
are not installed. ThreatModelService is also lazy-initialized on first request.
Consequence: Import errors for optional AI deps surface at runtime, not at import time. This is intentional — the tool works without AI deps if AI config is disabled.
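The pattern, sketched (class and attribute names illustrative):

```python
class AIService:
    def __init__(self, config: dict):
        self._config = config
        self._litellm = None  # heavy dependency not imported yet

    def _ensure_litellm(self):
        # Deferred import: the cost (and any ImportError for a missing
        # optional dependency) is only paid when AI is actually used.
        if self._litellm is None:
            import litellm
            self._litellm = litellm
        return self._litellm
```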
Choice: AI-generated and custom threats are represented as ExtendedThreat(pytm.Threat) with
an added source attribute ("pytm", "AI", "LLM").
Why it matters: Allows the rest of the pipeline (grouping, MITRE mapping, reporting) to treat
all threats uniformly regardless of origin. The source field enables the report to distinguish
traditional vs. AI-generated threats and apply different rendering.
What was rejected: A parallel AIThreat class with no pytm inheritance would have required
forking all downstream threat processing logic.
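A sketch of the subclass, assuming pytm.Threat's constructor is passed through unchanged:

```python
from pytm import Threat

class ExtendedThreat(Threat):
    """A pytm Threat plus an origin tag, so grouping, MITRE mapping and
    reporting treat all threats uniformly while the report can still
    distinguish their source."""

    def __init__(self, *args, source: str = "pytm", **kwargs):
        super().__init__(*args, **kwargs)
        self.source = source  # "pytm", "AI" or "LLM"
```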
Choice: Scripts that download, parse, and transform security data (MITRE ATT&CK JSON, CAPEC XML, CVE JSONL, NIST XLSX) are separate one-shot tools, not part of the runtime application.
Why it matters: The transformed artifacts (external_data/*.json, vector_store/) are committed
to the repo so end users do not need network access or tool dependencies to use the threat model.
The pipeline only needs to run when upstream data sources update.
Trade-off accepted: The repo is large (the CVE JSONL files covering 1999–2025 are sizeable), and the committed vector store binaries inflate it further.
Choice: All AI provider selection, model parameters, RAG settings, and embedding config live
in one YAML file rather than environment variables or a .env file.
Why it matters: A single file to edit for setup. Enterprise users can template this file
for different deployment environments. The api_key_env indirection keeps secrets in environment
variables while keeping non-sensitive config in the file.
Consequence: The YAML file must be present at runtime even if AI is disabled. Path is resolved
relative to PROJECT_ROOT (detected via Path(__file__).resolve().parents[N]).
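A loading sketch showing the api_key_env indirection and the PROJECT_ROOT resolution (file layout and field names assumed):

```python
import os
from pathlib import Path

import yaml

PROJECT_ROOT = Path(__file__).resolve().parents[1]  # parents[N] varies by file depth

def load_ai_config() -> dict:
    config = yaml.safe_load((PROJECT_ROOT / "ai_config.yaml").read_text())
    for provider in config.get("providers", []):
        # The YAML names the variable; the environment holds the secret.
        env_name = provider.get("api_key_env")
        provider["api_key"] = os.environ.get(env_name) if env_name else None
    return config
```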
Choice: When a pytm threat and an AI/LLM threat cover the same (target, stride_category) with
similar descriptions (Jaccard word-overlap ≥ 0.3 or substring containment), the AI version replaces
the pytm version. Implemented in threat_analysis/core/threat_consolidator.py.
Why it matters: Without deduplication, the same risk (e.g. "SQL injection on DB server") appears two or three times in the report — once from pytm rules, once from the component LLM, possibly again from the RAG pipeline. This inflates threat counts and confuses prioritisation.
Why Jaccard and not embedding similarity: Embedding similarity would require loading the sentence-transformer model during report generation, breaking the offline-without-AI-deps path. Jaccard on word sets is purely offline, instantaneous, and sufficient for the overlap patterns observed in practice (same STRIDE category, same target, paraphrased description).
Why AI wins over pytm: AI-generated threats include richer context (attack scenario, recommended controls). Keeping the AI version provides more actionable output. The pytm threat is only removed when similarity is confirmed — unrelated pytm threats are always preserved.
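The similarity test, sketched from the thresholds stated above (function name hypothetical):

```python
def descriptions_overlap(desc_a: str, desc_b: str, threshold: float = 0.3) -> bool:
    """Jaccard word-overlap >= 0.3, or substring containment."""
    a, b = desc_a.lower().strip(), desc_b.lower().strip()
    if a in b or b in a:
        return True
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return False
    return len(words_a & words_b) / len(words_a | words_b) >= threshold
```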
Choice: Every JSON export is serialised by ReportSerializer and stamped schema_version: "1.0".
The schema is validated against threat_analysis/schemas/v1/threat_model_report.schema.json
(JSON Schema 2020-12). Threats get stable IDs in the format T-0001.
Why it matters: SIEM integrations, dashboards, and CI gates that consume the JSON output break silently when the structure changes. A versioned, validated schema gives consumers a stable contract and allows the tool to signal breaking changes explicitly.
What was rejected: Ad-hoc dict construction (the previous approach) offered no contract and generated different key sets depending on which code path ran. Migration to a serialiser was preferred over adding version guards to the ad-hoc dict.
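A sketch of the contract, assuming the jsonschema package for validation and an illustrative report shape:

```python
import json
from pathlib import Path

import jsonschema

SCHEMA_PATH = Path("threat_analysis/schemas/v1/threat_model_report.schema.json")

def serialize_report(threats: list[dict]) -> dict:
    report = {
        "schema_version": "1.0",
        "threats": [
            {"id": f"T-{i:04d}", **threat}  # stable IDs: T-0001, T-0002, ...
            for i, threat in enumerate(threats, start=1)
        ],
    }
    jsonschema.validate(report, json.loads(SCHEMA_PATH.read_text()))
    return report
```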
Choice: Instead of a CVSS score (absent from the CVE JSONL files), the RiskContext dataclass
encodes four binary context signals that adjust the STRIDE base score:
- CVE match for target/category → +0.5 (confirmed exploitability evidence)
- High-risk CWE class (injection, memory corruption, hardcoded creds…) → +0.3
- Network-exposed without auth/encryption → +0.7
- D3FEND defensive mitigations in place → −0.5
Why CWE instead of CVSS: Inspection of the CVE JSONL corpus revealed that entries only carry
CWE, CAPEC, and TECHNIQUES keys — no CVSS scores. CWE class is used as an exploitability
proxy: the _HIGH_RISK_CWES frozenset covers the 14 most commonly weaponised weakness classes.
Why additive deltas instead of a new scoring model: The existing base-score system (by STRIDE
category, target multiplier, protocol, classification) already has calibrated values. Additive
deltas from RiskContext layer on top without invalidating historical scores or requiring
re-calibration of the full model.
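A sketch of the dataclass and its additive delta (field names assumed; the deltas are the documented values):

```python
from dataclasses import dataclass

@dataclass
class RiskContext:
    cve_match: bool = False              # confirmed exploitability evidence
    high_risk_cwe: bool = False          # member of _HIGH_RISK_CWES
    exposed_unauthenticated: bool = False
    d3fend_mitigated: bool = False

    def delta(self) -> float:
        """Additive adjustment layered on the calibrated STRIDE base score."""
        return (0.5 * self.cve_match
                + 0.3 * self.high_risk_cwe
                + 0.7 * self.exposed_unauthenticated
                - 0.5 * self.d3fend_mitigated)
```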
Choice: CVEService._ensure_maps_loaded() populates both _cve_to_capec_map and
_cve_to_cwe_map in a single traversal of all JSONL files (~26 files, 1999–2025).
Why it matters: The JSONL corpus is large. A separate pass to build the CWE map (naively
added alongside the existing CAPEC pass) would double I/O and parse time at startup. A single
lazy-loaded pass, gated by if self._cve_to_capec_map is not None: return, ensures both maps
are always consistent and the files are opened exactly once.
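A sketch of the single-pass load (JSONL key names assumed, following the CVE/CAPEC/CWE observation above):

```python
import json
from pathlib import Path

def _ensure_maps_loaded(self):
    if self._cve_to_capec_map is not None:
        return  # both maps already built; files were opened exactly once
    self._cve_to_capec_map, self._cve_to_cwe_map = {}, {}
    for jsonl_path in sorted(Path(self._data_dir).glob("*.jsonl")):
        with jsonl_path.open() as handle:
            for line in handle:
                entry = json.loads(line)
                cve_id = entry["CVE"]  # key name assumed
                self._cve_to_capec_map[cve_id] = entry.get("CAPEC", [])
                self._cve_to_cwe_map[cve_id] = entry.get("CWE", [])
```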
Choice: A secopstm command is installed by pip install -e . via
[project.scripts] secopstm = "threat_analysis.__main__:main". New flags added:
--output-format {all,html,json,stix}, --output-file PATH, --stdout.
Why it matters: CI pipelines need a stable, pip-installable command — not python -m threat_analysis with an implicit working directory. --stdout enables direct piping to jq
or a SIEM without writing intermediate files. All legacy invocations (python -m threat_analysis --server, --gui, --project, --ansible-path) remain 100% backward-compatible because new
flags all default to the previous behaviour.
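An illustrative CI invocation (the model path and the JSON field are hypothetical; the flags are the ones listed above):

```bash
secopstm --project ./models/acme --output-format json --stdout | jq '.threats[].id'
```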
Choice: threat_analysis/core/attack_chain.py uses simple directed graph traversal (dataflows
as edges, threats indexed by component name) to identify multi-step attack paths. No NLP, no
ML, no network access.
Why it matters: A common question in threat review is "how do these threats chain together
into a realistic attack scenario?" The attack chain section gives reviewers a pre-computed answer
without additional tooling. It reuses the already-computed all_threats list — no extra AI call,
no extra cost.
Why not ML/graph DB: The system topology is small (typically <50 nodes). Simple iteration over dataflows is O(E × max_threats_per_node). Graph DB (Neo4j) or ML clustering would add heavyweight dependencies for zero additional accuracy at this scale.
Trade-off accepted: Chains are identified by dataflow adjacency only — threats within the same boundary but not connected by a modelled dataflow are not chained. This is intentional: if no dataflow is modelled, no data movement is assumed.
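A traversal sketch under assumptions: dataflows expose source/sink elements with a .name attribute, and threats are indexed by component name:

```python
from collections import defaultdict

def find_chains(dataflows, threats_by_component: dict, max_len: int = 4) -> list:
    """Enumerate attack paths as sequences of threatened components
    connected by modelled dataflows. Pure iteration: no NLP, no ML."""
    adjacency = defaultdict(list)
    for flow in dataflows:
        adjacency[flow.source.name].append(flow.sink.name)

    chains = []

    def walk(node: str, path: list):
        path = path + [node]
        if len(path) > 1:
            chains.append(path)
        if len(path) < max_len:
            for nxt in adjacency[node]:
                if nxt not in path and threats_by_component.get(nxt):
                    walk(nxt, path)

    for start in threats_by_component:
        walk(start, [])
    return chains
```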
Choice: Trusted boundaries use color="#2e7d32" (dark green, solid), untrusted use
color="#c62828" (dark red, dashed). These values are in threat_model.dot.j2 — they affect
the DOT output directly, which means both the Graphviz-generated SVG export and the HTML diagram
render consistent colors.
Why baked into DOT and not post-processed in SVG: The user generates SVG files independently
via DiagramGenerator.generate_custom_svg_export(). Any coloring applied only at the HTML stage
would be absent from the exported SVG. Putting the color logic in the DOT template is the only
way to guarantee consistency across all output formats.
What was rejected: A JavaScript overlay that colorizes boundaries in the HTML view only was considered but rejected for the trust convention specifically (B1). It was however adopted for the severity heat map (B2) because severity data requires threat processing, which is not available during SVG generation.
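A sketch of the template logic in threat_model.dot.j2 (the Jinja2 structure is assumed; the colors and line styles are the documented values):

```jinja
{% for boundary in boundaries %}
subgraph "cluster_{{ boundary.name }}" {
  {% if boundary.isTrusted %}
  color="#2e7d32"; style="solid";   /* trusted: dark green, solid  */
  {% else %}
  color="#c62828"; style="dashed";  /* untrusted: dark red, dashed */
  {% endif %}
  label="{{ boundary.name }}";
}
{% endfor %}
```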
Choice: The severity overlay in HTML diagrams is implemented entirely in JavaScript — it reads
severity_map_json injected at template render time and toggles SVG fill attributes on demand.
The base SVG is never modified. The toggle button hides itself when severity_map is empty.
Why JS-only: Severity requires the full threat scoring pipeline to run (process_threats() →
_compute_severity_map()). In the live editor preview, this pipeline does not run (too slow for
real-time editing). A JS toggle that receives pre-computed data at render time is the only approach
that works in both the live preview (empty map → button hidden) and the exported HTML (data
present → button shown).
Why preserve original fills: The overlay must be reversible so users can compare the
architectural diagram (original colors) against the severity view. The originalNodeFills map
saves each node's fill before the overlay is applied and restores it on toggle-off.
Choice: AIService.generate_rag_threats_sync() uses asyncio.run_coroutine_threadsafe() on
a persistent background event loop (_get_sync_loop()) rather than asyncio.run() or a new
ThreadPoolExecutor. It is called from ReportGenerator.generate_project_reports() after all
sub-models are processed and main_threat_model.sub_models is populated.
Why a persistent loop instead of asyncio.run(): generate_project_reports() is called from a Flask route handler that already runs inside an event loop, where asyncio.run() raises RuntimeError (it cannot be called from a running event loop). The persistent background loop pattern (already used by generate_markdown_from_prompt_sync) avoids this cleanly.
Why after sub-model recursion: The RAG query needs the full project markdown to generate
cross-boundary threats. The sub-model list is only complete after the recursive pass over
all_processed_models. Calling RAG before that would give it only the main model's context,
defeating the purpose of cross-model analysis.
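The bridge pattern, sketched (loop management simplified; names follow the ones cited above):

```python
import asyncio
import threading

_sync_loop: asyncio.AbstractEventLoop | None = None

def _get_sync_loop() -> asyncio.AbstractEventLoop:
    """Create the persistent background loop once, in a daemon thread."""
    global _sync_loop
    if _sync_loop is None:
        _sync_loop = asyncio.new_event_loop()
        threading.Thread(target=_sync_loop.run_forever, daemon=True).start()
    return _sync_loop

def generate_rag_threats_sync(coro):
    # Safe even when the caller already sits inside a running event loop,
    # where asyncio.run() would raise RuntimeError.
    return asyncio.run_coroutine_threadsafe(coro, _get_sync_loop()).result()
```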
Choice: SecOpsBoundary and Boundary objects are added to all_elements in
AIService._enrich_with_ai_threats(). Because boundaries lack is_public and is_authenticated
attributes (those belong to Servers/Actors), boundary-specific logic was added: elem_type is set
to "Trust Boundary (Trusted|Untrusted)" and the trust level is injected into the prompt.
Why boundaries merit their own AI threats: Boundaries represent trust zone transitions — exactly the places where privilege escalation, lateral movement, and data exfiltration paths are most likely. pytm does not generate threats targeting boundaries directly (only the dataflows crossing them). Adding boundaries as AI targets surfaces zone-level threats that the rule engine misses.
What was rejected: Generating boundary threats as a separate post-processing step (e.g. from the ATT&CK mapping alone) was considered but would not leverage the contextual knowledge of the LLM about the specific architecture described in the model.
Choice: Implement GDAFEngine as a separate, complementary attack path generator that works
top-down from attacker objectives, rather than extending AttackChainAnalyzer (which works
bottom-up from discovered threats).
Why it matters: The two engines answer different questions:
- AttackChainAnalyzer answers: "Given the threats already found, which ones can be chained across dataflows?" It starts from the threat inventory and finds adjacencies.
- GDAFEngine answers: "If an adversary with these capabilities wanted to reach this target, what path through the architecture would they take?" It starts from intent and finds paths.
The bottom-up approach surfaces unexpected chaining of individually low-risk threats. The top-down approach guarantees that high-value objectives (domain compromise, data exfiltration) are always evaluated, even if no pytm or AI threats have been generated yet — which matters for architectures described in the DSL but not yet enriched by AI.
Why not replace AttackChainAnalyzer: The two outputs are structurally different.
AttackChainAnalyzer chains existing ExtendedThreat objects with their STRIDE categories,
MITRE mappings, and severity scores — directly linkable to the threat report. GDAFEngine
produces new AttackScenario objects with per-hop MITRE technique assignments from a graph
traversal — suited for Attack Flow export and adversary simulation, not for augmenting the
existing threat table.
Trade-off accepted: GDAF requires a context YAML file with objectives and actor profiles.
Without this file, GDAFEngine.run() returns an empty list. This is intentional: the engine
is opt-in and the context file is the mechanism for expressing organizational threat intelligence
that cannot be inferred from the architecture model alone.
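A hypothetical context file (every field name here is illustrative; only the concepts, objectives and actor profiles, come from the text above):

```yaml
objectives:
  - name: data_exfiltration
    target: Customer DB
actor_profiles:
  - name: external_attacker
    capability: medium
    entry_points: [Internet]
```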
Choice: The protocols exposed by an asset are inferred from its adjacent dataflows (inbound + outbound) rather than from an explicit services=[] field on the server.
Why it matters: The DSL already models protocols on each dataflow. Requiring a redundant services=[SSH, RDP] declaration on the server would violate DRY and introduce inconsistencies (services declared with no matching dataflow). The collector in _build_graph() aggregates protocols automatically at graph-construction time.
Trade-off accepted: Local services with no modelled dataflow (e.g. an open port that is never used in the model) are not visible. This limitation is intentional: if a flow is not modelled, no attack path traverses it anyway.
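A sketch of the aggregation step in _build_graph() (attribute names assumed from the pytm data model):

```python
from collections import defaultdict

def collect_exposed_protocols(dataflows) -> dict:
    """Infer each asset's exposed protocols from adjacent dataflows,
    instead of a redundant services=[] field on the server."""
    exposed = defaultdict(set)
    for flow in dataflows:
        exposed[flow.sink.name].add(flow.protocol)    # inbound
        exposed[flow.source.name].add(flow.protocol)  # outbound
    return exposed
```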
Choice: Inventory data (OS version, CVEs, patch level, detection level) is stored in per-asset YAML files (BOM/{asset_name}.yaml), not in the DSL.
Why it matters: The DSL describes the logical architecture (components, flows, trust). The technical inventory (versions, CVEs) is operational data that changes frequently, must be producible from third-party tools (Qualys, Tenable, CMDB), and should not pollute the readability of the architecture model.
Why YAML and not inline DSL: A vulnerability-scanner export can feed the BOM/ directory directly. The YAML format is compatible with Ansible inventory, Tenable, and Qualys exports. Folding the data into the DSL would require migrating every existing model.
Auto-discovery: BOMLoader looks for {model_parent}/BOM/ automatically when bom_directory is not declared in ## Context. No configuration is needed to get started.
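A hypothetical BOM/web-server.yaml (keys illustrative, matching the fields listed above):

```yaml
os_version: "Ubuntu 22.04"
patch_level: "2024-11"
detection_level: medium
known_cves:
  - CVE-2024-6387
```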
Choice: A traversal_difficulty=low|medium|high attribute on boundaries controls a hop_weight bonus: low → +0.3 (a zone that is easy to traverse means a more exploitable path), medium → +0.1, high → +0.0 (a highly capable attacker is required, so the path is less likely but still evaluated).
Why it matters: Binary trust (isTrusted=true/false) does not distinguish a DMZ from an air-gapped OT network. An attacker can traverse both, but the probability and the techniques required differ radically. The hop_weight bonus reflects relative accessibility.
Why NOT reduce the score for high difficulty: A path crossing an OT network (traversal_difficulty=high) must still appear in the results; that is precisely the path an analyst needs to evaluate. The difficulty slightly lowers its relative score but does not eliminate it.
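The bonus table, sketched (function shape hypothetical; the values are the documented ones):

```python
_TRAVERSAL_BONUS = {"low": 0.3, "medium": 0.1, "high": 0.0}

def hop_weight(base_weight: float, boundary) -> float:
    """Easier-to-traverse zones make a path more exploitable; high
    difficulty lowers the relative score without eliminating the path."""
    difficulty = getattr(boundary, "traversal_difficulty", "medium")
    return base_weight + _TRAVERSAL_BONUS.get(difficulty, 0.1)
```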
Choice: A server in a threat model can reference a child model via submodel=./path/to/model.md.
The child is not a separate dependency or a called service — it IS the same component described
at a finer granularity. The parent diagram links to the child; the child diagram shows ghost
nodes for external connections from the parent.
Why it matters: Large architectures become unreadable when everything is in one flat diagram. Drill-down allows a threat model to be decomposed: the top-level model shows the overall architecture with summarized components, and each component that merits deeper analysis gets its own model file with the same full DSL support (actors, servers, dataflows, STRIDE analysis, AI enrichment).
Why ghost nodes in the child diagram: When a security reviewer drills into a sub-component, they need to know what external systems communicate with it — this context is essential for identifying trust boundary threats and data flow risks. Ghost nodes provide that context without requiring the child model to re-declare all parent-level connections, which would create maintenance duplication.
Why bridging edges in GDAF: An attacker who compromises a server does not stop at that node — they move into the server's internals. Without bridging edges, GDAF would treat the parent server as a terminal node and miss all internal paths. Bridging reuses the already-parsed sub-model graph, adding only two passes of edge injection (entry + exit bridges), at essentially zero extra cost.
What was rejected: A separate "project dependencies" section in the DSL (defining which
models communicate with each other) was considered but would require maintaining two separate
representations of the same relationship — the server definition and a separate link declaration.
The submodel= inline keyword keeps the link co-located with the server definition.
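An illustrative server declaration with the drill-down link (surrounding field syntax hypothetical, as in the earlier DSL fragment):

```markdown
## Servers
- Name: Payment Service | boundary = DMZ | submodel=./payment/model.md
```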
Choice: AIService.init_ai() starts RAGThreatGenerator() in loop.run_in_executor(None, ...)
immediately, before calling provider.check_connection(). The asyncio Future is awaited only
after the connection check completes.
Why it matters: The two operations are completely independent:
- check_connection() is bottlenecked by import litellm (~64 s on WSL2 cold cache, plus a network call to fetch model pricing/metadata).
- RAGThreatGenerator.__init__() is bottlenecked by import langchain_chroma (~26 s cold).
By overlapping them, the 26 s chroma import is hidden inside the 64 s litellm import. Net saving: ~26 s on every cold start where RAG is enabled.
Why run_in_executor and not asyncio.create_task: RAGThreatGenerator.__init__() is
synchronous and blocking (disk I/O, model loading). create_task would run in the same thread
and block the event loop. run_in_executor offloads to the default ThreadPoolExecutor,
keeping the event loop responsive.
Fallback: If the pre-warm raises (vector store missing, config error), init_ai() retries
synchronously once before setting rag_generator = None.
What was rejected: Starting RAG initialization from ThreatModelService.__init__() (before
init_ai() is even called) was considered but would require passing the AI config status back
and creates a race condition if the AI goes offline between warmup and use.
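A sketch of the overlap, assuming a RAGThreatGenerator class whose constructor does the blocking work:

```python
import asyncio

async def init_ai(self):
    loop = asyncio.get_running_loop()
    # Kick off the blocking RAG warm-up in a worker thread first, so the
    # ~26 s langchain_chroma import overlaps the ~64 s litellm import.
    rag_future = loop.run_in_executor(None, RAGThreatGenerator)
    await self.provider.check_connection()  # pays the litellm import cost
    try:
        self.rag_generator = await rag_future
    except Exception:
        try:
            self.rag_generator = RAGThreatGenerator()  # one synchronous retry
        except Exception:
            self.rag_generator = None  # RAG disabled for this session
```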
Choice: ReportGenerator._get_bom_loader(threat_model) resolves a BOMLoader from the
model's _model_file_path (auto-discovers {parent}/BOM/) or context_config['bom_directory'].
In _get_all_threats_with_mitre_info(), BOM known_cves are appended to the list returned by
cve_service.get_cves_for_equipment() before the CAPEC intersection check — for both pytm and
AI-element threat scoring loops.
Why it matters: CVEService.get_cves_for_equipment() only knows CVEs declared in
cve_definitions.yml (YAML definitions keyed by equipment name). BOM files carry the
authoritative list of CVEs for each deployed asset (from vulnerability scanners, patch managers,
or SBOM pipelines). Combining both sources ensures that CVE-based VOC scoring reflects reality
rather than only the manually maintained YAML.
What was rejected: Feeding BOM CVEs directly into CVEService as a secondary lookup would
require mutating a shared service object or adding BOM awareness to CVEService itself — coupling
it to the BOM infrastructure. Augmenting the call site list is purely additive and keeps
CVEService focused on its existing JSONL/YAML data.
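The call-site augmentation, sketched (the BOMLoader accessor name is hypothetical):

```python
def _collect_cves(self, equipment_name: str, bom_loader) -> list:
    """Merge YAML-declared CVEs with BOM-declared ones at the call site,
    keeping CVEService itself BOM-unaware."""
    cves = list(self.cve_service.get_cves_for_equipment(equipment_name))
    if bom_loader is not None:
        bom = bom_loader.load(equipment_name)  # hypothetical accessor
        cves.extend(bom.get("known_cves", []))
    return cves
```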
Choice: server.py adds an initial_model_file_path: Optional[str] module-level global,
set in run_server() when a model file is successfully loaded. It is passed as model_file_path
to every export_*_logic() route call, which propagates it through ThreatModelService →
ExportService → create_threat_model() → ThreatModel._model_file_path. With _model_file_path
set, ExportService._resolve_gdaf_context() and _resolve_bom_directory() can auto-discover
the model's context/ and BOM/ sibling directories.
Why a global: The Flask routes are stateless functions with no access to the run_server()
scope. The alternative — injecting the path through ThreatModelService.__init__() — would
require a new constructor parameter and rebinding of the service on model switch, which is more
invasive. A module-level global is consistent with the existing initial_markdown_content and
initial_project_path pattern already used by the server.
Scope: When the server starts without a model file (empty editor mode), initial_model_file_path
stays None and exports proceed without BOM/context auto-discovery — matching the previous
behaviour exactly.