
kovtcharov/beacon


🔥 Beacon

The voice-to-record first hop into CJADC2 — built for DDIL. Speak a field report. Get a doctrinally-correct, schema-validated structured record on every operator's screen — offline, in seconds, over a $30 mesh radio.

A proof-of-concept built for the 2026 NatSec Hackathon (Army xTech / Cerebral Valley · May 2026).


Beacon demo — one report becomes a structured record across the mesh


The problem

"DDIL — disconnected, degraded, intermittent, and low-bandwidth conditions — is no longer a contingency. It is the default environment for many operations."

Yet the Army's first hop, from a soldier's voice to a structured record, still works like this:

  1. A soldier in the dirt sees something time-critical — a casualty, a contact, a fuel state.
  2. They squeeze a doctrinally-shaped 9-line out of their head onto a radio in the middle of a fight.
  3. The receiver writes it down (and hopefully gets the urgency right).
  4. Someone re-keys it into GCSS-Army, the MEDEVAC desk, or the targeting tool (Maven Smart System, AIP, TITAN).

Each handoff loses fidelity. Each handoff costs minutes. And in DDIL conditions there often isn't a connection to do any of it. The GAO has flagged that even when CJADC2 connectivity exists, "overly restrictive data classification is a significant hindrance to sharing command and control data."

AI has compressed sensor-to-shooter from 20 minutes to 20 seconds. The kill chain is fast. The first hop — voice to typed record — is still glacial, and it's where everything currently breaks.

What Beacon does

Beacon owns that first hop. Not a new C2 system — the missing edge in front of the ones the DoD has already paid for.

  • Speak normally. A free-form report from a phone, headset, or laptop. No 9-line template, no menus.
  • Get a doctrinally-correct structured record back, on-device. A small local LLM extracts a MEDEVAC 9-line (FM 4-02.2), LOGSTAT (FM 4-0), contact event (ADP 6-0), call-for-fire (FM 3-09), or convoy reroute (FM 3-90) — typed and Pydantic-validated, doctrinally pin-cited, never free-text.
  • Sync over a $30 radio. Records propagate across a Meshtastic LoRa mesh — no wifi, no cell, no SATCOM, no tower. Every peer Beacon node sees the same structured picture in seconds. (Same radio class that already integrates with ATAK — the "tactical operating system" with 500K+ users across the force.)
  • Push upstream when uplink returns. Confirmed records flow into Palantir AIP / Maven Smart System, GCSS-Army, and any CJADC2 surface through their existing REST endpoints. Pending proposals never leak.
  • Human-in-the-loop on life-or-death. MEDEVAC, fires, and reroutes are always proposed by the AI and confirmed by an operator — aligned with the DoD Responsible AI Strategy ("humans retain responsibility for final engagement decisions").
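The "typed, never free-text" claim is easiest to see as a schema. Below is a minimal sketch of a MEDEVAC record. It assumes a small subset of the 9-line (pickup grid, callsign, patient count, precedence) and uses hypothetical field names. It also uses a stdlib dataclass as a dependency-free stand-in for the Pydantic models in src/beacon/schemas.py. The MGRS check is only a loose shape test, not a full coordinate validator.

```python
"""Hypothetical MEDEVAC record sketch (stdlib stand-in for a Pydantic model).

Field names are illustrative assumptions, not Beacon's real schemas.py.
"""
import re
from dataclasses import dataclass
from enum import Enum

# Loose MGRS shape check: grid zone (1-2 digits + band letter), 100 km square
# (two letters, I and O excluded), then an even count (2-10) of digits.
MGRS_RE = re.compile(r"^\d{1,2}[C-HJ-NP-X][A-HJ-NP-Z]{2}(?:\d\d){1,5}$")


class Precedence(str, Enum):
    # 9-line, line 3 precedence categories
    URGENT = "A"
    URGENT_SURGICAL = "B"
    PRIORITY = "C"
    ROUTINE = "D"
    CONVENIENCE = "E"


@dataclass(frozen=True)
class MedevacRequest:
    pickup_grid: str        # line 1: location of pickup site (MGRS)
    callsign: str           # line 2 (frequency omitted in this sketch)
    patients: int           # line 3: number of patients
    precedence: Precedence  # line 3: precedence category

    def __post_init__(self) -> None:
        # Refuse malformed extractions instead of storing them as free text.
        if not MGRS_RE.match(self.pickup_grid):
            raise ValueError(f"invalid MGRS grid: {self.pickup_grid!r}")
        if not self.callsign.strip():
            raise ValueError("callsign is required")
        if self.patients < 1:
            raise ValueError("patients must be >= 1")
        if not isinstance(self.precedence, Precedence):
            raise TypeError("precedence must be a Precedence member")


if __name__ == "__main__":
    # "Bravo Co at 38SMB12345678, two wounded, gunshot, urgent surgical"
    req = MedevacRequest("38SMB12345678", "BRAVO", 2, Precedence.URGENT_SURGICAL)
    print(req.precedence.value)  # → B
```

A malformed extraction raises before it can reach SQLite, which is the property the downstream MEDEVAC desk depends on.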

Why it's different

Voice radio Cloud-dependent C2 Maven / AIP / GCSS-Army Beacon
Works in DDIL (default conditions) partial
Works without infrastructure
Outputs typed, doctrinally-correct records
Time from speech → record minutes (re-keying) minutes (re-keying) n/a (consumes records) seconds
Operator workload high medium low low
Aligned with DoD RAI human-in-the-loop policy n/a varies varies (enforced)

Beacon doesn't replace C2 — it feeds C2. It doesn't replace human judgement — it proposes, and a human disposes.

Where it fits

Connection topology: field nodes → mesh → central dashboard with the logistics operator's local AI assistant

| Existing program / surface | Beacon's relationship |
| --- | --- |
| MEDEVAC desk (FM 4-02.2 9-line) | Beacon emits the 9-line as a typed record; desk gets a clean, traceable request |
| GCSS-Army (LOGSTAT, sustainment ERP) | Beacon ingests Class III/V/etc. consumption from voice → SupplyStatus record |
| Maven Smart System / Palantir AIP | Beacon pushes confirmed records through REST; AIP/Maven owns the kill chain — Beacon owns the first hop |
| ATAK / TAK ecosystem (500K+ users) | Roadmap target: native ATAK plugin surface (today: web dashboard) |
| Mission Command (ADP 6-0) | Beacon contributes to the COP — peer nodes share situational state immediately |
| DoD RAI Strategy + NIST AI RMF | Confirm-required gate on life-or-death tools enforced at the trigger lane |
| CJADC2 | Beacon is the data-fabric on-ramp from the dirt — local-first, classification-friendly (records can be reviewed pre-emission) |

Status — read before you ship anything against this

This is a hackathon proof-of-concept, not a fielded system. Treat it as a working reference architecture you can run on a laptop today, not as a deployable product.

What works (verified by the test suite — 397 tests collect, 4 e2e gated by BEACON_E2E=1)

  • Agent pipeline: voice/text → tool call → schema-validated record → SQLite. 7 tools (3 confirm-required: request_medevac, recommend_reroute, request_fires); 9 RAG tools from GAIA's RAGToolsMixin.
  • Three LLM backends: Lemonade (default), OpenAI (BEACON_USE_OPENAI=1), Claude (BEACON_USE_CLAUDE=1). Default model Qwen3-4B-Instruct-2507-GGUF.
  • Voice transcription on the server: POST /voice/transcribe proxies to Lemonade's OpenAI-compatible Whisper endpoint. It returns 502 when that backend is down; there is no in-process fallback.
  • Mesh transport: in-process UDP adapter; envelope JSON with mesh-id namespacing, no-loopback, ACK tracking. /mesh/{status,enable,disable,test/*} endpoints.
  • AIP push: confirmed-only trigger lane subscriber. {connected:false} when AIP_BASE_URL unset — real not-configured state, not a mock.
  • InsightsEngine: agent-only (no heuristics), 1 s trailing debounce, JSON-array contract.
  • Reasoning trace: 500-step in-memory ring buffer streamed at /agent/reasoning/stream. OBSERVE, TOOL, PROPOSE, SURFACE, DECIDE, QUERY step types; bookends every query lifecycle.
  • Web dashboard (Vite + React) with EventFeed, Map, Insights panel, Burndown panel, Mesh panel, AIP panel, Reasoning panel, Doctrine corpus panel.
  • Eval framework: beacon-eval CLI; Claude judge via either subscription auth or ANTHROPIC_API_KEY; 24 scenarios across 8 categories.
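The mesh bullet above compresses three behaviors into one line. Below is a minimal sketch of how envelope handling could work. It assumes hypothetical field names (mesh_id, sender, msg_id, kind, payload) and a hypothetical MeshNode class rather than the real UDPMeshTransport wire format. Sockets are left out, so wrap() produces the datagram bytes and handle() consumes one received datagram.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeshNode:
    mesh_id: str  # namespace: only peers on the same mesh accept traffic
    node_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    pending_acks: dict = field(default_factory=dict)  # msg_id -> unacked envelope
    received: list = field(default_factory=list)      # payloads accepted from peers

    def wrap(self, record: dict) -> bytes:
        """Build an outbound datagram and remember it until a peer ACKs it."""
        msg_id = uuid.uuid4().hex
        env = {"mesh_id": self.mesh_id, "sender": self.node_id,
               "msg_id": msg_id, "kind": "record", "payload": record}
        self.pending_acks[msg_id] = env
        return json.dumps(env).encode()

    def handle(self, datagram: bytes) -> Optional[bytes]:
        """Process one inbound datagram; return an ACK datagram if one is due."""
        env = json.loads(datagram)
        if env.get("mesh_id") != self.mesh_id:
            return None  # foreign mesh namespace: ignore
        if env.get("sender") == self.node_id:
            return None  # no-loopback: drop our own broadcast
        if env.get("kind") == "ack":
            self.pending_acks.pop(env["msg_id"], None)  # delivery confirmed
            return None
        self.received.append(env["payload"])
        ack = {"mesh_id": self.mesh_id, "sender": self.node_id,
               "msg_id": env["msg_id"], "kind": "ack"}
        return json.dumps(ack).encode()
```

Because a node drops its own datagrams and anything tagged with a foreign mesh id, two demo meshes could share one UDP port or radio channel without cross-talk. Entries left in pending_acks are candidates for a retry loop, which this sketch does not include.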

What does not exist yet

  • No ATAK plugin and no mobile companion app. Today's only operator surface is the web dashboard. ATAK plugin is the obvious next surface given the 500K-user ecosystem.
  • No push-to-talk button in the dashboard UI. The browser voice helper and the /voice/transcribe endpoint exist; the on-screen button was removed. Headset clients can still POST WAV bytes.
  • No real LoRa radio validation. The UDP mesh transport has been exercised locally and against the standalone MeshService server. It has not been wired through actual Meshtastic hardware in this branch.
  • Hardware probes from Phase 0 were not run. Development is on a Mac M4 Pro (Apple Silicon). Spec-target latencies (5 s p50, 8 s p95) come from a Toughbook-class assumption; we measured ~5 s p50 on M4 Pro for the canonical 6-scenario corpus and have not confirmed Ryzen AI numbers.
  • Eval corpus is hackathon-scale. 24 scenarios across iran-showcase, mixed, rag, medevac, logstat, doctrinal, edge, showcase. The agent-spec called for ~40 across more axes; treat the numbers as a sanity gate, not a release gate.
  • AIP integration has never been pointed at a real Palantir tenant. Tests use FakeHTTP. The mapping (medevac → MedevacRequest, etc.) is correct; the live wiring is unproven.
  • All data is synthetic. No real OPORDs, casualty data, unit positions, MGRS coordinates, or supply rates ship with this repo, and none should be added.

If your downstream decision depends on Beacon doing something not in the What works list, it doesn't do it yet.


Quickstart

git clone https://github.com/kovtcharov/beacon && cd beacon
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Smoke check (397 tests collect; 4 e2e are skipped without BEACON_E2E=1)
pytest tests/

# Seed synthetic demo state (3 companies × 5 supply items × 6h history + a convoy event)
beacon seed

Run

You need three processes for the full loop. Start them in three terminals.

# 1) Local LLM backend. Required unless you flip BEACON_USE_OPENAI=1 or BEACON_USE_CLAUDE=1.
lemonade-server serve

# 2) Beacon API. Defaults to :8888 (Lemonade owns :8001/:8002 when warm).
beacon serve

# 3) Dashboard. :5173 collides with Claudia in some dev environments — use --port 5174 if needed.
cd dashboard && npm install && npm run dev -- --port 5174

Open http://localhost:5174, type a field report into the dashboard textbox, hit Submit, and watch records propagate through the EventFeed, Insights panel, and Reasoning panel. Confirm any pending life-or-death proposals.

CLI alternative (no dashboard)

beacon ask --auto-confirm "Bravo Co at 38SMB12345678, two wounded, gunshot, urgent surgical"
beacon chat              # interactive REPL with confirm prompts
beacon state             # dump current SQLite state as JSON

--auto-confirm is for scripted runs only — it bypasses the operator-confirm gate (confirm_mode="never"). Don't use it in demos.

LLM backends & env flags

| Env | Behavior |
| --- | --- |
| (none) | Lemonade Server, default model Qwen3-4B-Instruct-2507-GGUF |
| BEACON_USE_OPENAI=1 OPENAI_API_KEY=… | OpenAI fallback |
| BEACON_USE_CLAUDE=1 ANTHROPIC_API_KEY=… | Claude fallback |
| BEACON_SKIP_LEMONADE=1 | No LLM probing at startup. /health and /state work; /agent/run returns 502 |
| BEACON_MODEL=Gemma-4-E4B-it-GGUF | Override the default Lemonade model |
| BEACON_AGENT_TIMEOUT_S=90 | Per-query wall-clock cap. Default 90 s. Set 0 to disable |
| BEACON_MESH=1 / 0 | Auto-start the UDP mesh transport at lifespan startup / hard env lock |
| AIP_BASE_URL=… AIP_TOKEN=… | Real AIP push. Otherwise /aip/entities returns {connected:false} |
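The backend flags amount to a small resolution function. Below is a minimal sketch. It assumes that setting both BEACON_USE_OPENAI and BEACON_USE_CLAUDE is an error (the README does not say which one wins) and that each cloud flag requires its API key. The resolve_backend and BackendConfig names are hypothetical. BEACON_SKIP_LEMONADE, BEACON_MESH, and the AIP variables are not modeled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "Qwen3-4B-Instruct-2507-GGUF"


@dataclass(frozen=True)
class BackendConfig:
    backend: str                # "lemonade" | "openai" | "claude"
    model: Optional[str]        # Lemonade model name; None for cloud backends here
    timeout_s: Optional[float]  # None means no per-query wall-clock cap


def resolve_backend(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Hypothetical mapping from Beacon env flags to a backend choice."""
    use_openai = env.get("BEACON_USE_OPENAI") == "1"
    use_claude = env.get("BEACON_USE_CLAUDE") == "1"
    if use_openai and use_claude:
        # Assumption: precedence is undocumented, so refuse to guess.
        raise ValueError("set at most one of BEACON_USE_OPENAI / BEACON_USE_CLAUDE")
    if use_openai and not env.get("OPENAI_API_KEY"):
        raise ValueError("BEACON_USE_OPENAI=1 needs OPENAI_API_KEY")
    if use_claude and not env.get("ANTHROPIC_API_KEY"):
        raise ValueError("BEACON_USE_CLAUDE=1 needs ANTHROPIC_API_KEY")

    # Default 90 s; 0 disables the cap.
    raw_timeout = float(env.get("BEACON_AGENT_TIMEOUT_S", "90"))
    timeout = None if raw_timeout == 0 else raw_timeout

    if use_openai:
        return BackendConfig("openai", None, timeout)
    if use_claude:
        return BackendConfig("claude", None, timeout)
    return BackendConfig("lemonade", env.get("BEACON_MODEL", DEFAULT_MODEL), timeout)
```

Passing the environment as a plain mapping keeps the function testable without mutating os.environ. A None timeout can be handed directly to asyncio.wait_for, which then waits without a cap.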

Confirm flow

request_medevac, recommend_reroute, and request_fires write to pending_actions and return status="pending_confirm". The trigger lane stays silent until an operator promotes the row:

curl -X POST http://localhost:8888/agent/confirm/<event_uuid>     # promote → fire trigger lane
curl -X POST http://localhost:8888/agent/reject/<event_uuid>      # drop, no emission

This is AGENTS.md § 2.2 — non-negotiable, and the seam where DoD RAI human-in-the-loop policy lives in code. Tests and demos must use the dashboard or the explicit --auto-confirm opt-in; nothing else bypasses it.
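The promotion path can be sketched in-process. This is a minimal illustration, not Beacon's state.py. The ConfirmGate name, the pending_actions columns, the default confirm_mode="always", and the "ok" status for ungated tools are all assumptions. Only the three gated tool names, status="pending_confirm", and confirm_mode="never" for --auto-confirm come from this README.

```python
import json
import sqlite3
import uuid
from typing import Callable, List

# The three confirm-required tools named in this README.
CONFIRM_REQUIRED = {"request_medevac", "recommend_reroute", "request_fires"}


class ConfirmGate:
    """Hypothetical gate: gated proposals wait in SQLite until confirmed."""

    def __init__(self, conn: sqlite3.Connection, confirm_mode: str = "always"):
        self.conn = conn
        self.confirm_mode = confirm_mode  # "never" mirrors --auto-confirm
        self.trigger_subscribers: List[Callable[[dict], None]] = []  # e.g. AIP push
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pending_actions ("
            " event_uuid TEXT PRIMARY KEY, tool TEXT NOT NULL, payload TEXT NOT NULL)"
        )

    def propose(self, tool: str, payload: dict) -> dict:
        event_uuid = str(uuid.uuid4())
        if tool in CONFIRM_REQUIRED and self.confirm_mode != "never":
            # Park the proposal; the trigger lane stays silent.
            self.conn.execute(
                "INSERT INTO pending_actions VALUES (?, ?, ?)",
                (event_uuid, tool, json.dumps(payload)),
            )
            return {"status": "pending_confirm", "event_uuid": event_uuid}
        self._fire(tool, payload)  # assumption: ungated tools emit immediately
        return {"status": "ok", "event_uuid": event_uuid}

    def confirm(self, event_uuid: str) -> bool:
        """Operator promotes the row: delete it and fire the trigger lane."""
        row = self.conn.execute(
            "SELECT tool, payload FROM pending_actions WHERE event_uuid = ?",
            (event_uuid,),
        ).fetchone()
        if row is None:
            return False
        self.conn.execute("DELETE FROM pending_actions WHERE event_uuid = ?", (event_uuid,))
        self._fire(row[0], json.loads(row[1]))
        return True

    def reject(self, event_uuid: str) -> bool:
        """Operator drops the row: nothing is emitted."""
        cur = self.conn.execute(
            "DELETE FROM pending_actions WHERE event_uuid = ?", (event_uuid,)
        )
        return cur.rowcount == 1

    def _fire(self, tool: str, payload: dict) -> None:
        for subscriber in self.trigger_subscribers:
            subscriber({"tool": tool, **payload})
```

Placing the gate where records enter the trigger lane, rather than in the UI, means every trigger-lane subscriber (AIP push included) inherits it automatically.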


Eval

# Full corpus, Claude Code subscription auth (no ANTHROPIC_API_KEY needed)
beacon-eval --output eval/runs/baseline-$(date +%s)/ --judge-backend claude-code

# Single category
beacon-eval --category mixed --output eval/runs/mixed-only/ --judge-backend claude-code

# Anthropic SDK (default; needs ANTHROPIC_API_KEY)
beacon-eval --output eval/runs/sdk/

The judge is always Claude regardless of which backend the agent under test uses.

Repo layout

src/beacon/
  agent.py            BeaconAgent — GAIA subclass + tool registration + Qwen-specific overrides
  agent_runner.py     Async queue, /agent/{enqueue,status,start,pause,resume,stop}, 90 s timeout
  state.py            SQLite + two-lane EventBus + reap_orphaned_processing
  schemas.py          Every Pydantic model (records, payloads, mesh envelopes, reasoning steps)
  tools/              7 factory-pattern tools — make_<name>(state)
  mesh.py             UDPMeshTransport (alias BeaconMeshAdapter); subscribes to sync lane
  aip.py              Palantir REST adapter; subscribes to trigger lane only
  voice.py            Lemonade Whisper wrapper; raises BeaconVoiceError on any failure
  insights.py         Agent-only insights, 1 s debounce
  reasoning.py        500-step ring buffer for /agent/reasoning/stream
  server.py           FastAPI app + lifespan + all HTTP routes
  cli.py              `beacon serve · seed · ask · chat · state` (typer)
  prompts/            v1..v8.py — full file per version, diff-friendly. CURRENT = v8
  eval/               beacon-eval entry point (Claude judge)

dashboard/src/        Vite + React. App.tsx, EventFeed.tsx, Map.tsx, *Panel.tsx
docs/plans/           overview.md, agent-spec.md, implementation.md, glossary.md
docs/assets/          Diagrams + screenshots referenced from this README
eval/scenarios/       24 YAML scenarios across 8 categories
tests/                One file per source module. 397 collect; 4 e2e gated by BEACON_E2E=1
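The reasoning trace in reasoning.py is a bounded buffer, so a long session cannot grow memory without limit. Below is a minimal sketch using collections.deque with maxlen. The ReasoningBuffer and ReasoningStep names, the increasing seq cursor, and the since() replay method are hypothetical; they illustrate how a stream endpoint could resume a client and are not the actual reasoning.py API. The step kinds are the six listed under What works.

```python
import time
from collections import deque
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import List


class StepType(str, Enum):
    OBSERVE = "OBSERVE"
    TOOL = "TOOL"
    PROPOSE = "PROPOSE"
    SURFACE = "SURFACE"
    DECIDE = "DECIDE"
    QUERY = "QUERY"


@dataclass(frozen=True)
class ReasoningStep:
    seq: int        # increasing cursor, survives eviction of older steps
    kind: StepType
    text: str
    ts: float       # wall-clock time the step was recorded


class ReasoningBuffer:
    def __init__(self, capacity: int = 500):
        self._steps: deque = deque(maxlen=capacity)  # oldest steps fall off the left
        self._seq = count()

    def append(self, kind: StepType, text: str) -> ReasoningStep:
        step = ReasoningStep(next(self._seq), kind, text, time.time())
        self._steps.append(step)
        return step

    def since(self, seq: int) -> List[ReasoningStep]:
        """Steps a client that has already seen up to `seq` still needs."""
        return [s for s in self._steps if s.seq > seq]
```

A reconnecting dashboard could pass the last seq it rendered and receive only newer steps. If it fell more than capacity steps behind, it simply sees the oldest retained step.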

Disclaimers

All data is synthetic. No real OPORDs, casualty data, unit positions, MGRS coordinates, supply consumption rates, or military information of any kind. Operational details (callsigns, grids, OPFOR descriptions) are fictional and bear no resemblance to actual operations.

No AI-attribution. Per AGENTS.md § 3.5, this repo's commits, PRs, and code carry no Co-Authored-By: Claude trailers or AI-attribution footers. Humans are authors of record.

License

MIT — see LICENSE. Built on top of GAIA (Apache-2.0) and Meshtastic.

Why "Beacon"?

Named for the Beacons of Gondor — a chain of independent fire-watches that propagated an urgent alert from Minas Tirith to Edoras with no central infrastructure. Mesh radios instead of bonfires; MEDEVAC 9-lines instead of "Gondor calls for aid." And in case the resonance isn't obvious — we feed the palantíri the leaders look into.
