mattmezza/humux

humux
Human Multiplexer: your self-hosted personal AI agent.
humux.dev · Features · Quick Start · Docs


humux is a self-hosted personal AI agent that runs in a single Docker container. It multiplexes all the channels of your digital life (Telegram, email, calendar, contacts, WhatsApp) into one unified, autonomous intelligence. It remembers, plans, acts, and speaks.

No cloud dependency. No data leaving your server. One docker compose up and you have your own AI.


✨ Features

Monorepo structure

| Directory | Contents |
| --- | --- |
| humux/ | The agent application (Python, FastAPI, Docker) |
| docs/ | Documentation site (Next.js, Fumadocs) |
| www/ | Marketing website (HTML, Tailwind CSS v4) |

Messaging: Talk to your agent wherever you are
  • Telegram: full bot with text, voice messages, reactions, inline approvals
  • WhatsApp: read and send via the wacli CLI; link once and it stays authenticated
  • Multi-agent groups: several agent-bots share one Telegram group; each replies only when addressed and never loops with other bots
  • Reply decision: in group chats the agent decides per message whether to reply, with a hard rate cap that guarantees runaway loops end
  • Steerable mid-turn: redirect a long-running turn by sending a follow-up; it is folded in before the agent's next step (the agent reacts 👀 to confirm) instead of making you wait for it to finish on the wrong track
  • Per-chat settings: control, per Telegram chat, who can trigger an agent and who may DM it
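
The hard rate cap on group replies can be pictured as a sliding-window counter that overrides whatever the model decides. The sketch below is only an illustration: the class name `ReplyRateCap`, its parameters, and the limits are hypothetical and are not taken from humux's `reply_decision.py`.

```python
import time
from collections import deque
from typing import Optional


class ReplyRateCap:
    """Allow at most `max_replies` replies per `window_seconds` in one chat (hypothetical)."""

    def __init__(self, max_replies: int, window_seconds: float) -> None:
        self.max_replies = max_replies
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of recent replies, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget replies that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_replies:
            return False  # cap reached: stay silent regardless of the model's wish
        self._sent.append(now)
        return True


cap = ReplyRateCap(max_replies=2, window_seconds=60)
decisions = [cap.allow(now=t) for t in (0, 1, 2, 61)]
```

Because the cap is checked outside the model, two bots that keep addressing each other can exchange only a bounded number of messages per window, which is what makes a loop terminate.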
Agents: Swappable identities, each with its own bot
  • Each agent has its own character, skill/tool scope, voice, and email/calendar/contacts accounts
  • Each agent runs its own Telegram bot; several run concurrently as separate contacts
  • Agents are created and configured through the admin UI, no code needed
  • Per-agent tool identities: own gh token, own browser profile (#93)
Email: Read, compose, and route
  • Powered by the Himalaya CLI (Rust: fast, stateless, JSON output)
  • Multi-account: Gmail, Fastmail, iCloud, or any IMAP/SMTP provider
  • Each agent can own a dedicated mailbox or be granted read/read-write access
  • Credentials resolve from the encrypted vault and never reach the model's context
Calendar & Contacts: Your schedule, your address book
  • CalDAV: Google Calendar, iCloud, any CalDAV server
  • Contacts: CardDAV (Purelymail, iCloud, Fastmail) and Google Contacts
  • Both bindable per-agent with read / read-write access levels
Memory: Four-tier persistent memory that learns and forgets

| Tier | What | How |
| --- | --- | --- |
| T1 Lexical | Word-overlap retrieval | Always-on, zero deps |
| T2 Semantic | Embedding vectors (fastembed, on-device) | Relevance-ranked injection |
| T3 Forgetting | Importance score + access recency | Cold memories archive automatically |
| T4 Hygiene | Cluster + merge near-duplicates | Self-healing compaction |
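
To make the T1 row concrete, here is a minimal sketch of word-overlap retrieval using Jaccard similarity over word sets. The scoring formula, function names, and sample memories are assumptions for illustration; the README does not specify how humux scores lexical overlap.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, memory: str) -> float:
    """Jaccard overlap between the word sets of the query and a memory."""
    q, m = tokenize(query), tokenize(memory)
    if not q or not m:
        return 0.0
    return len(q & m) / len(q | m)


def retrieve(query: str, memories: list, top_k: int = 2) -> list:
    """Return up to `top_k` memories that share at least one word with the query."""
    scored = [(overlap_score(query, m), m) for m in memories]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [m for _, m in ranked[:top_k]]


memories = [
    "Owner prefers morning briefings at 7am",
    "Dentist appointment is on Friday",
    "Owner is allergic to peanuts",
]
top = retrieve("when is the dentist appointment", memories, top_k=1)
```

A scheme like this needs no model or index, which fits the "always-on, zero deps" description, at the cost of missing paraphrases that the semantic tier is meant to catch.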

Memories are extracted automatically from conversations. The agent both reads and writes them via the sqlite3 CLI, through the same skill system.
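
The forgetting tier (T3) combines an importance score with access recency. One plausible way to combine the two is exponential decay with a half-life, sketched below. The formula, the 30-day half-life, the 0.2 threshold, and the field names are all assumptions, not humux's actual parameters.

```python
import math


def retention_score(importance: float, days_since_access: float,
                    half_life_days: float = 30.0) -> float:
    """Importance halved for every `half_life_days` without access (assumed formula)."""
    return importance * math.pow(0.5, days_since_access / half_life_days)


def split_hot_cold(memories: list, threshold: float = 0.2) -> tuple:
    """Partition memories into those kept active and those to archive."""
    hot, cold = [], []
    for m in memories:
        score = retention_score(m["importance"], m["days_since_access"])
        (hot if score >= threshold else cold).append(m["text"])
    return hot, cold


hot, cold = split_hot_cold([
    {"text": "allergic to peanuts", "importance": 0.9, "days_since_access": 0},
    {"text": "asked about train times", "importance": 0.5, "days_since_access": 60},
    {"text": "likes jazz", "importance": 0.3, "days_since_access": 10},
])
```

Under these assumptions an important memory survives long idle periods, while a moderately important one that has not been touched for two half-lives is archived.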

Scheduled tasks: Proactive, not just reactive
  • Cron-based jobs for morning briefings, email checks, memory consolidation
  • Subagent jobs: delegate recurring work to a named agent
  • One-shot tasks via Telegram (/jobs) or the admin UI
Subagents: Delegate subtasks to scoped sub-loops
  • Spawn a sub-loop under any agent, on demand or scheduled
  • Scope is a subset of the caller's: inherit, never widen, for tools, skills, secrets, and GitHub repo access
  • Runs sync (result returned in-turn) or background (distilled summary)
  • Monitor and cancel from Telegram or admin UI
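
The inherit-never-widen rule for subagent scope amounts to a set intersection: whatever a sub-loop requests is clipped to what its caller already holds. The `Scope` dataclass and `narrow` function below are hypothetical names used to illustrate the idea, not humux's `subagents.py` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    tools: frozenset
    skills: frozenset
    secrets: frozenset
    repos: frozenset


def narrow(parent: Scope, requested: Scope) -> Scope:
    """Return the child's scope: the request, clipped to the parent's grants."""
    return Scope(
        tools=parent.tools & requested.tools,
        skills=parent.skills & requested.skills,
        secrets=parent.secrets & requested.secrets,
        repos=parent.repos & requested.repos,
    )


parent = Scope(
    tools=frozenset({"read_file", "grep", "run_command"}),
    skills=frozenset({"email"}),
    secrets=frozenset({"GH_TOKEN"}),
    repos=frozenset({"example/app"}),
)
requested = Scope(
    tools=frozenset({"grep", "write_file"}),       # write_file is not held by the parent
    skills=frozenset({"email", "calendar"}),       # calendar is not held by the parent
    secrets=frozenset(),
    repos=frozenset({"example/app", "example/infra"}),
)
child = narrow(parent, requested)
```

Because intersection can only shrink a set, nesting sub-loops any number of levels deep can never produce a scope wider than the top-level agent's.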
Voice: Speak to your agent, hear it reply
  • STT: faster-whisper (local, offline, multi-language)
  • TTS: edge-tts (cloud) or Kokoro 82M (fully offline, multilingual)
  • Voice marker syntax lets the model request a spoken reply per-turn
  • Per-agent voice selection
Secrets vault: Encrypted, two-tier, never in context

| Vault | Key | Unseals |
| --- | --- | --- |
| Infra vault | Machine key (HUMUX_MASTER_KEY / data/master.key) | At boot, headless; for provider keys, bot tokens |
| Agent vault | Admin password (envelope encryption) | On login; for website logins, payment keys |

Secrets are referenced as ${vault:NAME} in config and {{secret:NAME}} in commands, so the model never sees the value. Bitwarden import and secure-link credential requests are included.
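
A rough sketch of how `{{secret:NAME}}` placeholders could keep values out of the model's context: the model writes the placeholder, the value is substituted just before execution, and any echo of the value in the output is masked again. The function names, the allowed name characters, and the sample token are assumptions; humux's real executor may work differently.

```python
import re

# Assumed placeholder grammar: letters, digits, and underscores in the name.
SECRET_REF = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


def render_command(template: str, vault: dict) -> str:
    """Substitute secret values into a command the model wrote with placeholders."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"unknown secret: {name}")
        return vault[name]
    return SECRET_REF.sub(lookup, template)


def redact(text: str, vault: dict) -> str:
    """Replace any secret value appearing in output with its placeholder."""
    for name, value in vault.items():
        text = text.replace(value, f"{{{{secret:{name}}}}}")
    return text


vault = {"GH_TOKEN": "ghp_example_value"}  # made-up value for the demo
template = 'curl -H "Authorization: Bearer {{secret:GH_TOKEN}}" https://api.github.com/user'
rendered = render_command(template, vault)
masked = redact(rendered, vault)
```

Only `template` and `masked` would ever be shown to the model in this scheme; `rendered` exists only at execution time.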

Permissions: You're always in control
  • Glob-pattern rules: ALWAYS / ASK / NEVER
  • Write actions ask for Telegram approval with context preview
  • Per-agent tool scoping: an agent can only use what you give it
  • Per-agent GitHub repo allowlist: an agent can only touch repos you authorize
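
Glob-pattern rules with ALWAYS / ASK / NEVER verdicts can be sketched with Python's standard `fnmatch` module. The `tool:argument` action format, the sample rules, first-match-wins ordering, and the ASK default are assumptions made for this illustration; the README does not document humux's rule syntax or precedence.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, evaluated top to bottom.
RULES = [
    ("read_file:*", "ALWAYS"),
    ("run_command:rm *", "NEVER"),
    ("run_command:git push*", "ASK"),
    ("run_command:git *", "ALWAYS"),
]


def decide(action: str, rules: list = RULES, default: str = "ASK") -> str:
    """Return the verdict of the first rule whose glob matches the action."""
    for pattern, verdict in rules:
        if fnmatchcase(action, pattern):
            return verdict
    return default  # nothing matched: fall back to asking the owner


verdicts = {
    action: decide(action)
    for action in (
        "read_file:notes.md",
        "run_command:rm -rf build",
        "run_command:git push origin main",
        "run_command:git status",
        "send_email:bob@example.com",
    )
}
```

With first-match ordering, the more specific `git push*` rule has to appear before the broader `git *` rule, otherwise pushes would be auto-approved.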
Browser automation: The agent can browse the web
  • Optional headless Chromium (Playwright) for JS-heavy pages
  • Self-driving explore mode: an inner LLM loop navigates sites, fills forms, and clicks buttons until done
  • Persistent logged-in profiles (cookies survive between calls)
  • Per-domain action rules (Allow / Ask / Block)
Web artifacts: Publish pages, dashboards, documents
  • The agent writes files under {workspace}/artifacts/<slug>/ with the coding harness
  • Served as shareable links at /artifacts/<slug>/ behind a sandbox CSP
  • Multi-file sites, PDFs, images: anything you can write to disk
Image generation: Visual answers
  • Optional generate_image tool (OpenRouter, fal.ai, or OpenAI)
  • Reuses your existing LLM API key for OpenRouter/OpenAI
  • Daily/monthly budget caps
Coding harness: The agent works on real code
  • read_file, write_file, edit_file, list_dir, grep, run_command_in_dir
  • Confined to one configurable workspace directory; path traversal is blocked
  • run_command shares that root, so a cloned repo is readable, editable and committable in place
  • Each agent namespaces its files under a <slug>/ subdirectory
  • Reads are pre-approved; writes ask permission
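
Workspace confinement with per-agent namespacing can be approximated by resolving every requested path and checking that it still lies under the agent's `<slug>/` directory. This is a sketch under assumptions: the function name, the workspace location, and the use of `PermissionError` are illustrative and not taken from humux's `coding.py`.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, agent_slug: str, relative: str) -> Path:
    """Resolve `relative` under the agent's directory, rejecting any escape."""
    root = (workspace / agent_slug).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes workspace: {relative}")
    return candidate


workspace = Path("/tmp/humux-workspace")  # hypothetical location; need not exist
ok = resolve_in_workspace(workspace, "demo", "repo/src/app.py")

blocked = []
for attempt in ("../other-agent/secrets.txt", "/etc/passwd", "repo/../../escape"):
    try:
        resolve_in_workspace(workspace, "demo", attempt)
    except PermissionError:
        blocked.append(attempt)
```

Checking the resolved path rather than the raw string is what catches traversal through `..` segments and absolute paths alike.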
Admin UI: Full web dashboard
  • Configuration, agents, skills editor, memory inspection, job management
  • Per-agent log streams, filterable by stream / level / time / text
  • Agent lifecycle control (start/stop/restart)
  • Setup wizard for first boot
  • Built with FastAPI + HTMX + Alpine.js + Tailwind CSS v4

🚀 Quick Start

Prerequisites

1. Clone and configure

git clone https://github.com/mattmezza/mpa.git
cd mpa/humux
cp .env.example .env
cp config.yml.example config.yml
cp character.md.example character.md

Edit .env with your API keys. Edit config.yml to customize the agent name, owner, channels, and scheduled jobs.

2. Run with Docker Compose

cd mpa/humux
docker compose up -d

The admin UI is at http://localhost:8000. On first boot, humux starts in setup mode: a wizard walks you through the initial configuration.

3. Run without Docker (development)

Requires Python 3.14+ and uv.

cd mpa/humux
make setup       # creates venv, installs deps, copies example configs
make run         # starts the agent

4. Chat from the terminal (the CLI channel)

cd mpa/humux
make cli        # interactive CLI: type your messages, see the agent think
make cli AGENT=my-agent  # chat as a specific agent
make cli YOLO=1          # auto-approve all permissions (local testing)

The CLI is a first-class channel: the same agent, same data, same tools as Telegram, just at your terminal. It's also reachable remotely over the deploy host's SSH (ssh -t user@host docker exec -it humux uv run python -m core.cli), supports resumable sessions (--session / --sessions / --rm-session), and runs concurrently with the live server (all databases run in WAL mode). See the Channels → CLI docs for details.
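
WAL mode is what lets a CLI session and the live server share the same SQLite files: readers keep working from the last committed snapshot while another connection is mid-write. The sketch below demonstrates that behaviour with Python's standard `sqlite3` module; the file name and `messages` table are made up for the demo and are not humux's schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_wal(path: Path):
    """Open a connection and switch the database to write-ahead logging."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


db_path = Path(tempfile.mkdtemp()) / "history.db"  # hypothetical file name
writer, mode = open_wal(db_path)
writer.execute("CREATE TABLE messages (body TEXT)")
writer.commit()

reader = sqlite3.connect(db_path)  # stands in for a second process

# The writer starts a transaction and leaves it open.
writer.execute("INSERT INTO messages VALUES ('hello')")
# The reader is not blocked; it sees the last committed state.
seen_before = reader.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
writer.commit()
seen_after = reader.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

reader.close()
writer.close()
```

WAL still permits only one writer at a time, so concurrent writers queue behind each other rather than running in parallel.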


🏗️ Architecture

humux follows a Python orchestrator + CLI tools design. Python handles the async LLM loop, the admin web UI, and orchestration. Battle-tested CLI tools handle protocol complexity:

| Concern | Tool |
| --- | --- |
| LLM | Anthropic Claude, OpenAI, Grok (xAI), Google, DeepSeek, OpenRouter |
| Email | Himalaya CLI (Rust) |
| Contacts | Built-in contacts CLI (CardDAV + Google People) |
| Calendar | python-caldav |
| WhatsApp | wacli (Go) |
| Browser | Playwright (Chromium) |
| Voice STT | faster-whisper (CTranslate2) |
| Voice TTS | edge-tts / Kokoro 82M |
| Web search | Tavily |
| Scheduler | APScheduler |
| Storage | SQLite (8 databases) |
| Admin UI | FastAPI + Jinja2 + HTMX + Alpine.js + Tailwind CSS v4 |

The skill system

Instead of hardcoded integrations, the agent learns to use CLI tools via markdown skill files stored in SQLite. Skills are injected into the LLM's context on-demand during conversations. Adding a new capability means:

  1. Install the CLI tool
  2. Write a markdown file teaching the agent how to use it
  3. Add the command prefix to the executor whitelist

No Python code. No redeploy. The agent picks it up on the next turn.
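
Step 3, the executor whitelist, can be pictured as a check on the leading tokens of a command. The sketch below is an assumption-laden illustration: the prefix list, the `weathercli` tool, and the `data/memory.db` path are hypothetical, and it presumes the executor runs commands as an argument list without a shell, so characters like `;` are never interpreted.

```python
import shlex

# Hypothetical whitelist: each entry is a tuple of leading tokens.
ALLOWED_PREFIXES = [
    ("sqlite3",),
    ("weathercli", "forecast"),  # made-up tool added alongside a new skill
]


def is_allowed(command: str, allowed: list = ALLOWED_PREFIXES) -> bool:
    """True if the command's first tokens match a whitelisted prefix exactly."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. an unbalanced quote
        return False
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in allowed)


checks = {
    cmd: is_allowed(cmd)
    for cmd in (
        "sqlite3 data/memory.db 'SELECT 1'",
        "sqlite3x data/memory.db",
        "weathercli forecast --city Rome",
        "weathercli delete-all",
        "sqlite3 'unterminated",
    )
}
```

Comparing whole tokens rather than raw string prefixes avoids accepting a different binary whose name merely starts with a whitelisted one.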

Project structure

humux/
├── core/             Core agent modules
│   ├── agent.py          LLM tool-use loop with agentic reasoning
│   ├── llm.py            Multi-provider LLM client abstraction
│   ├── memory.py         Four-tier memory extraction + consolidation
│   ├── config.py         Pydantic config models, YAML/env loader
│   ├── config_store.py   SQLite-backed config store + setup wizard
│   ├── executor.py       CLI command executor with prefix whitelist
│   ├── permissions.py    Glob-pattern permission engine
│   ├── skills.py         SQLite-backed skills store + lazy loading
│   ├── scheduler.py      APScheduler wrapper for cron/one-shot jobs
│   ├── subagents.py      Scoped sub-loop delegation
│   ├── vault.py          Encryption primitives + key management
│   ├── secret_store.py   SQLite-backed secrets vault storage + ACL
│   ├── artifacts.py      Web artifact serving (sandboxed)
│   ├── coding.py         Confined workspace file tools
│   ├── compaction.py     Conversation compaction for session history
│   ├── github_app.py     GitHub App JWT minting + installation tokens
│   ├── history.py        Conversation history persistence
│   ├── imagegen.py       Image generation with budget caps
│   ├── job_store.py      Scheduled job persistence
│   ├── log_streams.py    Per-agent structured log streaming
│   ├── agents.py         Agent definitions, CRUD, markdown parsing
│   ├── embeddings.py     Local (fastembed) + remote embeddings
│   ├── goal_decomposition.py  Task breakdown for complex requests
│   ├── task_reflection.py     Post-task reflection store
│   └── reply_decision.py      Group-chat reply gate
├── channels/         Communication channels
│   └── telegram.py       Telegram bot (text, voice, approvals)
├── api/              Admin web interface
│   ├── admin.py          FastAPI routes + HTMX partials
│   ├── templates/        Jinja2 templates
│   └── static/           Tailwind CSS
├── voice/            Voice pipeline
│   └── pipeline.py       Whisper STT + edge-tts/Kokoro TTS
├── tools/            CLI helper scripts
│   ├── calendar_read.py  CalDAV event reader
│   ├── calendar_write.py CalDAV event creator
│   ├── contacts.py       CardDAV/Google Contacts client
│   ├── browser.py        Headless browser automation (Playwright)
│   └── skills.py         Skills management CLI
├── skills/           Markdown skill files (seed → SQLite)
├── schema/           SQL schema files
└── tests/            Test suite (pytest + asyncio + xdist)
docs/             Documentation site (Next.js + Fumadocs)
www/              Marketing site (humux.dev)

⚙️ Configuration

humux uses a dual-layer config system:

  1. config.yml + .env: File-based seed config loaded on first boot. Supports ${ENV_VAR} interpolation and ${vault:NAME} for secrets.
  2. SQLite config store (data/config.db): Becomes the source of truth after setup. Managed through the admin UI.
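
The two reference forms in the seed config, `${ENV_VAR}` and `${vault:NAME}`, could be resolved by a single substitution pass like the one below. This is a hedged sketch: the regular expression, the `interpolate` function, and the sample keys are assumptions, and humux's real loader in `config.py` may behave differently (for example around defaults or missing keys).

```python
import re

# Matches ${NAME} and ${vault:NAME}; the allowed name characters are an assumption.
REF = re.compile(r"\$\{(vault:)?([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, env: dict, vault: dict) -> str:
    """Replace ${NAME} from `env` and ${vault:NAME} from `vault`."""
    def resolve(match: re.Match) -> str:
        source = vault if match.group(1) else env
        return source[match.group(2)]  # raises KeyError on an unknown name
    return REF.sub(resolve, value)


env = {"OWNER_NAME": "Ada"}               # stand-in for os.environ
vault = {"TELEGRAM_TOKEN": "123:abc"}     # made-up secret for the demo
line = "owner: ${OWNER_NAME}, token: ${vault:TELEGRAM_TOKEN}"
resolved = interpolate(line, env, vault)
```

Failing loudly on an unknown name, as this sketch does, surfaces a misconfigured seed file at first boot instead of leaving a literal `${...}` in the stored config.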

Key files

| File | Purpose |
| --- | --- |
| .env | API keys and secrets |
| config.yml | Agent settings, channels, calendar, scheduler jobs |
| character.md | Agent identity, personality, and communication style |
| skills/*.md | Skill documents that teach the agent how to use tools |
| agents/*.md | Optional agent-definition seed files |

📦 Tech Stack

| Category | Technology |
| --- | --- |
| Language | Python 3.14+ with uv |
| LLM providers | Anthropic Claude, OpenAI, Google Gemini, Grok (xAI), DeepSeek, OpenRouter (any OpenAI-compatible) |
| Messaging | python-telegram-bot, wacli (WhatsApp) |
| Persistence | SQLite (8 databases: config, skills, agents, history, memory, reflections, jobs, imagegen) |
| Admin UI | FastAPI + Jinja2 + HTMX + Alpine.js + Tailwind CSS v4 |
| Voice | faster-whisper (STT), edge-tts / Kokoro 82M (TTS) |
| Browser | Playwright (Chromium), headless or CDP sidecar |
| Scheduler | APScheduler |
| Search | Tavily |
| Container | Docker (single image, multi-stage) |
| CI/CD | GitHub Actions (lint, test, build, publish to ghcr.io) |

💬 Community & Support


🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run make lint and make test
  5. Open a pull request

See Development docs for setup instructions.


humux: Human Multiplexer

Built with ❤️ by Matteo Merola