Production / ISO: Users access Bloud at http://bloud.local (port 80). Native NixOS Traefik service binds directly to port 80 via CAP_NET_BIND_SERVICE. No iptables redirect needed.
Local dev (NixOS dev-server): Access via http://localhost (port 80, Traefik). The dev-server NixOS config runs Traefik on port 80.
- Service worker is registered on the Traefik port
- Iframe content is served through Traefik
- Everything is same-origin
- NEVER access Vite directly on port 5173
The architecture is: Browser → port 80 → Traefik → Vite/Apps.
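As a quick sanity check, the rule can be expressed as pure string logic. This is a hypothetical helper, not part of the repo; the URL patterns are assumptions based on the rule above.

```shell
# Flag URLs that violate the "always go through Traefik" rule.
check_url() {
  case "$1" in
    *:5173*) echo "BAD: direct Vite access" ;;                 # never allowed
    http://localhost*|http://bloud.local*) echo "OK: via Traefik on port 80" ;;
    *) echo "unknown" ;;
  esac
}

check_url "http://localhost/embed/radarr/"    # OK: via Traefik on port 80
check_url "http://localhost:5173/src/main.ts" # BAD: direct Vite access
```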
THIS IS NON-NEGOTIABLE. Do not skip these steps.
Before proposing any fix or making claims about root causes:
- Gather actual evidence by running commands, adding logs, and observing output
- Explain what evidence was gathered and what it shows
- Walk through the reasoning step by step
- Only then propose changes, with clear justification tied to the evidence
- Do not propose changes based on assumptions or theories
- Do not claim to know the cause without evidence
- If asked "why is this needed?", have concrete evidence ready
- A plausible-sounding theory is NOT evidence
NEVER do this:
"The issue is probably X because Y could happen" → proposes code change
ALWAYS do this:
"I suspect X. Let me add logging to verify" → gathers data → shows output →
confirms/refutes theory with evidence → THEN proposes fix
Real example of what NOT to do:
- User reports 404 errors on /api/v3/* requests
- BAD: "The issue is the SW update clears the clientAppMap" → proposes fix
- GOOD: "Let me add debug logging to see what clientId and clientApp values are" → observes: before SW update clientApp='radarr', after update clientApp=null → "Evidence confirms the SW update clears the map" → proposes fix
When debugging:
- State what you're checking and why
- Run the command or add the logging
- Explain what the output means
- Then decide next steps
- `nixos/bloud.nix` - the primary module for local testing with rootless podman
- Apps live in `apps/<name>/`, each with:
  - `metadata.yaml` - app catalog info (name, description, integrations, etc.)
  - `module.nix` - NixOS module for the app
  - `configurator.go` - Go configurator for runtime integrations
- `nixos/lib/podman-service.nix` - creates systemd user services for podman containers
- `nixos/lib/authentik-blueprint.nix` - generates Authentik OAuth2 blueprints
- Services can be in `failed` state from previous runs; `inactive/dead` means not started, not necessarily broken
- Check `journalctl --user -u <service>` for actual errors
- Check service status: `systemctl --user list-units 'podman-*.service' --all`
- Check logs: `journalctl --user -u podman-<name>.service`
- Check container state: `podman ps -a`
- Check from the container's UID perspective: `podman unshare ls -la <path>`
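The checks above can be bundled into a dry-run helper that prints the commands for one app in order (hypothetical, not in the repo; pipe the output to `sh` to actually run them):

```shell
# Print the diagnostic commands for one app, in the order listed above.
debug_cmds() {
  printf '%s\n' \
    "systemctl --user list-units 'podman-$1*' --all" \
    "journalctl --user -u podman-$1.service" \
    "podman ps -a --filter name=$1" \
    "podman unshare ls -la <data-path>"
}

debug_cmds authentik
```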
- Stale data with wrong permissions from previous runs
- Services staying in failed state after cleanup (need manual restart or rebuild)
- UID mapping: host user maps to root inside container with rootless podman
With rootless podman, UIDs are remapped:
- Host UID 1000 (daniel) → Container UID 0 (root)
- Container UID 1000 → Host UID 100999 (from subuid range)
Problem: Containers running as non-root users (e.g., Authentik runs as UID 1000) can't write to directories owned by the host user.
Solution: Use --userns=keep-id which maps Host UID 1000 → Container UID 1000 (preserves UID).
Files created by containers may be owned by mapped UIDs that the host user can't delete.
Solution: Use podman unshare rm -rf <path> to delete from the container's UID namespace.
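The default mapping can be written out as arithmetic. This sketch assumes the host user's `/etc/subuid` entry starts at 100000 (consistent with the 100999 example above); it is illustrative, not repo code.

```shell
# UID mapping arithmetic under the default rootless-podman mapping.
subuid_base=100000
host_uid=1000   # the host user (daniel)

map_to_host() {  # container UID -> host UID
  if [ "$1" -eq 0 ]; then
    echo "$host_uid"                   # container root is the host user
  else
    echo $(( subuid_base + $1 - 1 ))   # all other UIDs come from the subuid range
  fi
}

map_to_host 0      # prints: 1000
map_to_host 1000   # prints: 100999
```

With `--userns=keep-id` this remapping is bypassed for the host user's own UID, which is why containers running as UID 1000 can then write to host-owned directories.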
- `After=` + `Wants=`: ordering only, doesn't wait for health
- `Requires=`: hard dependency, the service fails if the dependency fails
- For oneshot services with `RemainAfterExit=true`, dependent services wait for completion
The mkPodmanService helper supports:
- `waitFor` - list of `{container, command}` pairs to health-check before starting
- `extraAfter`/`extraRequires` - additional systemd dependencies
Example:
mkPodmanService {
name = "my-app";
waitFor = [
{ container = "postgres"; command = "pg_isready -U user"; }
{ container = "redis"; command = "redis-cli ping"; }
];
extraAfter = [ "my-init.service" ];
extraRequires = [ "my-init.service" ];
}

Design Principle: Each Bloud host runs a maximum of one instance of each core infrastructure service:
- 1 PostgreSQL instance per host - All apps requiring PostgreSQL share this single instance
- 1 Redis instance per host - All apps requiring Redis share this single instance (currently used by Authentik)
- 1 Restic instance per host - Single backup service for all app data (not yet implemented)
Benefits:
- Resource efficiency: Lower RAM and CPU usage vs. per-app instances
- Simplified operations: One service to monitor, backup, and maintain
- Better performance: Shared connection pooling and caching
- Data consistency: Single source of truth
Implementation:
- Apps connect via environment variables to shared services
- NixOS modules ensure only one instance is created per host
- Service dependencies ensure apps wait for shared infrastructure
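To make the wiring concrete, here is how an app might assemble its connection string from the shared PostgreSQL instance. The variable names and values are assumptions for illustration, not the repo's actual contract:

```shell
# Hypothetical env vars injected for an app that uses the shared PostgreSQL.
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432
APP_NAME=authentik   # each app gets its own database inside the shared instance

DATABASE_URL="postgres://${APP_NAME}@${POSTGRES_HOST}:${POSTGRES_PORT}/${APP_NAME}"
echo "$DATABASE_URL"   # prints: postgres://authentik@127.0.0.1:5432/authentik
```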
CRITICAL CONSTRAINT: No app-specific routes at root level.
All embedded apps MUST be served under /embed/{appName}/ paths. URL rewriting via service worker handles apps that use absolute paths.
See docs/embedded-app-routing.md for full details.
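The rewrite the service worker performs can be sketched as pure string logic. The real implementation lives in the service worker (see docs/embedded-app-routing.md); this shell rendition is illustrative only:

```shell
# Rewrite an app's absolute path onto its /embed/ prefix.
rewrite() {
  app="$1"; path="$2"
  case "$path" in
    /embed/*) echo "$path" ;;               # already prefixed, leave untouched
    /*)       echo "/embed/${app}${path}" ;; # absolute path: prepend the prefix
    *)        echo "$path" ;;               # relative paths resolve fine as-is
  esac
}

rewrite radarr /api/v3/movie        # prints: /embed/radarr/api/v3/movie
rewrite radarr /embed/radarr/x.js   # prints: /embed/radarr/x.js
```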
Go and npm are built outside the Nix sandbox using their native toolchains. Nix only packages the pre-built artifacts into the ISO.
Why: Nix's sandbox blocks network access during builds. buildGoModule and buildNpmPackage work around this with fixed-output derivations that require pre-declared hashes (vendorHash, npmDepsHash). These hashes break on every dependency change. Since Go builds are already reproducible (pinned by go.sum + Go version) and npm builds by package-lock.json, building inside the sandbox adds no meaningful reproducibility — only fragility.
How it works:
- CI builds the Go binary and frontend with native toolchains (see `.github/workflows/build-iso.yml`)
- Artifacts are placed in `build/host-agent` and `build/frontend/`
- `nixos/packages/host-agent.nix` packages them into a Nix derivation (just file copying, no compilation)
- If the artifacts don't exist, a stub derivation is used so `nix flake check` still passes
Local ISO builds require building the artifacts first:
mkdir -p build
cd services/host-agent && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o ../../build/host-agent ./cmd/host-agent
cd ../.. && npm ci && npm run build --workspace=services/host-agent/web
cp -r services/host-agent/web/build build/frontend
git add -f build/ # Nix flakes only see git-tracked files
nix build .#packages.x86_64-linux.iso

Local dev is unaffected: ./bloud start uses go-watch/Vite directly and never touches Nix package builds.
Four issues arise when BLOUD_FLAKE_PATH points to a bundled store path (e.g. /nix/store/<hash>-bloud-host-agent-0.1.0/share/bloud):
1. Re-exec step (_NIXOS_REBUILD_REEXEC=1)
nixos-rebuild builds `$flake#...nixos-rebuild` and re-execs from it before switching. Setting `_NIXOS_REBUILD_REEXEC=1` skips this optimization step (it is safe to skip). The variable must be passed inline via `sudo env`, since sudo strips environment variables.
2. Nix treats store paths as already-built (path: prefix)
nix build /nix/store/hash/subdir#attr treats the whole URI as a store path reference, returning the STORE ROOT (/nix/store/hash) directly without evaluating the flake at all. This causes switch-to-configuration to be looked up in the host-agent package (which doesn't have it).
Fix: Use path:/nix/store/.../share/bloud — the path: URI scheme forces Nix to evaluate the flake.nix properly. See flakeURI() in services/host-agent/internal/nixgen/rebuild.go.
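The fix boils down to a prefix check. This is a shell rendition of the `flakeURI()` idea from rebuild.go, for illustration only (the real function is Go):

```shell
# Force proper flake evaluation for flake refs that are bare store paths.
flake_uri() {
  case "$1" in
    /nix/store/*) echo "path:$1" ;;   # path: makes Nix evaluate flake.nix
    *)            echo "$1" ;;        # other refs pass through unchanged
  esac
}

flake_uri /nix/store/abc123-bloud-host-agent-0.1.0/share/bloud
# prints: path:/nix/store/abc123-bloud-host-agent-0.1.0/share/bloud
flake_uri .   # prints: .
```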
3. App module.nix files not in store
bloud.nix imports ../apps/*/module.nix to load app NixOS modules. The original package only copied metadata.yaml and icon.png from each app. Without module.nix, bloud.apps.* options are undefined, causing eval failure.
Fix: nixos/packages/host-agent.nix now copies module.nix alongside metadata.
4. Host-agent package defaults to stub from store
When packages/host-agent.nix is evaluated from the store, ../../build doesn't exist, so hasPrebuilt=false and the stub is used. The stub fails to build, blocking nix build system.build.toplevel.
Fix: Detect when running from a deployed store path (binary exists 4 dirs up at ../../../../bin/host-agent) and use builtins.storePath to reference the already-deployed package without rebuilding.
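The detection reduces to a single existence check relative to the evaluated directory. The helper name is hypothetical; the four-directories-up layout is taken from the description above:

```shell
# True when evaluated from a deployed store path: the already-built binary
# sits four directories up from the packages dir.
is_deployed() {
  [ -x "$1/../../../../bin/host-agent" ]
}
```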
systemd services run with a stripped PATH that excludes /run/wrappers/bin (sudo) and /run/current-system/sw/bin (nixos-rebuild, systemctl, etc.). Always use absolute paths in any code that runs inside a systemd service:
- `sudo` → `/run/wrappers/bin/sudo`
- `nixos-rebuild` → `/run/current-system/sw/bin/nixos-rebuild`
- `systemctl` → `/run/current-system/sw/bin/systemctl`
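A defensive pattern is to resolve the absolute path up front. This helper is hypothetical (not in the repo): it prefers the locations that systemd's stripped PATH omits and falls back to a normal PATH lookup for interactive shells:

```shell
# Resolve a binary to an absolute path, trying the NixOS system dirs first.
resolve_bin() {
  for dir in /run/wrappers/bin /run/current-system/sw/bin; do
    if [ -x "$dir/$1" ]; then
      echo "$dir/$1"
      return 0
    fi
  done
  command -v "$1"   # fallback: ordinary PATH lookup
}

resolve_bin systemctl
```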
The host-agent API (localhost:3000) uses session cookie auth. Requests from 127.0.0.1 (loopback) bypass auth automatically — shell access to the machine implies CLI trust. This is how ./bloud install works: it SSHes into the VM and curls localhost:3000 directly.
External requests (through Traefik or from the browser) still require a valid session cookie.
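The trust decision can be sketched as a check on the remote address. The real check is Go inside the host-agent; the function and labels here are made up for illustration:

```shell
# Decide how a request is authenticated based on its source address.
auth_mode() {
  case "$1" in
    127.*|::1) echo "trusted (loopback)" ;;        # shell access implies CLI trust
    *)         echo "session cookie required" ;;   # everything else needs auth
  esac
}

auth_mode 127.0.0.1   # prints: trusted (loopback)
auth_mode 10.0.0.42   # prints: session cookie required
```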
The ./bloud CLI has two modes, detected automatically:
- Native NixOS mode (default) — runs dev services directly on a NixOS machine with hot reload
- Proxmox mode (when `BLOUD_PVE_HOST` is set) — deploys the ISO to a Proxmox host for integration testing
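The detection logic amounts to one environment check. This sketch assumes `BLOUD_PVE_HOST` is the only input to the decision, as described above:

```shell
# Pick the CLI mode from the environment.
detect_mode() {
  if [ -n "${BLOUD_PVE_HOST:-}" ]; then
    echo proxmox
  else
    echo native
  fi
}

unset BLOUD_PVE_HOST
detect_mode                       # prints: native
BLOUD_PVE_HOST=root@10.0.0.165
detect_mode                       # prints: proxmox
```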
Requires a NixOS machine (physical or VM) with the dev-server flake configuration applied.
npm run setup # Installs deps + builds ./bloud CLI
./bloud setup    # Checks prerequisites and applies NixOS configuration

Native NixOS mode (development):
./bloud start # Start dev environment
./bloud stop # Stop dev services
./bloud status # Show dev environment status
./bloud logs # Show logs from dev services
./bloud attach # Attach to tmux session (Ctrl-B D to detach)
./bloud shell [cmd] # Run a command (or open a shell)
./bloud rebuild  # Rebuild NixOS configuration

Proxmox mode (ISO integration testing, requires BLOUD_PVE_HOST):
./bloud start [iso] # Deploy ISO → create VM → boot → check (VM stays running)
./bloud start --skip-deploy # Reuse existing VM, re-run checks
./bloud stop # Stop VM
./bloud destroy # Destroy VM
./bloud status # Show VM and service status
./bloud logs # Stream VM journalctl
./bloud shell [cmd] # SSH into VM
./bloud checks # Run health checks against running VM
./bloud install <app> # Install app via API
./bloud uninstall <app> # Uninstall app via API

Set BLOUD_PVE_HOST in your .env file or environment, then:
./bloud start # Test latest GitHub release (VM stays running after checks)
./bloud start ./bloud.iso # Test a local ISO
./bloud shell # SSH into the running VM
./bloud logs # Stream journalctl output
./bloud checks # Re-run health checks against a running VM
./bloud install <app> # Install an app on the running VM
./bloud destroy         # Tear down the VM when done

BLOUD_PVE_HOST can be set in a .env file at the project root — the CLI loads it automatically:
BLOUD_PVE_HOST=root@10.0.0.165
If you modify .nix files (like adding new apps):
./bloud rebuild # Apply NixOS changes