This guide covers day-to-day operational concerns for running identree in production: reverse proxy configuration, backup procedures, monitoring, audit durability, scaling, and security hardening.
identree listens on plain HTTP internally (default :8090). TLS must always be terminated at a reverse proxy in front of it. IDENTREE_EXTERNAL_URL must match the public HTTPS URL exactly (e.g. https://identree.example.com).
The /api/events endpoint uses Server-Sent Events (SSE) for real-time dashboard updates. Your proxy must not buffer this path or it will break live challenge notifications.
Your proxy should set (or pass through) these headers on every request:
| Header | Purpose |
|---|---|
| X-Forwarded-For | Client IP for audit logs and rate limiting |
| X-Forwarded-Proto | Lets identree know the original scheme was HTTPS |
| Host | Must match the hostname in IDENTREE_EXTERNAL_URL |
Strip X-Forwarded-* headers from untrusted clients at the edge to prevent IP spoofing.
upstream identree {
    server 127.0.0.1:8090;
}
server {
    listen 443 ssl http2;
    server_name identree.example.com;
    ssl_certificate /etc/ssl/certs/identree.pem;
    ssl_certificate_key /etc/ssl/private/identree.key;
    location / {
        proxy_pass http://identree;
        # Overwrite any client-supplied X-Forwarded-* values. These directives must
        # sit in the same block as the Connection header below: nginx stops
        # inheriting proxy_set_header from the server level once a location
        # defines its own.
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $host;
        # SSE support (required for /api/events)
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 3600s;
    }
}

The critical lines for SSE are proxy_buffering off and the long proxy_read_timeout. Without these, the dashboard will not receive live challenge updates.
Caddy handles TLS automatically via Let's Encrypt:
identree.example.com {
    reverse_proxy 127.0.0.1:8090 {
        # Disable response buffering for SSE
        flush_interval -1
    }
}
Caddy sets X-Forwarded-For and X-Forwarded-Proto by default.
services:
  identree:
    image: identree:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.identree.rule=Host(`identree.example.com`)"
      - "traefik.http.routers.identree.tls=true"
      - "traefik.http.routers.identree.tls.certresolver=letsencrypt"
      - "traefik.http.services.identree.loadbalancer.server.port=8090"
      # Flush responses to the client immediately for SSE. Do not attach the
      # buffering middleware: it buffers responses rather than disabling buffering.
      - "traefik.http.services.identree.loadbalancer.responseforwarding.flushinterval=-1"

Traefik forwards X-Forwarded-For and X-Forwarded-Proto by default.
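One way to confirm that a proxy is not buffering /api/events is to read the stream through the public URL and watch whether events arrive one at a time rather than in bursts. The sketch below parses the standard SSE wire format (`event:` and `data:` fields separated by blank lines). The payload shape and any cookie or token that /api/events requires are not described here, so the `headers` argument is an assumption you would need to fill in.

```python
import time
import urllib.request


def parse_sse(lines):
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line terminates an event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):          # comment / keep-alive line
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def watch(url, headers=None):
    """Print the arrival time of each event; bursts suggest proxy buffering."""
    merged = {"Accept": "text/event-stream", **(headers or {})}
    req = urllib.request.Request(url, headers=merged)
    with urllib.request.urlopen(req) as resp:
        for ev in parse_sse(line.decode("utf-8") for line in resp):
            print(f"{time.strftime('%H:%M:%S')} {ev['event']}: {ev['data']}")
```

Calling `watch("https://identree.example.com/api/events", headers={...})` while triggering a challenge should print the event within about a second; a delay until the connection closes points at buffering in the proxy.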
identree stores all persistent state as JSON files in /config/ (or wherever the corresponding IDENTREE_*_FILE variables point). Back up these files regularly:
| File | Contents | Impact if lost |
|---|---|---|
| /config/sessions.json | Active approved sudo sessions | Users must re-approve; no data loss |
| /config/uidmap.json | UID/GID assignments for LDAP users | UID reassignment breaks file ownership on hosts |
| /config/hosts.json | Registered host registry | Hosts must re-register |
| /config/sudorules.json | Sudo rules (bridge mode) | Sudo policies must be recreated |
uidmap.json is the most critical file. If lost, identree assigns new UIDs to existing users, which breaks file ownership on every managed host. Back this up regularly.
Also back up:
- Break-glass hash files (/etc/identree-breakglass on each managed host) -- the bcrypt hash is the only local authentication fallback if the server is unreachable.
- TOML config (/etc/identree/identree.toml) -- if you use the live configuration editor in the admin UI, changes are written to this file.
- Escrow data -- if using the local escrow backend, the encrypted break-glass passwords are stored inside identree's state. Ensure your backup captures the full /config/ directory.
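A minimal backup sketch, assuming the default /config/ file names listed in this section and a hypothetical destination such as /backup/identree. It copies each state file that exists into a UTC-timestamped folder. The function name and destination layout are illustrative and not part of identree.

```python
import shutil
import time
from pathlib import Path

# Default state file names from the table in this section.
STATE_FILES = ["sessions.json", "uidmap.json", "hosts.json", "sudorules.json"]


def backup_state(config_dir, dest_root, names=STATE_FILES):
    """Copy existing state files into dest_root/<UTC timestamp>/ and return the copies."""
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = Path(dest_root) / stamp
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(config_dir) / name
        if src.is_file():
            # copy2 preserves timestamps and permission bits
            copied.append(Path(shutil.copy2(src, target / name)))
        else:
            print(f"warning: {src} not found, skipping")
    return copied
```

A cron job could call `backup_state("/config", "/backup/identree")` and ship the resulting folder off-host. A copy taken while identree is running might catch a file mid-write, so it is worth checking that each copied file still parses as JSON.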
To restore from backup:
- Stop identree.
- Restore /config/sessions.json, /config/uidmap.json, /config/hosts.json, and /config/sudorules.json from backup.
- Restore /etc/identree/identree.toml if applicable.
- Verify environment variables or config file contain all required secrets (IDENTREE_SHARED_SECRET, IDENTREE_OIDC_CLIENT_SECRET, etc.).
- Start identree.
- Open the admin UI and verify hosts appear and LDAP sync completes.
GET /healthz
Returns JSON with per-component status:
{
  "status": "ok",
  "checks": {
    "disk": "ok",
    "ldap_sync": "ok",
    "ldap_server": "ok",
    "pocketid": "ok",
    "oidc": "ok"
  }
}

| HTTP status | status field | Meaning |
|---|---|---|
| 200 | "ok" | All components healthy |
| 200 | "degraded" | PocketID or OIDC issuer unreachable (LDAP continues from cache) |
| 503 | "unhealthy" | Critical failure: disk not writable, LDAP sync stale, or LDAP server not started |
The Docker image includes a built-in HEALTHCHECK that polls /healthz every 30 seconds.
Component statuses:
| Check | "ok" means | Failure value | Severity |
|---|---|---|---|
| disk | Config directory is writable | "not_writable" | Critical (503) |
| ldap_sync | Last sync within 1.5x refresh interval | "stale" | Critical (503) |
| ldap_server | LDAP listener is bound | "not_started" | Critical (503) |
| pocketid | PocketID API responds (full mode only) | "unreachable" | Degraded (200) |
| oidc | OIDC discovery endpoint responds | "unreachable" | Degraded (200) |
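An external monitor can apply the severity mapping from the tables above to the /healthz payload. The sketch below is one possible reading of that mapping, not identree's own logic: the `CRITICAL` set mirrors the checks marked Critical, and the fetch helper assumes the 503 response still carries the JSON body.

```python
import json
import urllib.error
import urllib.request

# Checks whose failure is listed as Critical (503) in the component table.
CRITICAL = {"disk", "ldap_sync", "ldap_server"}


def summarize_health(payload):
    """Return (severity, failing), where failing maps check name to its reported value."""
    failing = {k: v for k, v in payload.get("checks", {}).items() if v != "ok"}
    if any(name in CRITICAL for name in failing):
        return "critical", failing
    if failing or payload.get("status") != "ok":
        return "degraded", failing
    return "ok", failing


def fetch_health(base_url, timeout=5):
    """GET /healthz; on a 503 the body is read from the raised HTTPError (assumed JSON)."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return json.load(err)
```

For example, `summarize_health(fetch_health("https://identree.example.com"))` returns `("degraded", {"oidc": "unreachable"})` when only the OIDC check fails, which a pager rule can treat differently from a critical result.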
Metrics are served at GET /metrics in Prometheus exposition format.
Authentication: When IDENTREE_METRICS_TOKEN is set, requests must include Authorization: Bearer <token>. When unset, metrics are served without authentication.
Prometheus scrape config:
scrape_configs:
  - job_name: identree
    scheme: https
    static_configs:
      - targets: ["identree.example.com"]
    authorization:
      credentials: "your-metrics-token"

| Metric | Type | Alert condition | Description |
|---|---|---|---|
| identree_challenges_created_total | Counter | Unexpected drop to zero | No challenges being created may indicate PAM misconfiguration |
| identree_challenges_approved_total | Counter | n/a | Track approval rate |
| identree_challenges_auto_approved_total | Counter | n/a | Grace period / one-tap approvals |
| identree_challenges_denied_total | Counter (by reason) | Spike in denials | Possible brute-force or misconfiguration |
| identree_challenge_duration_seconds | Histogram | p95 > 60s | Users waiting too long for approval |
| identree_audit_events_total{status="dropped"} | Counter | Any increase | Audit events are being lost due to buffer overflow |
| identree_audit_events_total{status="failed"} | Counter | Any increase | A sink is failing to deliver events |
| identree_breakglass_escrow_total{status="failure"} | Counter | Any increase | Break-glass password escrow is failing |
| identree_auth_failures_total | Counter | Spike | Invalid shared secrets -- possible misconfigured or rogue host |
| identree_rate_limit_rejections_total | Counter | Sustained increase | Legitimate users may be rate-limited |
| identree_ldap_sync_failures_total | Counter | Any increase | PocketID API unreachable or returning errors |
| identree_registered_hosts | Gauge | Unexpected decrease | Hosts may have been removed |
| identree_oidc_exchange_duration_seconds | Histogram | p95 > 5s | OIDC provider is slow |
| identree_notifications_total{status,channel} | Counter | Spike in status="failed" | Notification delivery failures by channel |
| identree_notification_delivery_duration_seconds{channel} | Histogram | p95 > 10s | Notification delivery latency by channel |
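For a quick check without a full Prometheus setup, the "Any increase" conditions in the table can be evaluated by scraping /metrics twice and comparing counter totals. The sketch below is a deliberately small parser for the text exposition format (it ignores timestamps and exotic label escaping); the helper names are illustrative.

```python
import re
import urllib.request

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.groups()
            samples.append((name, dict(LABEL_RE.findall(labels or "")), float(value)))
    return samples


def total(samples, name, **labels):
    """Sum every sample of `name` whose labels include the given key/value pairs."""
    return sum(
        value for sample_name, sample_labels, value in samples
        if sample_name == name
        and all(sample_labels.get(k) == want for k, want in labels.items())
    )


def scrape(base_url, token=None):
    """Fetch /metrics, sending the bearer token when one is configured."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    req = urllib.request.Request(base_url.rstrip("/") + "/metrics", headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Comparing `total(samples, "identree_audit_events_total", status="dropped")` between two scrapes gives the number of audit events dropped in that interval; any positive difference matches the "Any increase" condition.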
Ready-to-import Grafana dashboard JSON files are provided in deploy/grafana/:
| File | Contents |
|---|---|
| identree-overview.json | Challenge flow (rates, durations, active gauges), notifications (by channel/status, delivery latency), authentication (auth failures, rate limiting, OIDC latency), LDAP (sync failures, query rates, bind failures, host count), and break-glass escrow operations |
| identree-audit-health.json | Audit pipeline health (emitted/dropped/failed events by sink) |
Importing into Grafana:
- Open Grafana and navigate to Dashboards > Import (or the + menu > Import dashboard).
- Click Upload JSON file and select one of the files from deploy/grafana/.
- On the import screen, select your Prometheus datasource from the Prometheus dropdown (the dashboards use a ${DS_PROMETHEUS} variable).
- Click Import.
Both dashboards default to a 6-hour time range with 30-second refresh. The overview dashboard uses collapsible rows to organize panels by subsystem.
Recommended alerts (configure in Grafana or Alertmanager):
- identree_audit_events_total{status="dropped"} -- any increase means the audit buffer is full
- identree_audit_events_total{status="failed"} -- any increase means a sink is failing
- identree_auth_failures_total -- spike indicates invalid shared secrets (misconfigured or rogue host)
- identree_breakglass_escrow_total{status="failure"} -- any increase means escrow is broken
- identree_challenge_duration_seconds p95 > 60s -- users waiting too long for approval
- identree_oidc_exchange_duration_seconds p95 > 5s -- OIDC provider is slow
identree supports multiple audit sinks running simultaneously. Each has different durability characteristics. Understanding these tradeoffs is critical for compliance.
JSON log sink (stdout or file):
- Durability: Synchronous write. Events are written to the output stream before the function returns.
- Loss scenario: A process crash (SIGKILL, OOM) may lose the event currently being written. In practice, this is effectively zero-loss for normal operations.
- Recommendation: Use as your primary sink. Container runtimes (Docker, Kubernetes) capture stdout automatically, making it the simplest and most reliable option.
IDENTREE_AUDIT_LOG=stdout
# or
IDENTREE_AUDIT_LOG=file:/var/log/identree/audit.jsonl

Syslog sink:
- UDP: Fire-and-forget. Events may be lost on network congestion, packet drops, or if the syslog receiver is down. No delivery confirmation.
- TCP: Reliable delivery with automatic reconnection on connection failure. Events buffer in-process during reconnection.
- Recommendation: Use TCP (tcp://host:601) if syslog is your compliance sink. UDP is acceptable only as a secondary/convenience sink.
IDENTREE_AUDIT_SYSLOG_URL=tcp://syslog.local:601 # reliable
IDENTREE_AUDIT_SYSLOG_URL=udp://syslog.local:514 # fire-and-forget

Splunk/Loki sinks:
- Durability: Events are batched (up to 100 events or 5 seconds) before being pushed over HTTP.
- Loss scenario: Up to 5 seconds of events (one batch window) can be lost on a hard crash. Events in the current batch that have not yet been flushed are gone.
- Recommendation: Use as a secondary sink alongside the JSON log sink. The JSON log captures everything synchronously; Splunk/Loki provides searchability and dashboards.
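The batch window is easier to reason about with a small model. The sketch below is not identree's implementation: it is a simplified batcher using the same limits (100 events or 5 seconds), with time passed in explicitly and the age check performed only when an event arrives. Whatever sits in `pending` at the moment of a hard crash is what would be lost.

```python
class Batcher:
    """Collects events and flushes at max_events or max_age seconds, whichever comes first."""

    def __init__(self, max_events=100, max_age=5.0):
        self.max_events = max_events
        self.max_age = max_age
        self.pending = []        # events a hard crash would lose
        self.opened_at = None    # time the current batch received its first event

    def add(self, event, now):
        """Queue an event; return the flushed batch when a limit is reached, else None."""
        if not self.pending:
            self.opened_at = now
        self.pending.append(event)
        too_many = len(self.pending) >= self.max_events
        too_old = now - self.opened_at >= self.max_age
        if too_many or too_old:
            return self.flush()
        return None

    def flush(self):
        """Hand back the current batch and start a new, empty one."""
        batch, self.pending, self.opened_at = self.pending, [], None
        return batch
```

At a steady 10 events per second the 100-event limit is never reached within 5 seconds, so the worst case on a hard crash is roughly 50 unflushed events; at higher rates the 100-event cap bounds the loss instead.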
All sinks receive events through a buffered channel (default size: 4096). If all sinks are slow or blocked, new events are dropped rather than blocking the server. Dropped events are counted:
identree_audit_events_total{sink="_channel",status="dropped"}
Alert on any increase in this counter. If you see drops, either increase the buffer size or investigate why sinks are slow:
IDENTREE_AUDIT_BUFFER_SIZE=8192 # increase from default 4096

Use the JSON log sink as your primary (captured by the container runtime with no configuration), and add Splunk/Loki/syslog as secondary sinks for search and alerting:
IDENTREE_AUDIT_LOG=stdout # primary: zero-loss
IDENTREE_AUDIT_SPLUNK_HEC_URL=https://splunk.example.com:8088/... # secondary: searchable
IDENTREE_AUDIT_SYSLOG_URL=tcp://syslog.local:601 # secondary: compliance

By default, identree runs as a single-node service backed by SQLite at /config/identree.db. All state (challenges, grace sessions, the action log, escrow records, agent heartbeats) lives in that one file. The SQLite database uses WAL mode and serialises writes through a single connection, so the operational footprint is minimal: one volume, one process, no clustering.
Do not run multiple identree replicas against the same SQLite file. SQLite is single-writer; a second instance trying to write to the same .db would corrupt state.
To run multiple identree replicas behind a load balancer, set IDENTREE_DATABASE_DRIVER=postgres and IDENTREE_DATABASE_DSN to a libpq URL. All replicas share state through Postgres, and dashboard SSE events plus admin-session revocations fan out across replicas via Postgres LISTEN/NOTIFY.
No sticky sessions are required. Any replica can serve any request.
identree handles thousands of concurrent challenges on either backend. The bottleneck in practice is the OIDC provider (token exchange latency) rather than identree itself. Monitor identree_oidc_exchange_duration_seconds to detect IdP slowdowns.
In full mode, identree polls the PocketID API every IDENTREE_LDAP_REFRESH_INTERVAL (default 300 seconds / 5 minutes) to sync users and groups. For large directories (1000+ users):
- Increase the interval if the PocketID API is under load. Set IDENTREE_LDAP_REFRESH_INTERVAL=600s or higher.
- Use webhooks for near-instant sync. Point a PocketID webhook at https://identree.example.com/api/webhook/pocketid with IDENTREE_WEBHOOK_SECRET set. This triggers an immediate refresh on user/group changes, letting you use a longer polling interval as a fallback.
identree applies internal rate limiting to challenge creation to prevent abuse. Rejected requests are counted in identree_rate_limit_rejections_total. If legitimate users are being rate-limited, check for:
- Misconfigured PAM on a host retrying in a tight loop
- Automated scripts running sudo repeatedly
- A compromised host flooding the server
Review this list before going to production.
- IDENTREE_SHARED_SECRET is 32+ random bytes.
  Generate with openssl rand -hex 32. This secret authenticates every PAM client request. A weak or leaked secret means any network host can create and approve challenges.
- IDENTREE_EXTERNAL_URL uses HTTPS.
  All OIDC flows, approval URLs, and API calls use this URL. HTTP in production exposes tokens and session cookies.
- IDENTREE_WEBHOOK_SECRET is set.
  Without this, anyone who can reach /api/webhook/pocketid can trigger LDAP refreshes. With a flood of requests, this becomes a denial-of-service vector.
- IDENTREE_ESCROW_HKDF_SALT is set.
  Generate with openssl rand -hex 32. This salt diversifies the encryption key for the local escrow backend per deployment. Without it, a static legacy salt is used (and a warning is logged at startup). Two deployments with the same ESCROW_ENCRYPTION_KEY and no salt produce identical ciphertexts.
- IDENTREE_METRICS_TOKEN is set.
  Without this, /metrics is unauthenticated. Prometheus metrics expose challenge counts, host counts, failure rates, and OIDC latency -- useful reconnaissance for an attacker.
- Break-glass passwords are escrowed (not just local hash).
  Without escrow, the break-glass password exists only as a bcrypt hash on the managed host. You can verify it works but cannot recover the plaintext if a user needs emergency access. Configure an escrow backend (local, vault, 1password-connect, bitwarden, or infisical).
- State files (/config/) are on a persistent volume with restricted permissions.
  sessions.json contains active session data. uidmap.json contains UID mappings. Mount /config on a volume accessible only to the identree container user (UID/GID identree).
- Reverse proxy strips X-Forwarded-* from untrusted clients.
  identree trusts X-Forwarded-For for audit logging. If your proxy does not strip these headers from inbound requests, an attacker can spoof their source IP in audit logs.
- LDAP bind credentials are set (if LDAP is network-exposed).
  Set IDENTREE_LDAP_BIND_DN and IDENTREE_LDAP_BIND_PASSWORD to require authentication for LDAP queries. Without these (and with IDENTREE_LDAP_ALLOW_ANONYMOUS=true), anyone who can reach port 389 can enumerate your entire user directory.
- Set independent secrets (IDENTREE_SESSION_SECRET, IDENTREE_ESCROW_SECRET, IDENTREE_LDAP_SECRET).
  Split secrets limit blast radius if one is compromised. Each defaults to IDENTREE_SHARED_SECRET when unset, but production deployments should set all three independently.
- IDENTREE_OIDC_CLIENT_SECRET is kept out of version control.
  Use environment variables or a secrets manager. Never commit OIDC credentials to a repository.
- Auditd monitoring rules are installed on managed hosts.
  The install script installs auditd rules automatically if auditd is present. These rules create a kernel-level audit trail for break-glass hash reads, config file changes, PAM bypass attempts, and mTLS key exfiltration. Verify with auditctl -l | grep identree. Forward audit logs off-host for tamper resistance. See auditd.md for details.
- Use a dedicated signing key for install scripts (not auto-generated).
  The auto-generated keypair is convenient for development but lives on the server. In production, generate a keypair offline and keep the private key on a trusted workstation. Configure IDENTREE_INSTALL_SIGNING_KEY and IDENTREE_INSTALL_VERIFY_KEY to point to your keys. See install-scripts.md for the full production flow.
- Distribute the install verification public key out-of-band (bake into host images).
  Do not fetch the public key from the server at install time (TOFU). Instead, bake it into your base images, distribute it via configuration management (Ansible, Puppet, Chef), or include it in your provisioning pipeline. This ensures verification does not depend on the server's integrity.
- Verify install script signatures before execution on all new hosts.
  Before running the install script on any host, verify its detached Ed25519 signature:

      curl -sf https://identree.example.com/install.sh -o /tmp/install.sh
      curl -sf https://identree.example.com/install.sh.sig -o /tmp/install.sh.sig
      identree verify-install --key /path/to/install-verify.pub --script /tmp/install.sh --sig /tmp/install.sh.sig
      sudo IDENTREE_SHARED_SECRET=xxx bash /tmp/install.sh https://identree.example.com

  A non-zero exit code from verify-install means the script has been tampered with. Do not execute it. See install-scripts.md for architecture details and custom script support.
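Several checklist items can be checked mechanically before a deploy. The sketch below inspects a mapping of environment variables and reports anything the checklist would flag. It is an illustrative pre-flight helper, not an identree feature, and it only sees settings supplied through the environment (values set in identree.toml would need separate handling).

```python
from urllib.parse import urlparse

INDEPENDENT = ["IDENTREE_SESSION_SECRET", "IDENTREE_ESCROW_SECRET", "IDENTREE_LDAP_SECRET"]
REQUIRED = ["IDENTREE_WEBHOOK_SECRET", "IDENTREE_ESCROW_HKDF_SALT", "IDENTREE_METRICS_TOKEN"]


def check_env(env):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    shared = env.get("IDENTREE_SHARED_SECRET", "")
    if len(shared) < 64:  # openssl rand -hex 32 yields 64 hex characters
        findings.append("IDENTREE_SHARED_SECRET is shorter than 64 characters")
    if urlparse(env.get("IDENTREE_EXTERNAL_URL", "")).scheme != "https":
        findings.append("IDENTREE_EXTERNAL_URL is not an https:// URL")
    for name in REQUIRED:
        if not env.get(name):
            findings.append(f"{name} is not set")
    for name in INDEPENDENT:
        value = env.get(name)
        if not value:
            findings.append(f"{name} is not set (falls back to IDENTREE_SHARED_SECRET)")
        elif value == shared:
            findings.append(f"{name} reuses IDENTREE_SHARED_SECRET")
    return findings
```

Running `check_env(dict(os.environ))` in the deploy pipeline (after `import os`) and failing the job on a non-empty result keeps these items from regressing. On hosts without openssl, Python's `secrets.token_hex(32)` produces the same kind of 64-character hex value as `openssl rand -hex 32`.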
When LDAPS (LDAP over TLS with mTLS) is enabled, the most common issues involve certificate trust and handshake failures. This section covers diagnosis and resolution.
If clients cannot connect to port 636, the TLS handshake is failing. Common causes:
- Client certificate not presented. The client must send a certificate signed by the CA configured in IDENTREE_LDAP_TLS_CA_CERT. Verify with:

      LDAPTLS_CERT=/etc/identree/client.crt \
      LDAPTLS_KEY=/etc/identree/client.key \
      LDAPTLS_CACERT=/etc/identree/ca.crt \
      ldapsearch -H ldaps://identree.example.com:636 -b "dc=example,dc=com" -D "cn=readonly,dc=example,dc=com" -w secret "(objectClass=*)"

- Wrong CA. The client certificate must be signed by the exact CA the server is configured with. Check with:

      openssl verify -CAfile /etc/identree/ca.crt /etc/identree/client.crt

- Server certificate hostname mismatch. If the server's TLS certificate does not include the hostname the client connects to, the handshake will fail. Add -d 1 to ldapsearch for verbose TLS output.
Client certificates have a configurable TTL (default 1 year, set via IDENTREE_MTLS_CERT_TTL). When a certificate expires, the client will be rejected during the TLS handshake.
Check certificate expiry:
openssl x509 -in /etc/identree/client.crt -noout -enddate

To re-provision a host with a fresh certificate, re-run the install script or call the /api/client/provision endpoint.
- Ensure the CA certificate on the client (ldap_tls_cacert in sssd.conf or LDAPTLS_CACERT) matches the CA configured on the server (IDENTREE_LDAP_TLS_CA_CERT).
- If you rotated the CA, all existing client certificates become untrusted. Re-provision all hosts after a CA rotation.
- On some distributions, sssd caches TLS state. Restart sssd after changing certificate files:

      sudo systemctl restart sssd
A quick end-to-end test using ldapsearch and LDAPTLS_* environment variables:
# Test plaintext LDAP (port 389)
ldapsearch -H ldap://identree.example.com:389 -b "dc=example,dc=com" -D "cn=readonly,dc=example,dc=com" -w secret "(uid=*)"
# Test LDAPS with mTLS (port 636)
LDAPTLS_CERT=/etc/identree/client.crt \
LDAPTLS_KEY=/etc/identree/client.key \
LDAPTLS_CACERT=/etc/identree/ca.crt \
ldapsearch -H ldaps://identree.example.com:636 -b "dc=example,dc=com" -D "cn=readonly,dc=example,dc=com" -w secret "(uid=*)"

If the LDAPS test fails but plaintext works, the issue is in the TLS/mTLS configuration.
Monitor client certificate expiry proactively to avoid outages:
- Prometheus alert: If your clients report certificate metadata, alert when any certificate is within 30 days of expiry.
- Cron job on each host: Schedule a periodic check and alert:

      # Alert if certificate expires within 30 days
      if openssl x509 -in /etc/identree/client.crt -noout -checkend 2592000 2>/dev/null; then
        : # OK
      else
        echo "WARN: identree client certificate expires within 30 days" | logger -t identree-cert
      fi

- Centralized monitoring: Use the identree audit log to track provision events and calculate when certificates were last issued. Certificates older than IDENTREE_MTLS_CERT_TTL minus a safety margin need rotation.
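Where a numeric "days remaining" value is more useful than the pass/fail result of -checkend (for example, to export as a metric), the output of `openssl x509 -enddate` can be converted with Python's standard library. The sketch below assumes the certificate path used elsewhere in this guide; the function names and the 30-day threshold parameter are illustrative.

```python
import ssl
import subprocess
import time


def days_until_expiry(enddate_line, now=None):
    """Days left, given a line like 'notAfter=Jan  5 09:34:43 2018 GMT'."""
    not_after = enddate_line.strip().split("=", 1)[1]
    expires = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    current = time.time() if now is None else now
    return (expires - current) / 86400


def check_cert(path="/etc/identree/client.crt", warn_days=30):
    """Run openssl on the certificate; return (days_remaining, inside_warning_window)."""
    out = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-enddate"],
        check=True, capture_output=True, text=True,
    ).stdout
    remaining = days_until_expiry(out)
    return remaining, remaining < warn_days
```

`check_cert()` raises `subprocess.CalledProcessError` if the file is missing or unreadable, which keeps a broken path from being mistaken for an expiring certificate.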
- Admin configuration panel help descriptions are English-only. All other user-facing strings are translated.