This document describes the full architecture of the encrypted P2P chat application. It implements a subset of the Signal Protocol (X3DH + Double Ratchet) for end-to-end encryption, with WebAuthn/Passkeys for passwordless authentication. The server is a relay that stores ciphertext blobs and never sees plaintext message content.
Everything that follows is grounded in the actual source code. File references use the format filename.py:line-range relative to the backend app/ or frontend src/ directory.
HTTPS (TLS)
┌────────────────────────────────────────────────────────────┐
│ Client Browser │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ SolidJS 1.9 + TypeScript 5.9 │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │ WebCrypto │ │ IndexedDB │ │ nanostores │ │ │
│ │ │ X25519 │ │ Private │ │ Reactive │ │ │
│ │ │ Ed25519 │ │ Keys │ │ State │ │ │
│ │ │ AES-256-GCM │ │ Ratchet │ │ │ │ │
│ │ │ HKDF-SHA256 │ │ States │ │ │ │ │
│ │ └──────────────┘ └──────────────┘ └────────────┘ │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
└──────────────────────────────┼──────────────────────────────┘
│
HTTPS + WSS │
│
┌──────────────────────────────┼──────────────────────────────┐
│ Nginx Reverse Proxy │ │
│ ┌───────────────────────────┴──────────────────────────┐ │
│ │ /api/* ──────► HTTP ──────► FastAPI :8000 │ │
│ │ /ws ──────► WS ──────► FastAPI :8000 │ │
│ │ /* ──────► Static files (SolidJS build) │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
│
┌──────────────────────────────┼──────────────────────────────┐
│ FastAPI Backend │ │
│ (Python 3.13, ORJSONResponse, async/await throughout) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Routers: auth, rooms, encryption, websocket │ │
│ │ Services: auth, prekey, message, presence, websocket│ │
│ │ Core: x3dh, double_ratchet, passkey, surreal, redis │ │
│ └────────┬──────────────┬──────────────┬───────────────┘ │
│ │ │ │ │
└───────────┼──────────────┼──────────────┼───────────────────┘
│ │ │
asyncpg │ WebSocket │ redis. │
(TCP 5432) │ (WS 8000) │ asyncio │
│ │ (TCP 6379) │
┌───────────┴──┐ ┌───────┴──────┐ ┌───┴────────────┐
│ PostgreSQL │ │ SurrealDB │ │ Redis 8 │
│ 16-alpine │ │ (latest) │ │ (alpine) │
│ │ │ │ │ │
│ Auth data │ │ Messages │ │ Challenges │
│ Credentials │ │ Presence │ │ Rate limits │
│ X3DH keys │ │ Rooms │ │ │
│ Ratchet │ │ Live │ │ TTL-based │
│ states │ │ queries │ │ auto-expiry │
└──────────────┘ └──────────────┘ └────────────────┘
A few things worth noting about this diagram:
- All three databases serve distinct purposes. PostgreSQL handles relational data that requires ACID transactions (users, credentials, cryptographic keys). SurrealDB handles real-time data that benefits from live query push notifications (messages, presence). Redis handles ephemeral data that needs automatic expiry (WebAuthn challenges, rate limit counters).
- The WebSocket connection between client and FastAPI is the primary channel for real-time messaging. HTTP is used for auth flows, key management, and room CRUD.
- The client does all encryption and decryption. The server receives ciphertext and stores it as-is. This is the central design constraint that everything else follows from.
Purpose: API server that handles authentication, key management, message relay, and WebSocket connections.
Key File: factory.py
The application uses the factory pattern. create_app() at lines 63-115 builds the FastAPI instance:
create_app() (factory.py:63-115)
├── FastAPI(
│ title = settings.APP_NAME,
│ default_response_class = ORJSONResponse,
│ lifespan = lifespan,
│ )
├── CORSMiddleware (factory.py:78-85)
├── GZipMiddleware, minimum_size=1000 (factory.py:87)
├── register_exception_handlers (factory.py:89)
├── Root endpoint "/" (factory.py:91-101)
├── Health endpoint "/health" (factory.py:103-108)
└── Routers:
├── auth_router → /auth/*
├── rooms_router → /rooms/*
├── encryption_router → /encryption/*
└── websocket_router → /ws
The lifespan manager at lines 39-61 opens the database connections on startup and closes them on shutdown:
lifespan(app) (factory.py:39-61)
Startup:
1. await init_db() → PostgreSQL tables via SQLModel
2. await redis_manager.connect() → Redis connection pool
3. await surreal_db.connect() → SurrealDB WebSocket connection
Shutdown:
1. await redis_manager.disconnect()
2. await surreal_db.disconnect()
The startup order matters. PostgreSQL is initialized first because the auth tables must exist before any request processing. Redis and SurrealDB follow because they depend on having a running application context. If any of these fail, the application does not start.
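The startup and shutdown ordering can be sketched without FastAPI using a plain async context manager. This is a minimal sketch, not the code in factory.py: StubManager and make_lifespan are hypothetical names, the stubs only record calls, and the try/finally around the yield is an assumption about how shutdown is guarded.

```python
import asyncio
from contextlib import asynccontextmanager


class StubManager:
    """Hypothetical stand-in for redis_manager / surreal_db; records calls only."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    async def connect(self) -> None:
        self.log.append(f"connect:{self.name}")

    async def disconnect(self) -> None:
        self.log.append(f"disconnect:{self.name}")


def make_lifespan(log: list):
    """Build a lifespan context manager that mirrors the documented order."""
    redis_manager = StubManager("redis", log)
    surreal_db = StubManager("surreal", log)

    async def init_db() -> None:
        # Stand-in for creating the PostgreSQL tables via SQLModel.
        log.append("init_db")

    @asynccontextmanager
    async def lifespan(app: object):
        # Startup: an exception in any step propagates, so the app never serves.
        await init_db()
        await redis_manager.connect()
        await surreal_db.connect()
        try:
            yield
        finally:
            # Shutdown mirrors the documented order: Redis, then SurrealDB.
            await redis_manager.disconnect()
            await surreal_db.disconnect()

    return lifespan


async def demo() -> list:
    log: list = []
    lifespan = make_lifespan(log)
    async with lifespan(app=None):
        log.append("serving")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

FastAPI accepts a context manager of this shape through its lifespan argument, which is how the tree above wires it in.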
Interfaces:
| Route Prefix | Purpose | Key Endpoints |
|---|---|---|
| /auth | WebAuthn registration + login | register/begin, register/complete, authenticate/begin, authenticate/complete, users/search |
| /encryption | X3DH key management | prekey-bundle/{id}, upload-keys/{id}, initialize-keys/{id}, rotate-signed-prekey/{id}, opk-count/{id} |
| /rooms | Chat room CRUD | Room creation, listing, participant management |
| /ws | WebSocket endpoint | Real-time messaging, typing, presence, receipts |
Purpose: Real-time bidirectional communication for messaging, typing indicators, presence updates, and read receipts.
Key Files: core/websocket_manager.py, api/websocket.py, services/websocket_service.py
The ConnectionManager class (websocket_manager.py:31-296) is the center of the real-time system. It maintains three dictionaries:
self.active_connections: dict[UUID, list[WebSocket]] = {} # line 39
self.live_query_ids: dict[UUID, str] = {} # line 40
self.heartbeat_tasks: dict[UUID, asyncio.Task] = {} # line 41
The first maps user IDs to lists of WebSocket connections. Each user can have up to 5 simultaneous connections (WS_MAX_CONNECTIONS_PER_USER = 5, config.py:141). This supports multi-device usage without exhausting server resources. When a sixth connection attempt arrives, the manager sends an error and closes the socket (websocket_manager.py:52-64).
Connection lifecycle:
connect(websocket, user_id) (websocket_manager.py:43-95)
1. Accept WebSocket
2. Check connection limit (max 5 per user)
3. Add to active_connections pool
4. Set user online via presence_service
5. Start heartbeat loop (30s interval)
6. Subscribe to SurrealDB live query for user's messages
7. Return True
disconnect(websocket, user_id) (websocket_manager.py:97-133)
1. Remove WebSocket from user's connection list
2. If last connection:
a. Set user offline via presence_service
b. Kill SurrealDB live query
c. Cancel heartbeat task
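The bookkeeping behind steps 2-3 of connect() and steps 1-2 of disconnect() can be modelled in a few lines. This is a simplified, synchronous sketch: ConnectionPool, add, and remove are hypothetical names, and presence, heartbeat, and live query handling are left out.

```python
from uuid import UUID, uuid4

WS_MAX_CONNECTIONS_PER_USER = 5  # value quoted from config.py:141


class ConnectionPool:
    """Hypothetical model of the per-user connection registry."""

    def __init__(self) -> None:
        self.active_connections: dict = {}  # UUID -> list of sockets

    def add(self, user_id: UUID, websocket: object) -> bool:
        """Register a socket; return False when the user is already at the limit."""
        conns = self.active_connections.setdefault(user_id, [])
        if len(conns) >= WS_MAX_CONNECTIONS_PER_USER:
            return False
        conns.append(websocket)
        return True

    def remove(self, user_id: UUID, websocket: object) -> bool:
        """Drop a socket; return True when no connections remain for the user."""
        conns = self.active_connections.get(user_id, [])
        if websocket in conns:
            conns.remove(websocket)
        if not conns:
            self.active_connections.pop(user_id, None)
            return True
        return False


if __name__ == "__main__":
    pool = ConnectionPool()
    user = uuid4()
    print([pool.add(user, object()) for _ in range(6)])
```

A True result from remove() is the point at which the real manager would set the user offline, kill the live query, and cancel the heartbeat task.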
Heartbeat: The _heartbeat_loop method (websocket_manager.py:177-201) sends a ping every WS_HEARTBEAT_INTERVAL seconds (30s, from config.py:140). If the send fails, the connection is marked dead and disconnected. This catches silently dropped connections that TCP keepalive alone would miss.
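A heartbeat loop of this kind can be sketched with asyncio alone. The sketch below makes several assumptions: FlakySocket is a hypothetical test double, the ping payload shape is invented for illustration, and on_dead stands in for the call to disconnect().

```python
import asyncio


class FlakySocket:
    """Hypothetical socket that starts failing after a fixed number of sends."""

    def __init__(self, fail_after: int) -> None:
        self.fail_after = fail_after
        self.sent: list = []

    async def send_json(self, payload: dict) -> None:
        if len(self.sent) >= self.fail_after:
            raise ConnectionError("peer went away")
        self.sent.append(payload)


async def heartbeat_loop(socket, interval: float, on_dead) -> None:
    """Ping every `interval` seconds; on the first failed send, report and stop."""
    while True:
        await asyncio.sleep(interval)
        try:
            await socket.send_json({"type": "ping"})  # payload shape is an assumption
        except Exception:
            await on_dead(socket)
            return


async def demo() -> tuple:
    dead: list = []

    async def on_dead(sock) -> None:
        dead.append(sock)

    sock = FlakySocket(fail_after=3)
    # A tiny interval keeps the demo fast; the document quotes 30 seconds.
    await heartbeat_loop(sock, interval=0.001, on_dead=on_dead)
    return len(sock.sent), dead == [sock]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the real manager the loop would run as an asyncio.Task stored in heartbeat_tasks, so that disconnect() can cancel it.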
Live query subscription: When a user connects, _subscribe_to_messages (websocket_manager.py:203-223) registers a SurrealDB live query that watches for new messages where the user is the recipient. SurrealDB pushes new messages to the server in real-time through this subscription. The callback wraps updates and dispatches them through _handle_live_message (websocket_manager.py:225-251), which forwards the encrypted payload to all of the user's active WebSocket connections.
Message routing: The WebSocket endpoint (websocket.py:25-84) is thin. It accepts connections, delegates to connection_manager.connect(), and then sits in a loop reading JSON messages. Each message gets routed by websocket_service.route_message(), which dispatches based on the type field:
route_message(websocket, user_id, message) (websocket_service.py:40-85)
├── "encrypted_message" → handle_encrypted_message()
├── "typing" → handle_typing_indicator()
├── "presence" → handle_presence_update()
├── "receipt" → handle_read_receipt()
├── "heartbeat" → handle_heartbeat()
└── unknown → send error response
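Type-based dispatch like this is commonly written as a lookup table. The following is a sketch only: the handler bodies are placeholders that report their own name, the websocket argument is omitted, and the error payload shape is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str, dict], Awaitable[dict]]


def make_handler(name: str) -> Handler:
    """Build a placeholder handler that just reports which handler ran."""

    async def handler(user_id: str, message: dict) -> dict:
        return {"handled_by": name, "user_id": user_id}

    return handler


# Keys are the message types from the routing tree; values are placeholders.
HANDLERS = {
    "encrypted_message": make_handler("handle_encrypted_message"),
    "typing": make_handler("handle_typing_indicator"),
    "presence": make_handler("handle_presence_update"),
    "receipt": make_handler("handle_read_receipt"),
    "heartbeat": make_handler("handle_heartbeat"),
}


async def route_message(user_id: str, message: dict) -> dict:
    """Dispatch on the `type` field; unknown or malformed types yield an error."""
    msg_type = message.get("type")
    handler = HANDLERS.get(msg_type) if isinstance(msg_type, str) else None
    if handler is None:
        return {"type": "error", "detail": f"unknown message type: {msg_type!r}"}
    return await handler(user_id, message)


if __name__ == "__main__":
    print(asyncio.run(route_message("alice", {"type": "typing"})))
    print(asyncio.run(route_message("alice", {"type": "bogus"})))
```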
Dead connection cleanup: When send_message() (websocket_manager.py:135-153) fails to send to a connection, it collects the dead connection and calls disconnect() on it. This prevents stale connections from accumulating.
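The collect-then-prune pattern described here might look like the sketch below. FakeSocket is a hypothetical test double, and the function takes the connection dictionary as an argument rather than living on a manager class.

```python
import asyncio


class FakeSocket:
    """Hypothetical socket; a dead one raises on every send."""

    def __init__(self, alive: bool = True) -> None:
        self.alive = alive
        self.received: list = []

    async def send_json(self, payload: dict) -> None:
        if not self.alive:
            raise ConnectionError("socket closed")
        self.received.append(payload)


async def send_message(connections: dict, user_id: str, payload: dict) -> int:
    """Send to every socket of a user, pruning those that fail; return delivered count."""
    dead: list = []
    delivered = 0
    # Iterate over a copy so pruning afterwards cannot disturb the loop.
    for sock in list(connections.get(user_id, [])):
        try:
            await sock.send_json(payload)
            delivered += 1
        except Exception:
            dead.append(sock)
    # Dead sockets are removed only after the loop has finished.
    for sock in dead:
        connections[user_id].remove(sock)
    if user_id in connections and not connections[user_id]:
        del connections[user_id]
    return delivered


if __name__ == "__main__":
    good, bad = FakeSocket(), FakeSocket(alive=False)
    conns = {"bob": [good, bad]}
    print(asyncio.run(send_message(conns, "bob", {"type": "encrypted_message"})))
    print(len(conns["bob"]))
```

Collecting failures first and removing them afterwards avoids mutating the list while it is being iterated.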
Purpose: Implements the Signal Protocol cryptographic primitives for end-to-end encryption.
Key Files: core/encryption/x3dh_manager.py, core/encryption/double_ratchet.py, frontend/src/crypto/primitives.ts
The encryption system has two layers:
-
X3DH (Extended Triple Diffie-Hellman): Establishes a shared secret between two users who have never communicated before, even if one of them is offline.
-
Double Ratchet: Uses the X3DH shared secret to derive per-message encryption keys with forward secrecy. Every message gets a unique key, and compromising one key does not reveal past or future messages.
X3DHManager (x3dh_manager.py:56-353) handles:
- Key generation for identity keys (X25519 for DH, Ed25519 for signing)
- Signed prekey generation with Ed25519 signatures (x3dh_manager.py:116-152)
- One-time prekey generation (x3dh_manager.py:154-174)
- Signed prekey verification (x3dh_manager.py:176-206)
- Sender-side X3DH exchange (x3dh_manager.py:208-281)
- Receiver-side X3DH exchange (x3dh_manager.py:283-350)
The X3DH exchange on the sender side (perform_x3dh_sender, lines 208-281) works like this:
Alice wants to message Bob (who might be offline):
1. Alice fetches Bob's prekey bundle from server:
- Bob's identity key (IK_B)
- Bob's signed prekey (SPK_B) + signature
- Bob's one-time prekey (OPK_B), if available
2. Alice verifies SPK_B signature using Bob's Ed25519 identity key
3. Alice generates ephemeral keypair (EK_A)
4. Four DH operations:
DH1 = X25519(IK_A_private, SPK_B) # Alice identity x Bob signed prekey
DH2 = X25519(EK_A_private, IK_B) # Alice ephemeral x Bob identity
DH3 = X25519(EK_A_private, SPK_B) # Alice ephemeral x Bob signed prekey
DH4 = X25519(EK_A_private, OPK_B) # Alice ephemeral x Bob one-time prekey
# (only if OPK available)
5. Concatenate: key_material = DH1 || DH2 || DH3 [|| DH4]
6. Derive shared key via HKDF-SHA256:
shared_key = HKDF(
salt = 0x00 * 32,
ikm = 0xFF * 32 || key_material,
info = "X3DH",
length = 32
)
7. Return: shared_key, associated_data (IK_A_pub || IK_B_pub), EK_A_pub
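The 32 bytes of 0xFF prepended to the key material follow the Signal X3DH specification for X25519, where the prefix provides cryptographic domain separation from XEdDSA. The all-zero salt is likewise as long as the hash output (32 bytes for SHA-256), as the specification requires.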
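Steps 5 and 6 can be reproduced with the standard library alone. The sketch below implements HKDF-SHA256 as defined in RFC 5869 and applies the salt, prefix, and info values listed above; the DH outputs are placeholder bytes rather than real X25519 results, and derive_x3dh_shared_key is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_x3dh_shared_key(
    dh1: bytes, dh2: bytes, dh3: bytes, dh4: Optional[bytes] = None
) -> bytes:
    """Concatenate the DH outputs and derive the 32-byte shared key."""
    key_material = dh1 + dh2 + dh3 + (dh4 or b"")  # DH4 only when an OPK was used
    return hkdf_sha256(
        salt=b"\x00" * 32,
        ikm=b"\xff" * 32 + key_material,
        info=b"X3DH",
        length=32,
    )


if __name__ == "__main__":
    # Placeholder DH outputs; real ones come from X25519 exchanges.
    dh = [bytes([i]) * 32 for i in range(1, 5)]
    print(derive_x3dh_shared_key(*dh[:3]).hex())
    print(derive_x3dh_shared_key(*dh).hex())
```

Both sides must agree on whether DH4 is included, since the derived key changes when the one-time prekey is present.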
DoubleRatchet (double_ratchet.py:64-419) handles:
- Sender initialization from X3DH shared secret (double_ratchet.py:279-302)
- Receiver initialization (double_ratchet.py:304-321)
- Message encryption with chain key advancement (double_ratchet.py:323-362)
- Message decryption with out-of-order support (double_ratchet.py:364-416)
- DH ratchet steps for forward secrecy (double_ratchet.py:155-213)
- Skipped message key storage for out-of-order delivery (double_ratchet.py:215-258)
The ratchet uses two KDF chains:
Root Key Chain:
_kdf_rk(root_key, dh_output) → (new_root_key, new_chain_key)
Uses HKDF-SHA256 with root_key as salt (double_ratchet.py:79-94)
Symmetric Key Chain:
_kdf_ck(chain_key) → (next_chain_key, message_key)
Uses HMAC-SHA256 with 0x01 for chain key, 0x02 for message key
(double_ratchet.py:96-109)
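The two chains can be sketched with hmac and hashlib. The constants 0x01 and 0x02 and the use of the root key as HKDF salt come from the description above; the HKDF info string and the 64-byte output split are assumptions made for illustration.

```python
import hashlib
import hmac


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def kdf_rk(root_key: bytes, dh_output: bytes) -> tuple:
    """Root chain step: returns (new_root_key, new_chain_key)."""
    # The info string is an assumption; the root key acts as the HKDF salt.
    okm = hkdf_sha256(salt=root_key, ikm=dh_output, info=b"DoubleRatchet", length=64)
    return okm[:32], okm[32:]


def kdf_ck(chain_key: bytes) -> tuple:
    """Symmetric chain step: returns (next_chain_key, message_key)."""
    next_chain_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


if __name__ == "__main__":
    root_key, chain_key = kdf_rk(b"\x22" * 32, b"\x33" * 32)
    for n in range(3):
        chain_key, message_key = kdf_ck(chain_key)
        print(n, message_key.hex())
```

Because each step replaces the chain key with the output of a one-way function, an old message key cannot be recomputed from the current chain key.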
Each message is encrypted with AES-256-GCM using a 12-byte random nonce (double_ratchet.py:111-130). The message key is derived from the sending chain and used exactly once. After encryption, the sending chain advances, and the old message key is discarded.
Security limits from config.py:
- MAX_SKIP_MESSAGE_KEYS = 1000 (line 73): Maximum messages that can arrive out of order before the protocol rejects them
- MAX_CACHED_MESSAGE_KEYS = 2000 (line 74): Maximum stored skipped keys before eviction
- AES_GCM_NONCE_SIZE = 12 (line 69): 96-bit nonces for AES-GCM
- HKDF_OUTPUT_SIZE = 32 (line 70): 256-bit derived keys
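The two skip limits interact roughly as in the sketch below. It is a simplification under stated assumptions: skip_message_keys is a hypothetical name, the cache is keyed by message number alone (a full implementation would also key on the sender's ratchet public key), and oldest-first eviction is an assumed policy.

```python
import hashlib
import hmac

MAX_SKIP_MESSAGE_KEYS = 1000    # value quoted from config.py:73
MAX_CACHED_MESSAGE_KEYS = 2000  # value quoted from config.py:74


def kdf_ck(chain_key: bytes) -> tuple:
    """Symmetric chain step: returns (next_chain_key, message_key)."""
    next_chain_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def skip_message_keys(chain_key: bytes, next_n: int, until_n: int, cache: dict) -> tuple:
    """Advance the receiving chain to `until_n`, caching each skipped message key."""
    if until_n - next_n > MAX_SKIP_MESSAGE_KEYS:
        # A huge gap is rejected before any key derivation work is done.
        raise ValueError("too many skipped messages")
    while next_n < until_n:
        chain_key, message_key = kdf_ck(chain_key)
        cache[next_n] = message_key
        next_n += 1
        # Dicts preserve insertion order, so the first key is the oldest entry.
        while len(cache) > MAX_CACHED_MESSAGE_KEYS:
            cache.pop(next(iter(cache)))
    return chain_key, next_n


if __name__ == "__main__":
    cache: dict = {}
    # Message 3 arrives first: keys for messages 0, 1, and 2 are set aside.
    chain_key, next_n = skip_message_keys(b"\x01" * 32, 0, 3, cache)
    print(next_n, sorted(cache))
```

Bounding the gap keeps a malicious header with a very large message number from forcing the receiver into unbounded key derivation.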
Client-side crypto: The frontend primitives.ts (lines 1-397) mirrors the backend crypto using the WebCrypto API. It provides:
- generateX25519KeyPair() (line 15)
- x25519DeriveSharedSecret() (line 26)
- generateEd25519KeyPair() (line 89)
- ed25519Sign() / ed25519Verify() (lines 99, 113)
- hkdfDerive() / hkdfDeriveKey() (lines 166, 194)
- aesGcmEncrypt() / aesGcmDecrypt() (lines 224, 261)
- hmacSha256() / hmacSha256Verify() (lines 310, 329)
- constantTimeEqual() (line 388): Constant-time comparison to prevent timing side channels
The client also has crypto-service.ts, double-ratchet.ts, x3dh.ts, key-store.ts, and message-store.ts which orchestrate these primitives into the full protocol flows.
Server-side vs client-side encryption paths:
The MessageService (message_service.py) has two paths:
- store_encrypted_message() (lines 269-314): Client-side passthrough. The server receives ciphertext, nonce, and header as strings and stores them as-is in SurrealDB. No decryption or re-encryption on the server.
- send_encrypted_message() (lines 316-402): Server-side encryption. Marked [DEPRECATED] in the docstring (line 325). This path loads the ratchet state from PostgreSQL, encrypts on the server, and stores in SurrealDB. It exists for backwards compatibility during migration to full client-side encryption.
The client-side path is the intended production path. In this path the server cannot read messages because it never holds the keys.
Purpose: Passwordless authentication using WebAuthn/FIDO2 passkeys.
Key Files: core/passkey/passkey_manager.py, api/auth.py, services/auth_service.py, core/redis_manager.py
The PasskeyManager (passkey_manager.py:43-210) wraps the py_webauthn library. It is configured with the Relying Party (RP) settings from config.py:
self.rp_id = settings.RP_ID # e.g., "localhost" or "chat.example.com"
self.rp_name = settings.RP_NAME # "Encrypted P2P Chat"
self.rp_origin = settings.RP_ORIGIN # "https://chat.example.com"
Registration flow:
Client Server Redis
│ │ │
│ POST /auth/register/begin│ │
│ { username, display_name }│ │
│ ─────────────────────────►│ │
│ │ │
│ PasskeyManager.generate_registration_options()
│ (passkey_manager.py:55-94) │
│ │ │
│ │ SET webauthn:reg_challenge:{username}
│ │ challenge_bytes, TTL=600s│
│ │ ──────────────────────────►
│ │ │
│ ◄─── registration options (publicKey config) ────────│
│ │ │
│ Browser WebAuthn API │ │
│ navigator.credentials │ │
│ .create(options) │ │
│ User touches │ │
│ authenticator │ │
│ │ │
│ POST /auth/register/complete │
│ { credential, username } │ │
│ ─────────────────────────►│ │
│ │ │
│ │ GET+DEL webauthn:reg_challenge:{username}
│ │ ──────────────────────────►
│ │ ◄── challenge_bytes ──────│
│ │ │
│ PasskeyManager.verify_registration() │
│ (passkey_manager.py:96-130) │
│ │ │
│ Create User in PostgreSQL │
│ Store Credential in PostgreSQL │
│ Initialize X3DH keys │
│ (prekey_service.py:152-219) │
│ │ │
│ ◄─── UserResponse (id, username, etc.) ──────────────│
Authentication flow:
The authentication flow is similar but uses generate_authentication_options and verify_authentication. A critical detail is the signature counter check (passkey_manager.py:184-193): if the new counter is not greater than the stored counter, it raises a ValueError indicating a potentially cloned authenticator. This is the WebAuthn clone detection mechanism.
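The counter rule as described reduces to a single comparison. The function name and error message below are hypothetical; only the "not greater than" condition and the ValueError come from the text above.

```python
def check_sign_count(stored_count: int, new_count: int) -> int:
    """Return the counter to persist, or raise if it failed to increase."""
    if new_count <= stored_count:
        # A counter that did not advance suggests a cloned authenticator.
        raise ValueError(
            f"signature counter did not increase "
            f"(stored={stored_count}, received={new_count})"
        )
    return new_count


if __name__ == "__main__":
    print(check_sign_count(stored_count=5, new_count=6))
    try:
        check_sign_count(stored_count=5, new_count=5)
    except ValueError as exc:
        print("rejected:", exc)
```

Taken literally, this strict rule also rejects an authenticator that always reports 0. The WebAuthn specification requires the comparison only when either value is nonzero; whether the source special-cases that is not stated here.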
Challenge storage: Redis stores challenges with a 600-second TTL (WEBAUTHN_CHALLENGE_TTL_SECONDS = 600, config.py:84). The get_registration_challenge and get_authentication_challenge methods use Redis pipelines to atomically GET and DELETE the challenge (redis_manager.py:86-95). This ensures each challenge can be consumed at most once: a replayed or concurrent second attempt finds no challenge.
The challenge itself is 32 random bytes (WEBAUTHN_CHALLENGE_BYTES = 32, config.py:85), stored as hex in Redis.
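The single-use and expiry semantics can be illustrated with an in-memory stand-in for Redis. ChallengeStore, put, and take are hypothetical names; the real code talks to Redis through redis_manager, and only the key format, the hex encoding, and the two constants are taken from this document.

```python
import secrets
import time
from typing import Callable, Optional

WEBAUTHN_CHALLENGE_TTL_SECONDS = 600  # value quoted from config.py:84
WEBAUTHN_CHALLENGE_BYTES = 32         # value quoted from config.py:85


class ChallengeStore:
    """In-memory stand-in for the Redis challenge keys (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict = {}  # key -> (hex challenge, expiry time)

    def put(self, username: str) -> bytes:
        """Create and store a fresh challenge, hex-encoded as in Redis."""
        challenge = secrets.token_bytes(WEBAUTHN_CHALLENGE_BYTES)
        key = f"webauthn:reg_challenge:{username}"
        expires_at = self._clock() + WEBAUTHN_CHALLENGE_TTL_SECONDS
        self._items[key] = (challenge.hex(), expires_at)
        return challenge

    def take(self, username: str) -> Optional[bytes]:
        """Fetch and delete in one step, so a challenge is never returned twice."""
        key = f"webauthn:reg_challenge:{username}"
        entry = self._items.pop(key, None)
        if entry is None:
            return None
        hex_value, expires_at = entry
        if self._clock() >= expires_at:
            return None  # expired entries behave as if Redis had already evicted them
        return bytes.fromhex(hex_value)


if __name__ == "__main__":
    store = ChallengeStore()
    issued = store.put("alice")
    print(store.take("alice") == issued)  # first use succeeds
    print(store.take("alice"))            # second use finds nothing
```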
Auth endpoints (auth.py:31-103):
| Endpoint | Method | Status | Description |
|---|---|---|---|
| /auth/register/begin | POST | 200 | Generate WebAuthn registration options |
| /auth/register/complete | POST | 201 | Verify credential, create user + keys |
| /auth/authenticate/begin | POST | 200 | Generate WebAuthn authentication options |
| /auth/authenticate/complete | POST | 200 | Verify credential, update counter |
| /auth/users/search | POST | 200 | Search users by username/display name |
Purpose: Persistent storage split across three purpose-built databases.
PostgreSQL via SQLModel/SQLAlchemy async: Handles all relational data with ACID guarantees.
The engine is configured in models/Base.py:38-44:
engine = create_async_engine(
str(settings.DATABASE_URL), # postgresql+asyncpg://...
echo = settings.DEBUG, # SQL logging in development
pool_size = settings.DB_POOL_SIZE, # 20 (config.py:118)
max_overflow = settings.DB_MAX_OVERFLOW, # 40 (config.py:119)
pool_pre_ping = True, # Detect stale connections
)
The session factory (Base.py:47-51) creates AsyncSession instances with expire_on_commit=False so that objects remain usable after commit without requiring a refresh.
SurrealDB via AsyncSurreal: Handles real-time messaging data. The SurrealDBManager (surreal_manager.py:28-428) connects over WebSocket to SurrealDB and provides methods for message CRUD, room management, presence tracking, and live query subscriptions. The live query feature is the primary reason SurrealDB was chosen: it pushes new records to subscribers in real-time, eliminating the need for polling.
Redis via redis.asyncio: Handles ephemeral data. The RedisManager (redis_manager.py:19-174) uses a connection pool of 50 connections (redis_manager.py:39) and stores challenges as hex-encoded bytes with TTL-based expiry.
This is the most important flow in the system. Here is what happens step by step when Alice sends a message to Bob, assuming they already have an established Double Ratchet session:
Alice's Browser Server Bob's Browser
│ │ │
┌────┴────────────────┐ │ │
│ 1. User types msg │ │ │
│ in ChatInput.tsx │ │ │
└────┬────────────────┘ │ │
│ │ │
┌────┴────────────────┐ │ │
│ 2. crypto-service.ts│ │ │
│ encrypts: │ │ │
│ a. Advance send │ │ │
│ chain via │ │ │
│ HMAC-SHA256 │ │ │
│ b. Derive unique │ │ │
│ message key │ │ │
│ c. AES-256-GCM │ │ │
│ encrypt with │ │ │
│ random nonce │ │ │
│ d. Build header: │ │ │
│ {dh_pub_key, │ │ │
│ msg_number, │ │ │
│ prev_chain} │ │ │
└────┬────────────────┘ │ │
│ │ │
│ 3. WebSocket send: │ │
│ { │ │
│ type: "encrypted_message", │
│ recipient_id: bob_uuid, │ │
│ room_id: room_uuid, │ │
│ ciphertext: "base64...",│ │
│ nonce: "base64...", │ │
│ header: "{...json...}", │ │
│ temp_id: "client_123" │ │
│ } │ │
│ ───────────────────────────► │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 4. websocket.py:46-54 │ │
│ │ receives JSON, parses │ │
│ │ routes via │ │
│ │ websocket_service │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 5. websocket_service.py: │ │
│ │ handle_encrypted_message│ │
│ │ (lines 87-179) │ │
│ │ Extracts fields, │ │
│ │ opens DB session │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 6. message_service.py: │ │
│ │ store_encrypted_message │ │
│ │ (lines 269-314) │ │
│ │ Looks up sender user │ │
│ │ for username. │ │
│ │ Stores ciphertext, │ │
│ │ nonce, header AS-IS in │ │
│ │ SurrealDB. │ │
│ │ NO DECRYPTION. │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 7. SurrealDB live query │ │
│ │ fires for Bob because │ │
│ │ recipient_id matches │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 8. websocket_manager.py: │ │
│ │ _handle_live_message │ │
│ │ (lines 225-251) │ │
│ │ Wraps as EncryptedMsgWS │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ 9. send_message(bob_uuid) │ │
│ │ (lines 135-153) │ │
│ │ Sends to ALL of Bob's │ │
│ │ active WebSockets │ │
│ └────────────┬───────────────┘ │
│ │ ─────────────────────────────►
│ │ │
│ │ ┌────────────────┴──┐
│ │ │ 10. Bob's crypto- │
│ │ │ service.ts │
│ │ │ decrypts: │
│ │ │ a. Check for │
│ │ │ skipped │
│ │ │ msg keys │
│ │ │ b. If new DH │
│ │ │ pub key, │
│ │ │ DH ratchet │
│ │ │ step │
│ │ │ c. Advance │
│ │ │ recv chain │
│ │ │ d. Derive msg │
│ │ │ key │
│ │ │ e. AES-256-GCM│
│ │ │ decrypt │
│ │ └────────┬─────────┘
│ │ │
│ │ ┌────────┴─────────┐
│ │ │ 11. Plaintext │
│ │ │ rendered in │
│ │ │ MessageList │
│ │ └──────────────────┘
│ │ │
│ ◄── confirmation ─────────│ │
│ { type: "message_sent", │ │
│ temp_id: "client_123", │ │
│ status: "sent" } │ │
The server never sees the plaintext. It acts purely as a relay that stores and forwards ciphertext blobs. The store_encrypted_message method (message_service.py:269-314) literally just wraps the received fields into a dict and calls surreal_db.create_message().
The confirmation message sent back to Alice (websocket_service.py:148-159) includes the temp_id so the client can match it to the optimistically rendered message in the UI.
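The envelope from step 3 and the confirmation can be sketched as plain dictionaries. build_envelope and relay are hypothetical names, a list stands in for SurrealDB, and the bytes used as ciphertext and nonce are placeholders rather than real AES-256-GCM output.

```python
import base64
import json

STORED_FIELDS = ("recipient_id", "room_id", "ciphertext", "nonce", "header")


def build_envelope(
    recipient_id: str,
    room_id: str,
    ciphertext: bytes,
    nonce: bytes,
    header: dict,
    temp_id: str,
) -> dict:
    """Client side: wrap already-encrypted bytes in the WebSocket message shape."""
    return {
        "type": "encrypted_message",
        "recipient_id": recipient_id,
        "room_id": room_id,
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
        "nonce": base64.b64encode(nonce).decode("ascii"),
        "header": json.dumps(header),
        "temp_id": temp_id,
    }


def relay(envelope: dict, storage: list) -> dict:
    """Server side: store the opaque fields untouched, then acknowledge."""
    storage.append({field: envelope[field] for field in STORED_FIELDS})
    # temp_id is echoed back so the client can match its optimistic UI entry.
    return {"type": "message_sent", "temp_id": envelope["temp_id"], "status": "sent"}


if __name__ == "__main__":
    storage: list = []
    envelope = build_envelope(
        recipient_id="bob-uuid",
        room_id="room-uuid",
        ciphertext=b"\x8a\x01\x02",  # placeholder, not real ciphertext
        nonce=b"\x00" * 12,
        header={"dh_pub_key": "placeholder", "msg_number": 0, "prev_chain": 0},
        temp_id="client_123",
    )
    print(relay(envelope, storage))
```

Nothing in relay() decodes the ciphertext or parses the header, which is the property the passthrough path depends on.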
This flow happens once per user.
Step 1: POST /auth/register/begin
auth_service.py:331-374
→ PasskeyManager generates challenge
→ Challenge stored in Redis with 600s TTL
→ Registration options returned to client
Step 2: Browser WebAuthn API (navigator.credentials.create)
→ User interacts with authenticator (Touch ID, YubiKey, etc.)
→ Browser creates credential bound to RP origin
Step 3: POST /auth/register/complete
auth_service.py:376-437
→ Challenge retrieved from Redis (GET+DEL atomic)
→ PasskeyManager verifies credential against challenge
→ User record created in PostgreSQL
→ Credential record created in PostgreSQL
→ prekey_service.initialize_user_keys() called
(prekey_service.py:152-219)
Step 4: Server-side key initialization (prekey_service.py:152-219)
→ Generate X25519 identity keypair (IK)
→ Generate Ed25519 signing keypair
→ Store both public+private in PostgreSQL identity_keys table
→ Generate signed prekey (SPK) signed with Ed25519 IK
→ Generate 100 one-time prekeys (OPKs)
→ Store all in PostgreSQL
Step 5: (Client-side, post-registration)
Client generates its own X3DH keys using WebCrypto:
→ X25519 identity keypair
→ Ed25519 signing keypair
→ Signed prekey with signature
→ 100 one-time prekeys
Step 6: POST /encryption/upload-keys/{user_id}
encryption.py:72-95
→ prekey_service.store_client_keys()
(prekey_service.py:45-150)
→ Only PUBLIC keys stored on server
→ Private keys remain in browser IndexedDB
Note that there is a dual path here. Step 4 is the server-side key generation (used as a fallback and for backwards compatibility). Steps 5-6 are the client-side key generation path (the preferred production path). In the client-side path, the server never sees private keys.
When Alice wants to message Bob for the first time:
Step 1: Alice's client requests Bob's prekey bundle
GET /encryption/prekey-bundle/{bob_id}
encryption.py:32-51
Step 2: prekey_service.get_prekey_bundle(session, bob_id)
prekey_service.py:293-361
→ Fetch Bob's identity key (IK)
→ Fetch Bob's active signed prekey (SPK)
→ If no active SPK, auto-rotate (prekey_service.py:321)
→ Fetch one unused one-time prekey (OPK)
→ Mark OPK as used (single-use guarantee, line 332)
→ If unused OPK count < 20, auto-replenish 100 more
(encryption.py:47-49)
→ Return PreKeyBundle{IK, IK_ed25519, SPK, SPK_sig, OPK}
Step 3: Alice performs X3DH sender-side locally
Using the prekey bundle and her own identity key:
a. Verify SPK signature with Bob's Ed25519 IK
b. Generate ephemeral keypair (EK)
c. Compute DH1..DH4
d. Derive shared_key via HKDF
Step 4: Initialize Double Ratchet with shared_key
Alice's client calls double_ratchet.initialize_sender()
with the shared key and Bob's SPK as the initial peer key
Step 5: First message includes X3DH header
The header contains Alice's ephemeral public key and
identity key reference so Bob can derive the same
shared secret when he comes online
Step 6: Bob receives the message
Bob's client uses the X3DH header + his own private keys
to perform the receiver-side X3DH exchange
(x3dh_manager.py:283-350)
Both parties now share the same root key for the
Double Ratchet
The OPK single-use guarantee is enforced at the database level. When get_prekey_bundle fetches an OPK, it immediately marks is_used = True and commits (prekey_service.py:332-345). If someone else requests the same OPK concurrently, they will get a different one or none.
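The claim-and-mark behaviour, together with the replenish threshold from Step 2, can be modelled in memory. This is an analogy rather than the database code: PrekeyPool is a hypothetical class, integers stand in for prekeys, and an asyncio.Lock plays the role that transaction isolation or row locking would play in PostgreSQL.

```python
import asyncio
from typing import Optional

OPK_REPLENISH_THRESHOLD = 20  # "if unused OPK count < 20"
OPK_BATCH_SIZE = 100          # "auto-replenish 100 more"


class PrekeyPool:
    """In-memory model of one user's one-time prekeys (illustration only)."""

    def __init__(self, initial: int) -> None:
        self._lock = asyncio.Lock()
        self._next_id = 0
        self._unused: list = []
        self._used: set = set()
        self._generate(initial)

    def _generate(self, count: int) -> None:
        for _ in range(count):
            self._unused.append(self._next_id)
            self._next_id += 1

    @property
    def unused_count(self) -> int:
        return len(self._unused)

    async def claim(self) -> Optional[int]:
        """Hand out one unused prekey at most once, or None if none are left."""
        async with self._lock:
            if not self._unused:
                return None
            opk = self._unused.pop(0)
            self._used.add(opk)  # the equivalent of is_used = True plus commit
            if len(self._unused) < OPK_REPLENISH_THRESHOLD:
                self._generate(OPK_BATCH_SIZE)
            return opk


async def demo() -> tuple:
    # The pool is created inside the running loop so the lock belongs to it.
    pool = PrekeyPool(initial=25)
    claimed = await asyncio.gather(*(pool.claim() for _ in range(10)))
    return claimed, pool.unused_count


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The lock makes "pick an unused key" and "mark it used" a single indivisible step, which is the property concurrent bundle requests rely on.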
Where: factory.py:63-115
Why: The create_app() function builds and returns a fully configured FastAPI instance. This separates app creation from app execution (main.py just calls create_app()). The practical benefit is testability: you can call create_app() with different configurations in tests without starting a server. The lifespan manager (factory.py:39-61) ensures all databases are connected before the first request and properly disconnected on shutdown.
Where: services/ directory (auth_service.py, prekey_service.py, message_service.py, presence_service.py, websocket_service.py)
Why: Business logic lives in service classes, not in API endpoint functions. The API layer (api/) is thin: it handles request validation and response formatting, then delegates to services. Services are stateless singletons instantiated at module level:
message_service = MessageService() # message_service.py:469
prekey_service = PrekeyService() # prekey_service.py:468
auth_service = AuthService() # auth_service.py:601
websocket_service = WebSocketService() # websocket_service.py:324
This means any endpoint can import and call any service without worrying about instantiation or dependency injection. The tradeoff is that services cannot be easily swapped at runtime, but for this application that is not needed.
Where: websocket_manager.py:31-42
Why: The ConnectionManager maps user IDs to lists of WebSocket connections. This supports multiple simultaneous devices per user (up to 5, enforced at websocket_manager.py:52). When a message needs to be delivered, send_message() iterates over all connections for that user. Dead connections are detected during send attempts and cleaned up immediately.
Where: websocket_manager.py:203-224, surreal_manager.py:341-359
Why: SurrealDB live queries implement a push-based notification system. When a new message is created in SurrealDB with recipient_id = bob, SurrealDB pushes that record to the server through the live query callback. The server then forwards it to Bob's WebSocket connections. This eliminates polling entirely. The alternative would be the server polling SurrealDB for new messages, which would add latency proportional to the polling interval and waste resources when no messages are pending.
The subscription is per-user, not per-room. live_messages_for_user() (surreal_manager.py:341-359) watches WHERE recipient_id = '{user_id}', so the server receives all messages destined for that user regardless of room.
Where: Module-level instances at the bottom of each manager file:
x3dh_manager = X3DHManager() # x3dh_manager.py:353
double_ratchet = DoubleRatchet() # double_ratchet.py:419
passkey_manager = PasskeyManager() # passkey_manager.py:210
connection_manager = ConnectionManager() # websocket_manager.py:296
surreal_db = SurrealDBManager() # surreal_manager.py:428
redis_manager = RedisManager() # redis_manager.py:174
Why: These managers hold no per-request state. X3DHManager and DoubleRatchet are pure functions wrapped in a class (they take state as arguments and return results). PasskeyManager holds only the RP configuration. The connection managers (surreal_db, redis_manager, connection_manager) hold shared connection pools. A single instance per process is the correct model.
┌─────────────────────────────────────────────────────────────────┐
│ API Layer (api/) │
│ ┌───────────┐ ┌──────────────┐ ┌─────────┐ ┌──────────────┐ │
│ │ auth.py │ │ encryption.py│ │ rooms.py│ │ websocket.py │ │
│ │ lines │ │ lines 1-127 │ │ │ │ lines 1-85 │ │
│ │ 1-104 │ │ │ │ │ │ │ │
│ └───────────┘ └──────────────┘ └─────────┘ └──────────────┘ │
│ HTTP/WS endpoints. Request validation. Response formatting. │
│ Thin wrappers that delegate to services. │
├─────────────────────────────────────────────────────────────────┤
│ Service Layer (services/) │
│ ┌──────────────┐ ┌───────────────┐ ┌─────────────────────┐ │
│ │ auth_service │ │ prekey_service│ │ message_service │ │
│ │ lines 1-601 │ │ lines 1-468 │ │ lines 1-469 │ │
│ └──────────────┘ └───────────────┘ └─────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ presence_service │ │ websocket_service│ │
│ └──────────────────┘ └──────────────────┘ │
│ Business logic. Orchestration. Error handling. │
│ Stateless singletons. Import Core + Models. │
├─────────────────────────────────────────────────────────────────┤
│ Core Layer (core/) │
│ ┌──────────────┐ ┌───────────────┐ ┌─────────────────────┐ │
│ │ x3dh_manager │ │ double_ratchet│ │ passkey_manager │ │
│ │ lines 1-353 │ │ lines 1-419 │ │ lines 1-210 │ │
│ └──────────────┘ └───────────────┘ └─────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ websocket_manager│ │ surreal_mgr │ │ redis_manager │ │
│ │ lines 1-296 │ │ lines 1-428 │ │ lines 1-174 │ │
│ └──────────────────┘ └──────────────┘ └──────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ exceptions.py │ │ exception_handlers.py │ │
│ └──────────────────┘ └──────────────────────────────────┘ │
│ Protocol implementations. Database clients. WebSocket pool. │
│ No imports from API or Services. │
├─────────────────────────────────────────────────────────────────┤
│ Model Layer (models/) │
│ ┌──────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ User │ │ Credential │ │ IdentityKey │ │ SignedPrekey │ │
│ └──────┘ └────────────┘ └─────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ OneTimePrekey│ │ RatchetState │ │ SkippedMessageKey │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ SQLModel ORM classes. Data structures. Validation. │
│ Import only config constants and Base. │
├─────────────────────────────────────────────────────────────────┤
│ Schema Layer (schemas/) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ auth.py │ │ common.py│ │ rooms.py │ │ websocket.py │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
│ ┌──────────────┐ │
│ │ surreal.py │ │
│ └──────────────┘ │
│ Pydantic request/response models. API contracts. │
│ Serialization. No business logic. │
└─────────────────────────────────────────────────────────────────┘
Import rules: API imports Services. Services import Core + Models. Core imports nothing from API or Services. Models import nothing except config constants and the Base class. Schemas import nothing except config constants.
The only exception is websocket_service.py, which imports connection_manager from Core and message_service from Services. This is acceptable because WebSocket message handling straddles both layers.
All PostgreSQL models inherit from BaseDBModel (models/Base.py:20-34), which provides created_at and updated_at timestamp fields with timezone awareness.
users (models/User.py:24-68)
┌────────────────────────────────────────────────────────────┐
│ users │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ UUID │ PK, default uuid4│
│ username │ VARCHAR(50) │ UNIQUE, INDEX │
│ display_name │ VARCHAR(100) │ NOT NULL │
│ is_active │ BOOLEAN │ default True │
│ is_verified │ BOOLEAN │ default False │
│ identity_key │ VARCHAR(500) │ nullable │
│ signed_prekey │ VARCHAR(500) │ nullable │
│ signed_prekey_sig │ VARCHAR(500) │ nullable │
│ one_time_prekeys │ TEXT │ nullable │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
├──────────────────────┴──────────────────┴──────────────────┤
│ Relationships: credentials (1:many → Credential) │
└────────────────────────────────────────────────────────────┘
credentials (models/Credential.py:27-78)
┌────────────────────────────────────────────────────────────┐
│ credentials │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ INTEGER │ PK, autoincrement│
│ credential_id │ VARCHAR(512) │ UNIQUE, INDEX │
│ public_key │ VARCHAR(1024) │ NOT NULL │
│ sign_count │ INTEGER │ default 0 │
│ aaguid │ VARCHAR(64) │ nullable │
│ backup_eligible │ BOOLEAN │ default False │
│ backup_state │ BOOLEAN │ default False │
│ attestation_type │ VARCHAR(50) │ nullable │
│ transports │ VARCHAR(200) │ nullable │
│ user_id │ UUID │ FK → users.id │
│ device_name │ VARCHAR(100) │ nullable │
│ last_used_at │ TIMESTAMP(tz) │ nullable │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
├──────────────────────┴──────────────────┴──────────────────┤
│ Relationships: user (many:1 → User) │
└────────────────────────────────────────────────────────────┘
identity_keys (models/IdentityKey.py:18-48)
┌────────────────────────────────────────────────────────────┐
│ identity_keys │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ INTEGER │ PK, autoincrement│
│ user_id │ UUID │ FK → users.id │
│ │ │ UNIQUE, INDEX │
│ public_key │ VARCHAR(64) │ NOT NULL (X25519)│
│ private_key │ VARCHAR(64) │ NOT NULL (X25519)│
│ public_key_ed25519 │ VARCHAR(64) │ NOT NULL │
│ private_key_ed25519 │ VARCHAR(64) │ NOT NULL │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
└────────────────────────────────────────────────────────────┘
Note: When client-side key generation is used (store_client_keys, prekey_service.py:45-150), the private_key and private_key_ed25519 fields are stored as empty strings. Only the public keys are actually stored on the server. The private key fields remain in the schema for backward compatibility with the server-side key generation path.
signed_prekeys (models/SignedPrekey.py:20-50)
┌────────────────────────────────────────────────────────────┐
│ signed_prekeys │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ INTEGER │ PK, autoincrement│
│ user_id │ UUID │ FK → users.id │
│ key_id │ INTEGER │ INDEX │
│ public_key │ VARCHAR(64) │ NOT NULL (X25519)│
│ private_key │ VARCHAR(64) │ NOT NULL │
│ signature │ VARCHAR(128) │ NOT NULL (Ed25519)│
│ is_active │ BOOLEAN │ default True │
│ expires_at │ TIMESTAMP(tz) │ nullable │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
└────────────────────────────────────────────────────────────┘
Rotation: New signed prekey every 48 hours (SIGNED_PREKEY_ROTATION_HOURS = 48, config.py:76). Old inactive prekeys retained for 7 days (SIGNED_PREKEY_RETENTION_DAYS = 7, config.py:77) then cleaned up by cleanup_old_signed_prekeys() (prekey_service.py:428-465).
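The rotation and retention rules can be expressed as two small predicates. This is a minimal sketch, not the code in prekey_service.py: the function names and the `deactivated_at` parameter are hypothetical, and it assumes the retention window is measured from the moment a prekey was deactivated.

```python
from datetime import datetime, timedelta, timezone

# Values quoted in the text (config.py:76-77).
SIGNED_PREKEY_ROTATION_HOURS = 48
SIGNED_PREKEY_RETENTION_DAYS = 7


def needs_rotation(created_at: datetime, now: datetime) -> bool:
    """True once the active signed prekey is at least 48 hours old."""
    return now - created_at >= timedelta(hours=SIGNED_PREKEY_ROTATION_HOURS)


def is_past_retention(deactivated_at: datetime, now: datetime) -> bool:
    """True once an inactive signed prekey has outlived the 7-day window.

    Assumption: retention is counted from deactivation time; the real
    cleanup job may key off a different timestamp.
    """
    return now - deactivated_at >= timedelta(days=SIGNED_PREKEY_RETENTION_DAYS)
```

Keeping inactive prekeys for a retention window lets a peer who fetched an old bundle shortly before rotation still complete the handshake.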
one_time_prekeys (models/OneTimePrekey.py:18-45)
┌────────────────────────────────────────────────────────────┐
│ one_time_prekeys │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ INTEGER │ PK, autoincrement│
│ user_id │ UUID │ FK → users.id │
│ key_id │ INTEGER │ INDEX │
│ public_key │ VARCHAR(64) │ NOT NULL (X25519)│
│ private_key │ VARCHAR(64) │ NOT NULL │
│ is_used │ BOOLEAN │ default False │
│ │ │ INDEX │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
└────────────────────────────────────────────────────────────┘
Each user starts with 100 OPKs (DEFAULT_ONE_TIME_PREKEY_COUNT = 100, config.py:75). When the count drops below 20, the get_prekey_bundle endpoint auto-replenishes (encryption.py:47-49).
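The replenishment decision reduces to a threshold check. The sketch below is illustrative only: `REPLENISH_THRESHOLD` and `prekeys_to_generate` are hypothetical names, and it assumes the pool is topped back up to the default count rather than extended by a fixed batch.

```python
DEFAULT_ONE_TIME_PREKEY_COUNT = 100  # config.py:75
REPLENISH_THRESHOLD = 20  # threshold quoted in the text; name is hypothetical


def prekeys_to_generate(unused_count: int) -> int:
    """Return how many one-time prekeys to create for a user.

    Assumption: the pool is refilled to the default count once the
    number of unused keys drops below the threshold.
    """
    if unused_count >= REPLENISH_THRESHOLD:
        return 0
    return DEFAULT_ONE_TIME_PREKEY_COUNT - unused_count
```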
ratchet_states (models/RatchetState.py:18-68)
┌────────────────────────────────────────────────────────────┐
│ ratchet_states │
├────────────────────────────┬──────────────┬────────────────┤
│ Column │ Type │ Constraints │
├────────────────────────────┼──────────────┼────────────────┤
│ id │ INTEGER │ PK │
│ user_id │ UUID │ FK → users.id │
│ peer_user_id │ UUID │ FK → users.id │
│ dh_private_key │ VARCHAR(100K)│ nullable │
│ dh_public_key │ VARCHAR(100K)│ nullable │
│ dh_peer_public_key │ VARCHAR(100K)│ nullable │
│ root_key │ VARCHAR(100K)│ NOT NULL │
│ sending_chain_key │ VARCHAR(100K)│ NOT NULL │
│ receiving_chain_key │ VARCHAR(100K)│ NOT NULL │
│ sending_message_number │ INTEGER │ default 0 │
│ receiving_message_number │ INTEGER │ default 0 │
│ previous_sending_chain_len │ INTEGER │ default 0 │
│ created_at │ TIMESTAMP(tz)│ NOT NULL │
│ updated_at │ TIMESTAMP(tz)│ NOT NULL │
└────────────────────────────────────────────────────────────┘
There is one ratchet state per (user_id, peer_user_id) pair. The relationship is directional: Alice's ratchet state for talking to Bob is a separate row from Bob's ratchet state for talking to Alice.
skipped_message_keys (models/SkippedMessageKey.py:17-51)
┌────────────────────────────────────────────────────────────┐
│ skipped_message_keys │
├──────────────────────┬──────────────────┬──────────────────┤
│ Column │ Type │ Constraints │
├──────────────────────┼──────────────────┼──────────────────┤
│ id │ INTEGER │ PK, autoincrement│
│ ratchet_state_id │ INTEGER │ FK → ratchet_ │
│ │ │ states.id, INDEX │
│ dh_public_key │ VARCHAR(100000) │ NOT NULL, INDEX │
│ message_number │ INTEGER │ NOT NULL, INDEX │
│ message_key │ VARCHAR(100000) │ NOT NULL │
│ created_at │ TIMESTAMP(tz) │ NOT NULL │
│ updated_at │ TIMESTAMP(tz) │ NOT NULL │
└────────────────────────────────────────────────────────────┘
These store message keys for out-of-order delivery. When the receiving ratchet advances past a message number that has not been received yet, the key for that message is computed and stored here. When the skipped message eventually arrives, its key is looked up and consumed.
SurrealDB is schemaless, but these are the document structures the application creates:
messages (created via surreal_manager.py:112-122)
{
id: "messages:ulid_here", // SurrealDB auto-generated
sender_id: "uuid_string",
recipient_id: "uuid_string",
room_id: "rooms:ulid_here" | null,
ciphertext: "base64url_encoded_bytes",
nonce: "base64url_encoded_bytes",
header: "{\"dh_public_key\":\"...\",\"message_number\":0,...}",
sender_username: "alice",
created_at: "2026-01-15T10:30:00Z",
updated_at: "2026-01-15T10:30:00Z"
}
presence (created via surreal_manager.py:287-305)
{
id: "presence:`user_uuid`", // user-specific record ID
user_id: "uuid_string",
status: "online" | "away" | "offline",
last_seen: "2026-01-15T10:30:00Z",
updated_at: "time::now()"
}
rooms (created via surreal_manager.py:155-182)
{
id: "rooms:ulid_here",
name: "Room Name",
type: "direct" | "group",
members: ["uuid1", "uuid2"],
created_at: "2026-01-15T10:30:00Z",
updated_at: "2026-01-15T10:30:00Z"
}
room_participants (created via surreal_manager.py:184-221)
{
id: "room_participants:ulid_here",
room_id: "rooms:ulid_here",
user_id: "uuid_string",
role: "member" | "admin",
joined_at: "2026-01-15T10:30:00Z"
}
webauthn:reg_challenge:{username} → 32 bytes (hex-encoded), TTL 600s
webauthn:auth_challenge:{username} → 32 bytes (hex-encoded), TTL 600s
These are the only two key patterns currently in use. Both use the Redis pipeline GET+DELETE pattern (redis_manager.py:86-95) for atomic one-time consumption. The TTL ensures stale challenges are automatically cleaned up even if the client never completes the flow.
What the system protects against:
| Threat | Protection | How |
|---|---|---|
| Compromised server | E2E encryption | Server stores ciphertext, never has keys |
| Network eavesdropper | TLS + E2E | Even without TLS, messages are AES-256-GCM encrypted |
| Stolen database dump | Key separation | PostgreSQL has public keys only (client-side path). Private keys live in browser IndexedDB. OPKs are single-use. |
| Phishing / credential theft | WebAuthn | Credentials are origin-bound. Cannot be replayed on a different domain. |
| Replay attacks | Nonces + counters | AES-GCM nonces are random. Message numbers are sequential. WebAuthn challenges have TTL. |
| Authenticator cloning | Signature counter | passkey_manager.py:184-193 checks that the counter strictly increases. If it does not, authentication fails with a clone detection error. |
| Message tampering | AEAD | AES-256-GCM provides authenticated encryption. Tampered ciphertext fails the GCM tag check (double_ratchet.py:143-153). |
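| Key compromise (single key) | Forward secrecy | The Double Ratchet derives a fresh message key for every message and performs a DH ratchet step with new DH keys whenever the conversation changes direction. A compromised chain key exposes only later messages in that chain, up to the next DH ratchet step; earlier messages stay protected. |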
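The signature counter check in the table above can be sketched in a few lines. This is not the code at passkey_manager.py:184-193; the names are hypothetical. It also assumes the exemption from the WebAuthn specification for authenticators that never increment their counter (both values zero), which the real implementation may or may not apply.

```python
class CloneDetectedError(Exception):
    """Raised when the signature counter fails to advance (hypothetical name)."""


def check_sign_count(stored_count: int, received_count: int) -> int:
    """Return the counter value to persist, or raise on a possible clone.

    Assumption: a stored and received count of 0 means the authenticator
    does not implement a counter, so the check is skipped.
    """
    if stored_count == 0 and received_count == 0:
        return 0
    if received_count <= stored_count:
        raise CloneDetectedError(
            f"counter did not increase: stored={stored_count}, "
            f"received={received_count}"
        )
    return received_count
```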
What is out of scope:
| Threat | Why |
|---|---|
| Compromised client device | If the attacker has access to the browser, they have access to IndexedDB (private keys) and can read plaintext. There is no defense against a fully compromised endpoint. |
| Side-channel attacks on crypto | The constantTimeEqual() function in primitives.ts:388-397 is the extent of timing attack mitigation. Comprehensive side-channel resistance would require constant-time implementations of all crypto primitives, which the WebCrypto API generally provides but does not guarantee. |
| Metadata analysis | The server knows who messages whom, when, how often, and message sizes. Only content is protected, not metadata. Protecting metadata would require something like mixnets or onion routing, which is not implemented. |
| Quantum computing | X25519 and Ed25519 are not post-quantum. A sufficiently powerful quantum computer could break them. Post-quantum key exchange (e.g., ML-KEM/Kyber) is not implemented. |
| Compromised authenticator supply chain | If the authenticator hardware itself is backdoored, WebAuthn cannot detect this. The AAGUID field can identify the authenticator model but not verify its integrity. |
Layer 1: Transport Security (TLS/HTTPS via Nginx)
│
│ Protects: Data in transit between client and server
│ Mechanism: TLS certificate, HTTPS enforcement
│ Configuration: Nginx reverse proxy with SSL termination
│
▼
Layer 2: Authentication (WebAuthn/Passkeys)
│
│ Protects: Identity verification, prevents impersonation
│ Mechanism: Public-key cryptography, hardware-bound keys
│ Key files: passkey_manager.py:43-210, auth_service.py:331-598
│ Redis: Challenge storage with 600s TTL, one-time consumption
│
▼
Layer 3: Key Exchange (X3DH Protocol)
│
│ Protects: Initial shared secret establishment
│ Mechanism: 3-4 Diffie-Hellman operations + HKDF
│ Key files: x3dh_manager.py:208-350, prekey_service.py:293-361
│ Properties: Asynchronous (works even if recipient offline)
│ Deniable (either party could have forged the exchange)
│
▼
Layer 4: Message Encryption (Double Ratchet + AES-256-GCM)
│
│ Protects: Message confidentiality and integrity
│ Mechanism: Symmetric ratchet (HMAC chains) + DH ratchet
│ Key files: double_ratchet.py:64-419, primitives.ts:1-397
│ Properties: Forward secrecy per-message
│ Future secrecy (self-healing after compromise)
│
▼
Layer 5: Key Lifecycle Management
│
│ Protects: Limits blast radius of any single key compromise
│ Mechanism: SPK rotation every 48h, OPK single-use, old SPK cleanup
│ Key files: prekey_service.py:221-291 (rotation),
│ prekey_service.py:428-465 (cleanup),
│ prekey_service.py:363-407 (replenishment)
│ Constants: SIGNED_PREKEY_ROTATION_HOURS=48 (config.py:76)
│ SIGNED_PREKEY_RETENTION_DAYS=7 (config.py:77)
│ DEFAULT_ONE_TIME_PREKEY_COUNT=100 (config.py:75)
│
▼
Layer 6: Rate Limiting and Abuse Prevention
│
Protects: Against brute force and DoS
Mechanism: Per-user message rate limits, auth attempt limits
Constants: RATE_LIMIT_MESSAGES_PER_MINUTE=60 (config.py:146)
RATE_LIMIT_AUTH_ATTEMPTS=5 (config.py:147)
WS_MAX_CONNECTIONS_PER_USER=5 (config.py:141)
All configuration lives in config.py. The Settings class (lines 96-218) inherits from pydantic_settings.BaseSettings and loads values from environment variables and the .env file.
| Variable | Default | Description |
|---|---|---|
| ENV | "development" | development, production, or testing |
| DEBUG | True | Enables SQL echo logging, docs endpoints |
| APP_NAME | "encrypted-p2p-chat" | Application name in metadata |
| SECRET_KEY | (required) | Application secret, no default |
| Variable | Default | Description |
|---|---|---|
| POSTGRES_HOST | "localhost" | Database host |
| POSTGRES_PORT | 5432 | Database port |
| POSTGRES_DB | "chat_auth" | Database name |
| POSTGRES_USER | "chat_user" | Database user |
| POSTGRES_PASSWORD | "" | Database password |
| DATABASE_URL | (auto-built) | Full connection string, assembled by field_validator at line 149 |
| DB_POOL_SIZE | 20 | SQLAlchemy connection pool size |
| DB_MAX_OVERFLOW | 40 | Pool overflow connections |
The assemble_db_connection validator (config.py:149-161) builds the URL from components if DATABASE_URL is not explicitly set. The URL uses the postgresql+asyncpg:// scheme for async connections.
| Variable | Default | Description |
|---|---|---|
| SURREAL_HOST | "localhost" | SurrealDB host |
| SURREAL_PORT | 8000 | SurrealDB port |
| SURREAL_USER | "root" | SurrealDB user |
| SURREAL_PASSWORD | (required) | SurrealDB password |
| SURREAL_NAMESPACE | "chat" | SurrealDB namespace |
| SURREAL_DATABASE | "production" | SurrealDB database |
| SURREAL_URL | (auto-built) | WebSocket URL, assembled at line 163 |
| Variable | Default | Description |
|---|---|---|
| REDIS_HOST | "localhost" | Redis host |
| REDIS_PORT | 6379 | Redis port |
| REDIS_PASSWORD | "" | Redis password (optional) |
| REDIS_URL | (auto-built) | Connection URL, assembled at line 178 |
| Variable | Default | Description |
|---|---|---|
| RP_ID | "localhost" | Relying Party ID (domain) |
| RP_NAME | "Encrypted P2P Chat" | Relying Party display name |
| RP_ORIGIN | "http://localhost:3000" | Expected origin for credential verification |
The RP_ID must match the domain the browser sees. In production, this would be "chat.example.com". If it does not match, WebAuthn verification fails.
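A deployment script could sanity-check the RP_ID / RP_ORIGIN pair before startup. The helper below is a hypothetical sketch and not part of the codebase; the authoritative check is performed by the browser and the WebAuthn library. It accepts an RP ID equal to the origin's host or a parent domain of it, and does not consult the public suffix list that browsers also enforce.

```python
from urllib.parse import urlsplit


def rp_id_matches_origin(rp_id: str, origin: str) -> bool:
    """True if rp_id equals the origin's host or is a parent domain of it."""
    host = urlsplit(origin).hostname  # already lower-cased by urlsplit
    if host is None:
        return False
    rp_id = rp_id.lower()
    # The leading dot prevents "example.com" from matching "notexample.com".
    return host == rp_id or host.endswith("." + rp_id)
```

For the defaults above, `rp_id_matches_origin("localhost", "http://localhost:3000")` is `True`.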
| Variable | Default | Description |
|---|---|---|
| WS_HEARTBEAT_INTERVAL | 30 | Seconds between heartbeat pings |
| WS_MAX_CONNECTIONS_PER_USER | 5 | Max simultaneous WebSocket connections |
| KEY_ROTATION_DAYS | 90 | General key rotation period |
| MAX_SKIPPED_MESSAGE_KEYS | 1000 | Max out-of-order messages per ratchet |
| RATE_LIMIT_MESSAGES_PER_MINUTE | 60 | Per-user message rate limit |
| RATE_LIMIT_AUTH_ATTEMPTS | 5 | Max auth attempts before lockout |
| CORS_ORIGINS | ["http://localhost:3000", "http://localhost:5173"] | Allowed CORS origins |
PostgreSQL pool: 20 base connections + 40 overflow = 60 max simultaneous connections (config.py:118-119). The engine uses pool_pre_ping=True (models/Base.py:43) to detect and replace stale connections before queries fail.
SQLAlchemy async sessions are created per-request via the get_session() dependency (models/Base.py:54-59). Sessions are recycled automatically after each request.
Redis pool: 50 connections (redis_manager.py:39). Connection pooling is handled by redis.asyncio.ConnectionPool.
SurrealDB: Single persistent WebSocket connection per application instance (surreal_manager.py:47-48). The ensure_connected() method (surreal_manager.py:73-78) lazily reconnects if the connection drops.
The per-user rate limit is 60 messages per minute (config.py:146). This works out to one message per second sustained, which is reasonable for a chat application.
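One way to enforce such a limit is a sliding window over recent send times. The class below is an in-memory sketch with hypothetical names; it is not the application's limiter, which may use a different algorithm or a shared store so that limits hold across processes.

```python
from collections import defaultdict, deque

RATE_LIMIT_MESSAGES_PER_MINUTE = 60  # config.py:146
WINDOW_SECONDS = 60.0


class SlidingWindowRateLimiter:
    """Allow at most `limit` events per user within any `window` seconds."""

    def __init__(self, limit: int = RATE_LIMIT_MESSAGES_PER_MINUTE,
                 window: float = WINDOW_SECONDS) -> None:
        self._limit = limit
        self._window = window
        self._events: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        """Record the event and return True if the user is under the limit."""
        events = self._events[user_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._limit:
            return False
        events.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic in tests; a caller would typically supply `time.monotonic()`.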
The WebSocket heartbeat fires every 30 seconds (config.py:140). This is frequent enough to detect dead connections quickly without wasting bandwidth on idle connections.
SurrealDB live queries are push-based (surreal_manager.py:341-359). When a new message is created, SurrealDB pushes it to the subscribed callback immediately. There is no polling interval, so latency is limited to network round-trip plus SurrealDB processing time.
Approximate per-message encryption cost (based on typical X25519/AES-256-GCM performance):
X25519 DH exchange: ~0.1ms
HMAC-SHA256 chain step: ~0.01ms
AES-256-GCM encrypt: ~0.001ms per KB (for typical chat messages)
HKDF derivation: ~0.01ms
Total per-message: < 1ms
This is negligible compared to network latency (typically 10-100ms). Encryption overhead is not a bottleneck for this application.
Each user is limited to 5 WebSocket connections (config.py:141). This supports a reasonable number of devices (phone, laptop, tablet, desktop, secondary browser) without allowing a single user to exhaust server WebSocket capacity.
The ConnectionManager delivers each message to all of a user's connections (websocket_manager.py:135-153), so the cost of multi-device scales linearly with the number of connections per user.
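The fan-out and the per-user cap can be sketched with plain async callables standing in for WebSocket connections. `FanOutManager` and its methods are hypothetical names for illustration; the real ConnectionManager in websocket_manager.py holds actual WebSocket objects and does more bookkeeping.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

WS_MAX_CONNECTIONS_PER_USER = 5  # config.py:141

Send = Callable[[str], Awaitable[None]]


class FanOutManager:
    """Track per-user connections and deliver each payload to all of them."""

    def __init__(self) -> None:
        self._connections: dict[str, list[Send]] = defaultdict(list)

    def connect(self, user_id: str, send: Send) -> bool:
        """Register a connection; refuse once the user is at the cap."""
        connections = self._connections[user_id]
        if len(connections) >= WS_MAX_CONNECTIONS_PER_USER:
            return False
        connections.append(send)
        return True

    async def send_to_user(self, user_id: str, payload: str) -> int:
        """Send payload to every connection of the user; return the count."""
        sends = list(self._connections.get(user_id, []))
        for send in sends:  # one send per connection, hence linear cost
            await send(payload)
        return len(sends)
```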
This is the most obvious question about the architecture, so it deserves a direct answer.
PostgreSQL stores relational data that requires ACID transactions: users with unique usernames, credentials with foreign keys to users, identity keys with unique constraints per user, ratchet states that must be updated atomically. This data has complex relationships (user has many credentials, ratchet state references two users, etc.) and benefits from SQL's referential integrity.
SurrealDB stores messages and presence data. The key feature is live queries: when a message is created, SurrealDB pushes it to subscribers in real-time without polling. This is the core of the chat experience. You could do this with PostgreSQL LISTEN/NOTIFY, but SurrealDB's live queries are more natural for this pattern and the schema is document-oriented, which fits the variable structure of encrypted messages better.
Redis stores ephemeral data that should auto-expire: WebAuthn challenges (600s TTL), rate limit counters. You would not want to poll PostgreSQL to clean up expired challenges. Redis handles this natively with key TTL.
The tradeoff is operational complexity. Running three databases means three things that can fail, three things to back up, three things to monitor. The justification is that each database is doing what it does best, and trying to make one database do all three jobs would create worse tradeoffs (e.g., polling for real-time updates, manual expiry jobs for challenges).
WebAuthn passkeys are phishing-resistant by design. The credential is bound to the RP origin, so it cannot be used on a fake domain. There are no passwords to steal from a database breach. The private key never leaves the authenticator hardware.
The tradeoff is browser and device support. WebAuthn requires a modern browser and an authenticator (Touch ID, Windows Hello, YubiKey, etc.). Account recovery is harder: if you lose your only authenticator, you lose access. This can be mitigated by registering multiple authenticators.
SolidJS uses fine-grained reactivity. When a message arrives, only the specific DOM elements that depend on that message are updated. React would instead re-render the message list component and diff its virtual DOM subtree. For a real-time chat application where messages arrive frequently, this difference matters.
SolidJS also has a smaller bundle size than React, which helps with initial load time. The nanostores library provides a framework-agnostic reactive store that integrates naturally with SolidJS's reactivity model.
The tradeoff is ecosystem size. React has far more libraries, tutorials, and Stack Overflow answers. SolidJS is smaller and you occasionally need to build something that React would have a library for.
message_service.py has both send_encrypted_message (server-side encryption, line 316, marked [DEPRECATED]) and store_encrypted_message (client-side passthrough, line 269). The server-side path exists because the system was originally built with server-side encryption and is being migrated to full client-side encryption.
In the server-side path, the server loads the ratchet state, encrypts the plaintext, and stores the ciphertext. This means the server momentarily has access to the plaintext. In the client-side path, the server never sees the plaintext; it receives and stores ciphertext as-is.
The client-side path is the correct production path. The server-side path remains for backwards compatibility and as a fallback during the migration.
docker compose -f dev.compose.yml up
┌───────────────────────────────────────────────────────────────┐
│ Docker Network (chat_network_dev) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PostgreSQL │ │ SurrealDB │ │ Redis │ │
│ │ 16-alpine │ │ latest │ │ 8-alpine │ │
│ │ :5432→5432 │ │ :8000→8001 │ │ :6379→6379 │ │
│ │ │ │ file://data│ │ appendonly │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ FastAPI │ │ Vite Dev │ │ Nginx │ │
│ │ :8000→8000 │ │ :5173→5173 │ │ :80→80 │ │
│ │ uvicorn │ │ HMR enabled│ │ reverse │ │
│ │ --reload │ │ hot module │ │ proxy │ │
│ │ vol: ./back │ │ vol: ./fro │ │ dev.nginx │ │
│ │ end │ │ ntend │ │ config │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Source: dev.compose.yml │
└───────────────────────────────────────────────────────────────┘
In development, all services expose ports to the host for direct access and debugging. The backend volume-mounts the ./backend directory and runs with uvicorn --reload for live reloading. The frontend volume-mounts ./frontend and runs Vite's dev server with HMR (Hot Module Replacement). Nginx sits in front as a reverse proxy matching the production topology.
docker compose up
┌───────────────────────────────────────────────────────────────┐
│ Docker Network (chat_network) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Nginx (:80, :443) │ │
│ │ │ │
│ │ /api/* ────────► upstream: backend:8000 (HTTP) │ │
│ │ /ws ────────► upstream: backend:8000 (WebSocket) │ │
│ │ /* ────────► Static SolidJS build (from image) │ │
│ │ │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────┼──────────────────────────────┐ │
│ │ │ │ │
│ │ ┌─────────────┐ ┌─────┴─────┐ ┌─────────────┐ │ │
│ │ │ PostgreSQL │ │ FastAPI │ │ SurrealDB │ │ │
│ │ │ 16-alpine │ │ gunicorn │ │ latest │ │ │
│ │ │ vol: data │ │ workers │ │ vol: data │ │ │
│ │ │ healthcheck │ │ no ports │ │ healthcheck│ │ │
│ │ │ restart: │ │ exposed │ │ restart: │ │ │
│ │ │ always │ │ to host │ │ always │ │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ Redis │ │ │
│ │ │ 8-alpine │ │ │
│ │ │ vol: data │ │ │
│ │ │ maxmem 2gb │ │ │
│ │ │ LRU evict │ │ │
│ │ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Source: compose.yml │
└───────────────────────────────────────────────────────────────┘
In production, the FastAPI backend does not expose any ports to the host. It is only accessible through the Nginx container on the internal Docker network. Databases use named volumes for persistence and restart: always for automatic recovery. The frontend container serves the pre-built SolidJS static files through Nginx.
The compose file (compose.yml) uses depends_on with condition: service_healthy to ensure databases are ready before the backend starts. Each database has a healthcheck command (PostgreSQL: pg_isready, SurrealDB: /health, Redis: redis-cli ping).
Defined in core/exceptions.py (lines 1-95):
AppException (base)
├── UserExistsError → 409 Conflict
├── UserNotFoundError → 404 Not Found
├── UserInactiveError → 403 Forbidden
├── CredentialNotFoundError → 404 Not Found
├── CredentialVerificationError → 401 Unauthorized
├── ChallengeExpiredError → 400 Bad Request
├── DatabaseError → 500 Internal Server Error
├── AuthenticationError → 401 Unauthorized
├── InvalidDataError → 400 Bad Request
├── EncryptionError → 500 Internal Server Error
├── DecryptionError → 500 Internal Server Error
├── RatchetStateNotFoundError → 404 Not Found
└── KeyExchangeError → 500 Internal Server Error
Registered in core/exception_handlers.py (lines 221-246) via register_exception_handlers(app), which is called from factory.py:89. Each exception type maps to a handler function that returns a JSONResponse with the appropriate HTTP status code.
For security-sensitive errors (DatabaseError, EncryptionError, DecryptionError, KeyExchangeError), the response body contains a generic message ("Internal server error", "Encryption failed", etc.) rather than the actual error detail. The detail is logged server-side but not exposed to the client.
WebSocket errors are sent as JSON to the client:
{
"type": "error",
"error_code": "invalid_json",
"error_message": "Invalid JSON format"
}
Error codes include: max_connections, database_error, invalid_json, missing_type, unknown_type, processing_error. These are defined inline in websocket.py (lines 62-76) and websocket_manager.py (lines 58-62).
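A small helper makes the error envelope hard to get wrong. `build_ws_error` and `WS_ERROR_CODES` are hypothetical names; the codebase defines the codes inline rather than in a shared set, so treat this as a sketch of the message shape only.

```python
import json

# Error codes listed in the text; collecting them in a set is an assumption.
WS_ERROR_CODES = frozenset({
    "max_connections",
    "database_error",
    "invalid_json",
    "missing_type",
    "unknown_type",
    "processing_error",
})


def build_ws_error(error_code: str, error_message: str) -> str:
    """Serialize a WebSocket error frame in the shape shown above."""
    if error_code not in WS_ERROR_CODES:
        raise ValueError(f"unknown WebSocket error code: {error_code}")
    return json.dumps({
        "type": "error",
        "error_code": error_code,
        "error_message": error_message,
    })
```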
Where to add new features:
| What | Where | Steps |
|---|---|---|
| New API endpoint | api/ directory | 1. Create router in api/new_feature.py 2. Create service in services/new_feature_service.py 3. Register router in factory.py (after line 113) |
| New database model | models/ directory | 1. Create model class inheriting BaseDBModel 2. Import it somewhere that loads at startup (e.g., models/__init__.py) 3. init_db() auto-creates the table via SQLModel.metadata.create_all |
| New encryption algorithm | core/encryption/ | 1. Add implementation in core/encryption/ 2. Add config constants in config.py 3. Wire into message_service.py or crypto-service.ts |
| New WebSocket message type | websocket_service.py | 1. Add constant in config.py (WS_MESSAGE_TYPE_*) 2. Add handler method in WebSocketService 3. Add routing case in route_message() |
| New SurrealDB collection | surreal_manager.py | 1. Add CRUD methods in SurrealDBManager 2. Add response schema in schemas/surreal.py |
These are known architectural limitations, not bugs:
- No group chat encryption. The Double Ratchet is a two-party protocol. Group chat encryption would require either Sender Keys (the approach Signal uses for groups, where each member distributes their own sender chain key to the other members over the existing pairwise sessions) or the MLS (Messaging Layer Security) protocol. Neither is implemented.
- No post-quantum key exchange. X25519 is vulnerable to Shor's algorithm on a sufficiently powerful quantum computer. Migrating to a hybrid scheme (X25519 + ML-KEM) would future-proof the key exchange, but this adds complexity and the quantum threat timeline is debated.
- Metadata not protected. The server knows who sends messages to whom, when, how frequently, and the approximate size of each message. Only the content is encrypted. Metadata protection would require techniques like onion routing, padding, or dummy traffic, all of which add significant complexity and performance cost.
- No message deletion or expiry. Once a message is stored in SurrealDB, it stays there indefinitely. There is no TTL on messages and no "delete for everyone" feature. The SurrealDB manager does provide a _schedule_room_deletion method for ephemeral rooms (surreal_manager.py:393-425), but this is room-level, not message-level.
- Single-region deployment. The Docker Compose setup assumes all services run on one machine or one cluster. There is no geo-distribution, no CDN for the frontend, and no database replication. For a production deployment serving users across regions, you would need to add these.
- No offline message queue. If a recipient is offline when a message is sent, the message is stored in SurrealDB but only delivered when the recipient connects and the live query fires. There is no explicit mechanism for fetching missed messages on reconnect beyond the live query catching up. The get_room_messages method (surreal_manager.py:124-153) exists for fetching message history, but the client must explicitly call it.
| What | Where | Lines |
|---|---|---|
| App factory | factory.py | 63-115 |
| Lifespan (DB init) | factory.py | 39-61 |
| All settings | config.py | 96-218 |
| Crypto constants | config.py | 64-77 |
| X3DH protocol | core/encryption/x3dh_manager.py | 56-353 |
| X3DH sender | core/encryption/x3dh_manager.py | 208-281 |
| X3DH receiver | core/encryption/x3dh_manager.py | 283-350 |
| Double Ratchet | core/encryption/double_ratchet.py | 64-419 |
| Encrypt message | core/encryption/double_ratchet.py | 323-362 |
| Decrypt message | core/encryption/double_ratchet.py | 364-416 |
| WebAuthn manager | core/passkey/passkey_manager.py | 43-210 |
| Clone detection | core/passkey/passkey_manager.py | 184-193 |
| WebSocket pool | core/websocket_manager.py | 31-296 |
| Heartbeat loop | core/websocket_manager.py | 177-201 |
| Live query sub | core/websocket_manager.py | 203-223 |
| Live msg handler | core/websocket_manager.py | 225-251 |
| SurrealDB client | core/surreal_manager.py | 28-428 |
| Redis client | core/redis_manager.py | 19-174 |
| Exception types | core/exceptions.py | 7-95 |
| Exception handlers | core/exception_handlers.py | 221-246 |
| Message storage | services/message_service.py | 269-314 |
| Conversation init | services/message_service.py | 48-166 |
| Key management | services/prekey_service.py | 41-468 |
| Client key upload | services/prekey_service.py | 45-150 |
| Prekey bundle | services/prekey_service.py | 293-361 |
| SPK rotation | services/prekey_service.py | 221-291 |
| OPK replenish | services/prekey_service.py | 363-407 |
| Auth service | services/auth_service.py | 47-601 |
| WS message router | services/websocket_service.py | 35-324 |
| Auth endpoints | api/auth.py | 31-103 |
| Encryption endpoints | api/encryption.py | 21-127 |
| WebSocket endpoint | api/websocket.py | 25-84 |
| User model | models/User.py | 24-68 |
| Credential model | models/Credential.py | 27-78 |
| Identity key model | models/IdentityKey.py | 18-48 |
| Signed prekey model | models/SignedPrekey.py | 20-50 |
| One-time prekey model | models/OneTimePrekey.py | 18-45 |
| Ratchet state model | models/RatchetState.py | 18-68 |
| Skipped keys model | models/SkippedMessageKey.py | 17-51 |
| DB engine + sessions | models/Base.py | 37-67 |
| Client crypto prims | frontend/src/crypto/primitives.ts | 1-397 |
| Client crypto svc | frontend/src/crypto/crypto-service.ts | - |
| Client double ratchet | frontend/src/crypto/double-ratchet.ts | - |
| Client X3DH | frontend/src/crypto/x3dh.ts | - |
| Client key store | frontend/src/crypto/key-store.ts | - |
| Production compose | compose.yml | 1-118 |
| Development compose | dev.compose.yml | 1-151 |