A PostgreSQL backend for persistent that uses the binary wire protocol and libpq pipeline mode.
A mostly drop-in replacement for persistent-postgresql: all standard persistent operations work without code changes beyond updated imports and type signatures.
| Feature | persistent-postgresql | persistent-postgresql-ng |
|---|---|---|
| Wire protocol | Text (via postgresql-simple) | Binary (via postgresql-binary) |
| Automatic pipelining | No | Yes (Hedis-style lazy reply stream) |
| Bulk insert | `INSERT ... VALUES (?,?,...), (?,?,...), ...` | `INSERT ... SELECT * FROM UNNEST($1::type[], ...)` |
| IN clauses | `IN (?,?,?,...)` | `= ANY($1)` |
| Direct decode path | No | Yes (zero PersistValue allocation) |
| Result fetch modes | All-at-once only | All-at-once, single-row, chunked (PG17+) |
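As a sketch of the drop-in claim, the same persistent calls compile unchanged while the backend rewrites them as in the table above. The `User` entity and its fields here are hypothetical, not part of this package; any existing persistent schema works the same way.

```haskell
-- Sketch only: `User`, `UserId` come from a hypothetical persistent schema.
-- Nothing here mentions the backend; swapping the connection function is
-- enough to get the behaviour in the right-hand column of the table.
demo :: SqlPersistT IO ()
demo = do
  -- Sent as one INSERT ... SELECT * FROM UNNEST($1::text[], $2::int4[])
  -- round-trip, instead of a VALUES list re-planned per batch size
  userIds <- insertMany [User "alice" 30, User "bob" 41]

  -- Sent as WHERE id = ANY($1): one prepared statement for any list length
  users <- selectList [UserId <-. userIds] []

  liftIO (print (length users))
```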
Measured against persistent-postgresql on the same PostgreSQL 16 instance. Three network conditions: localhost (0ms), 1ms added latency per direction (2ms RTT), and 5ms per direction (10ms RTT).
Latency was introduced using a TCP delay proxy (bench/delay-proxy.py).
Localhost (0ms added latency):

| Benchmark | pipeline | simple | speedup |
|---|---|---|---|
| get ×100 (pipelined reads) | 1.7ms | 4.7ms | 2.8× |
| insert ×100 (pipelined RETURNING) | 10.8ms | 12.8ms | 1.2× |
| upsert ×100 (pipelined RETURNING) | 8.9ms | 12.7ms | 1.4× |
| insertMany ×1000 (UNNEST) | 5.3ms | 14.1ms | 2.7× |
| delete ×100 then select | 4.5ms | 7.5ms | 1.7× |
| mixed DML ×100 then select | 14.6ms | 29.9ms | 2.0× |
| selectList ×100 | 8.6ms | 11.2ms | 1.3× |
At zero latency, the advantage comes from the binary protocol and UNNEST-based bulk inserts. Individual get and insert are comparable because round-trip time is negligible.
1ms per direction (2ms RTT):

| Benchmark | pipeline | simple | speedup |
|---|---|---|---|
| get ×100 (pipelined reads) | 11ms | 310ms | 28× |
| insert ×100 (pipelined RETURNING) | 13ms | 314ms | 24× |
| upsert ×100 (pipelined RETURNING) | 13ms | 321ms | 25× |
| insertMany ×1000 (UNNEST) | 8.6ms | 31.0ms | 3.6× |
| selectList ×100 | 16.6ms | 25.8ms | 1.6× |
| select IN ×20 | 17.4ms | 24.8ms | 1.4× |
With even modest latency, the automatic pipelining dominates. `mapM get keys`, `mapM insert records`, and `forM_ records upsert` all send their queries before reading any results: one flush instead of 100 round-trips.
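In code, the patterns named above are ordinary persistent calls; the backend batches them without any annotation. The `User` entity here is hypothetical, for illustration only.

```haskell
-- Hypothetical `User` entity; nothing backend-specific appears here.
pipelinedReads :: [Key User] -> SqlPersistT IO [Maybe User]
pipelinedReads = mapM get
  -- 100 queries leave in one flush; each `Maybe User` is a lazy thunk
  -- that reads its reply from the response buffer when first inspected

pipelinedInserts :: [User] -> SqlPersistT IO [Key User]
pipelinedInserts = mapM insert
  -- the RETURNING ids are likewise read lazily, in pipeline order
```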
5ms per direction (10ms RTT):

| Benchmark | pipeline | simple | speedup |
|---|---|---|---|
| get ×100 (pipelined reads) | 50ms | 1.19s | 24× |
| insert ×100 (pipelined RETURNING) | 41ms | 1.20s | 29× |
| insertMany ×1000 (UNNEST) | 22.8ms | 72.6ms | 3.2× |
| selectList ×100 | 47.9ms | 74.0ms | 1.5× |
| select IN ×20 | 44.1ms | 70.3ms | 1.6× |
The speedup scales linearly with latency. At 10ms RTT, 100 sequential round-trips cost 1000ms minimum. The pipeline pays one RTT for the flush and reads all 100 results from the server's already-buffered responses.
The improvements come from three independent sources. The 0ms column isolates the binary protocol effect (pipelining has no benefit when round-trips are free). The 1ms column shows the combined effect, and the difference reveals the pipelining contribution.
| Benchmark | 0ms: pipeline / simple | 1ms: pipeline / simple | Source of speedup |
|---|---|---|---|
| get ×100 | 1.7ms / 4.7ms (2.8×) | 11ms / 310ms (28×) | 0ms: binary decode. 1ms: Hedis-style lazy pipelining (100 queries in 1 flush) |
| insert ×100 | 10.8ms / 12.8ms (1.2×) | 13ms / 314ms (24×) | 0ms: binary encode. 1ms: lazy RETURNING pipelining |
| delete ×100 | 8.4ms / 12.9ms (1.5×) | 25ms / 592ms (24×) | 0ms: binary protocol. 1ms: fire-and-forget pipelining |
| update ×100 | 8.3ms / 12.5ms (1.5×) | 25ms / 555ms (22×) | 0ms: binary protocol. 1ms: fire-and-forget pipelining |
| replace ×100 | 11.1ms / 11.5ms (1.0×) | 27ms / 602ms (22×) | 0ms: ~neutral. 1ms: fire-and-forget pipelining |
| insertMany ×1000 | 7.2ms / 16.7ms (2.3×) | 8.6ms / 31.0ms (3.6×) | 0ms: UNNEST (1 query vs N). 1ms: UNNEST + fewer round-trips |
| selectList ×100 | 13.5ms / 15.6ms (1.2×) | 16.6ms / 25.8ms (1.6×) | 0ms: binary decode. 1ms: binary + pipelined setup |
| upsert ×100 | 8.9ms / 12.7ms (1.4×) | 13ms / 321ms (25×) | 0ms: binary protocol. 1ms: lazy RETURNING pipelining |
| deleteWhere ×100 | 90ms / 99ms (1.1×) | 119ms / 750ms (6.3×) | 0ms: ~neutral. 1ms: fire-and-forget pipelining |
Summary of sources:
| Source | Typical gain at 0ms | Typical gain at 1ms/dir |
|---|---|---|
| Binary protocol (encode/decode) | 1.2-2.8× | 1.2-2.8× |
| UNNEST bulk insert | 2.3× | 3.6× |
| Fire-and-forget DML pipelining | 1.0× | 20-24× |
| Hedis-style lazy pipelining (get, insert, upsert) | 1.0× | 24-28× |
| Combined (best case) | 2.8× | 28× |
The binary protocol provides a constant-factor improvement regardless of latency. Pipelining provides a latency-proportional improvement that dominates at any non-zero network distance.
```sh
# Baseline (direct connection)
stack bench persistent-postgresql-ng

# With artificial latency via TCP proxy
python3 bench/delay-proxy.py 15432 localhost 5432 1 &  # 1ms per direction
PGPORT=15432 PGHOST=127.0.0.1 stack bench persistent-postgresql-ng
kill %1

# With system-level latency (macOS, requires root)
sudo bench/run-with-latency.sh 1  # 1ms via dummynet
```

All read operations (`get`, `getBy`, `insert` with RETURNING, `count`, `exists`) use a Hedis-style lazy reply stream for automatic, optimal pipelining. No API changes are required: standard persistent code like `mapM get keys` is pipelined automatically.
The technique:

- At connection time, an infinite lazy list of server replies is created using `unsafeInterleaveIO`. Each element, when forced, flushes the send buffer and reads one result.
- Each command sends eagerly (writes to the output buffer) and receives lazily (pops an unevaluated thunk from the reply list via `atomicModifyIORef`).
- The actual network read happens when the caller inspects the result value. If 100 `get` calls are sequenced before any result is inspected, all 100 queries are sent in one flush and results are read sequentially from the server's response buffer.

The ordering guarantee comes from the lazy list structure: each thunk N is created inside thunk N-1's `unsafeInterleaveIO` body, so replies are always read in pipeline order regardless of evaluation order.
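A minimal, self-contained model of the trick, not the backend's actual code: an `IORef` counter stands in for the socket read, and ordering is enforced by having each thunk force its predecessor (a simplification of the nested-thunk structure described above, with the same observable behaviour).

```haskell
-- Toy model of the lazy reply stream. "Reading a reply" just returns the
-- next integer from a counter, so we can observe when reads happen.
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

replyStream :: IO a -> IO [a]
replyStream readOne = go (pure ())
  where
    -- Each element is a thunk that first forces its predecessor, so
    -- replies are consumed in pipeline order no matter which element
    -- the caller inspects first.
    go waitPrev = do
      r  <- unsafeInterleaveIO (waitPrev >> readOne)
      rs <- unsafeInterleaveIO (go (r `seq` pure ()))
      pure (r : rs)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let readOne = atomicModifyIORef' counter (\n -> (n + 1, n))
  (r1 : r2 : r3 : _) <- replyStream readOne
  -- Walking the spine above created three thunks but performed no reads.
  print r3        -- forcing r3 forces r2, which forces r1: prints 2
  print (r1, r2)  -- already read, in stream order: prints (0,1)
```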
Write operations (`delete`, `update`, `replace`, `deleteWhere`, `updateWhere`) remain fire-and-forget: they send the query and don't read the result until a subsequent read operation (or transaction commit) drains them.
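A sketch of how that plays out (with a hypothetical `User` entity): the loop queues all its updates without waiting, and the first read drains everything.

```haskell
import Control.Monad (forM_)

-- Hypothetical schema; the shape is stock persistent code.
bumpAges :: [Key User] -> SqlPersistT IO [Entity User]
bumpAges keys = do
  -- Each update is written to the send buffer; no reply is awaited yet.
  forM_ keys $ \k -> update k [UserAge +=. 1]
  -- First read operation: flushes the buffer, drains the pending update
  -- replies in order, then reads its own result.
  selectList [] []
```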
In addition to the standard PersistValue-based path, the backend supports a direct codec path that bypasses PersistValue entirely. See the RFC for full design details.
```haskell
-- Switch one import to opt in:
import Database.Persist.Sql.Experimental -- instead of Database.Persist.Sql
```

For code with the concrete backend type (zero overhead, full specialization):

```haskell
rawSqlDirect
  "SELECT name, age FROM users WHERE age > $1"
  (writeParam (18 :: Int))
  :: ReaderT (WriteBackend PostgreSQLBackend) m [(Text, Int64)]
```

For code through SqlBackend (uses the DirectEntity + Typeable bridge):

```haskell
rawSqlDirectCompat
  "SELECT name, age FROM users WHERE age > $1"
  [toPersistValue (18 :: Int)]
  :: ReaderT SqlBackend m (Maybe [(Text, Int64)])
```

See ARCHITECTURE.md for detailed internals: pipeline mode, binary protocol, connection lifecycle, error handling, and the direct decode/encode layer.