Comparison of plone-pgcatalog (PostgreSQL JSONB, single-table) against the standard Plone stack (RelStorage + ZCatalog) — the deployment used by most large Plone sites.
Both backends store data in PostgreSQL. The difference is how catalog queries work:
| PGJsonbStorage + PGCatalog | RelStorage + ZCatalog | |
|---|---|---|
| Object storage | JSONB in object_state |
pickle bytea in object_state |
| Catalog indexes | GIN/B-tree on JSONB columns | BTree persistent objects in ZODB |
| Query engine | PostgreSQL SQL planner | Python index intersection |
| ZODB cache usage | Content objects only | Content + all BTree/Bucket nodes |
| Component | Version |
|---|---|
| Python | 3.13.9 |
| PostgreSQL | 17.7 |
| ZODB | 6.0.1 |
| Products.ZCatalog | 7.1 |
| RelStorage | 4.1.1 |
| Platform | Linux 6.14 (x86_64) |
| PG connection | localhost:5433 (Docker) |
The benchmark creates a real Plone 6 site with real content through
invokeFactory, real workflow transitions, and real catalog queries through
portal_catalog.searchResults(). Each backend runs in its own subprocess
with a fresh database.
ZODB cache-size is set to 400 (vs the default 30,000) to simulate the cache pressure experienced on large production sites where BTree nodes compete for limited cache slots. PGCatalog is unaffected by this setting because it stores no BTree objects in ZODB.
- 500 documents created with realistic metadata (titles, descriptions, subjects, workflow states, dates)
- 50 query iterations after 5 warmup iterations
- Deterministic data (seeded RNG) for reproducibility
- Same queries passed to both backends'
searchResults()
| Field | Distribution |
|---|---|
| portal_type | Document (100%) |
| review_state | published (70%), private (20%), pending (10%) |
| Subject | 0-4 random tags from 15-tag pool |
| SearchableText | Titles + descriptions with random subject keywords |
| Operation | PGCat | RS+ZCat | Ratio |
|---|---|---|---|
| Site creation | 2,287 ms | 1,420 ms | 1.6x slower |
| Content creation | 65.4 ms/doc | 77.2 ms/doc | 1.18x faster |
| Content modification | 13.9 ms/doc | 20.7 ms/doc | 1.49x faster |
PGCatalog writes are now faster than RelStorage+ZCatalog at this scale. This was achieved through three rounds of optimization:
-
Write-path optimizations (batch conflict detection, prepared statements, GIL release during Rust decode) — improved from 251 ms/doc to 175 ms/doc (30% faster).
-
ZCatalog BTree elimination — removing
super().catalog_object()calls that were writing ~140 BTree/Bucket objects per document into ZODB, despite PGCatalog never querying them. This single change reduced content creation from 175 ms/doc to 68.5 ms/doc (2.5x faster) and shifted PGCatalog from 2.2x slower to 1.13x faster than RelStorage+ZCatalog. -
Clean break from ZCatalog — replacing the deep
CatalogTool → ZCatalog → ObjectManagerinheritance chain withUniqueObject + Folder. Eliminated overhead from attribute lookups, security checks, and Acquisition wrapping in the hot path. Content creation improved from 68.5 ms/doc to 65.4 ms/doc, modification from 14.0 ms/doc to 13.9 ms/doc, and queries improved ~2x across the board.
| Query Scenario | PGCat | RS+ZCat | Ratio |
|---|---|---|---|
Simple field match (portal_type=Document) |
3.2 ms | 0.25 ms | 12.8x |
| Complex multi-index (type+state+subject+sort+limit) | 2.2 ms | 0.61 ms | 3.7x |
Full-text search (SearchableText) |
5.5 ms | 0.61 ms | 8.9x |
| Navigation (path depth=1 + sort) | 3.3 ms | 0.74 ms | 4.5x |
| Security filtered (published + anonymous) | 3.5 ms | 0.24 ms | 14.8x |
| Date-sorted with pagination | 3.8 ms | 0.30 ms | 12.9x |
Profiling PGCatalog's 330 query executions (6 scenarios x 55 iterations):
| Component | Time | Share | What |
|---|---|---|---|
| PG network wait | 1.61s | 81% | SQL execution + round-trip to Docker PG |
| Row construction | 0.16s | 8% | psycopg dict_row for 106k rows |
| Other | 0.21s | 11% | Query building, brain construction, pool ops |
In contrast, ZCatalog runs entirely in-process with BTree nodes cached in ZODB's object cache. At 500 documents, the entire BTree index fits in cache-size 400, so every query is a pure in-memory set operation.
Replaced stdlib json.loads with orjson.loads via psycopg.types.json.set_json_loads().
| Component | Before | After orjson | Change |
|---|---|---|---|
| JSON parsing (106k calls) | 1.60s | 0.54s | -66% |
| Total query time (330 queries) | 6.21s | 5.00s | -20% |
| PG network wait | 3.15s | 3.30s | ~same |
| Query Scenario | Before | After orjson | Change |
|---|---|---|---|
| Simple field match | 25.6 ms | 19.1 ms | -25% |
| Complex multi-index | 4.9 ms | 4.9 ms | 0% |
| Full-text search | 23.1 ms | 17.3 ms | -25% |
| Navigation | 25.8 ms | 19.9 ms | -23% |
| Security filtered | 21.2 ms | 17.0 ms | -20% |
| Date-sorted | 8.4 ms | 7.4 ms | -12% |
Complex query (4.9 ms) already has a small result set (paginated), so JSON parsing overhead was negligible there. Queries returning many rows (Simple, Full-text, Navigation) saw the largest benefit.
Changed _run_search to SELECT only zoid, path (no idx JSONB).
When brain metadata is first accessed, CatalogSearchResults._load_idx_batch()
fetches all idx values in a single batch query. The benchmark never accesses
metadata, so idx is never loaded — exactly the len(results) / pagination
use case that dominates real Plone page loads.
| Component | Phase 1 | Phase 2 | Change |
|---|---|---|---|
| Total query time (330 queries) | 5.00s | 1.90s | -62% |
| PG network wait | 3.30s | 1.63s | -51% |
| JSON parsing (orjson) | 0.54s | 0s | -100% (deferred) |
| Row construction (dict_row) | 0.10s | 0.06s | -40% |
| Query Scenario | Phase 1 | Phase 2 | Change | vs RS+ZC |
|---|---|---|---|---|
| Simple field match | 19.1 ms | 6.0 ms | -69% | 14x |
| Complex multi-index | 4.9 ms | 4.6 ms | -6% | 6.6x |
| Full-text search | 17.3 ms | 3.9 ms | -77% | 4.2x |
| Navigation | 19.9 ms | 5.9 ms | -70% | 3.2x |
| Security filtered | 17.0 ms | 6.8 ms | -60% | 14x |
| Date-sorted | 7.4 ms | 6.0 ms | -19% | 9.8x |
After Phase 2, PG network wait is 86% of total time — the query is now I/O bound, not CPU bound. The remaining gap vs ZCatalog is structural: network round-trip to Docker PG vs in-process BTree cache.
Added prepare=True to cursor.execute() in _run_search and
_load_idx_batch. Forces prepared statements from the first execution.
Impact: negligible. psycopg3's auto-prepare threshold (5 executions)
already activates during warmup, so explicit prepare=True only helps
the first 5 queries per pattern. Total time: 1.92s vs 1.90s (within noise).
Thread-local connection reuse via get_request_connection() / IPubEnd
subscriber. Eliminates pool lock overhead for pages with multiple
catalog queries (typical Plone page does 3-5 catalog queries).
Impact on benchmark: negligible. The benchmark does one query per
"request", so pool overhead was already minimal (~0.008s total).
The real benefit is in production where a page load does 3-5 catalog
queries — reusing the connection saves the getconn/putconn lock
per extra query.
Removed four super() calls in PlonePGCatalogTool that delegated to
ZCatalog's write path — indexObject(), reindexObject(),
catalog_object(), and uncatalog_object(). These were writing ~140
BTree/Bucket objects per document into ZODB (FieldIndex entries,
KeywordIndex forward/reverse maps, ZCTextIndex postings, etc.) that
PGCatalog never queries.
| Metric | Before | After | Change |
|---|---|---|---|
| ZODB store() calls per doc | 193 | 40 | -79% |
| Content creation | 175 ms/doc | 68.5 ms/doc | -61% |
| Content modification | 50.1 ms/doc | 14.0 ms/doc | -72% |
| tpc_vote | 19.7 ms | 3.7 ms | -81% |
| Commit overhead | 125 ms | 2.7 ms | -98% |
This is the single largest optimization. By eliminating BTree writes, PGCatalog went from 2.2x slower to 1.13x faster than RelStorage+ZCatalog for content creation, and from 2.8x slower to 1.39x faster for content modification.
Replaced PlonePGCatalogTool's base class from
Products.CMFPlone.CatalogTool.CatalogTool (which inherits ZCatalog → ObjectManager → SimpleItem → ..., ~15 classes deep) with
UniqueObject + Folder. This eliminates unnecessary attribute lookup
chains, security machinery, and Acquisition wrapping that were still
firing in the query and write hot paths even after BTree writes were
removed.
Query impact (median, 50 iterations):
| Query Scenario | Phase 5 | Phase 6 | Change |
|---|---|---|---|
| Simple field match | 6.3 ms | 3.2 ms | -50% |
| Complex multi-index | 4.2 ms | 2.2 ms | -46% |
| Full-text search | 3.8 ms | 5.5 ms | +43% |
| Navigation | 6.1 ms | 3.3 ms | -46% |
| Security filtered | 7.3 ms | 3.5 ms | -52% |
| Date-sorted | 6.9 ms | 3.8 ms | -44% |
Five of six query scenarios are ~2x faster. Full-text search shows a minor regression likely due to PG query planner variance (the SQL path is identical).
Write impact:
| Metric | Phase 5 | Phase 6 | Change |
|---|---|---|---|
| Content creation | 68.5 ms/doc | 65.4 ms/doc | -5% |
| Content modification | 14.0 ms/doc | 13.9 ms/doc | -1% |
Write improvements are modest because the hot path (PG INSERT/UPDATE) was already dominated by network I/O. The main benefit is in the query path where Python-side overhead was more significant.
Query path:
| Metric | Baseline | Phase 1 (orjson) | Phase 2 (lazy idx) | Phase 3+4 | Phase 6 (clean break) |
|---|---|---|---|---|---|
| Total query time (330) | 6.21s | 5.00s (-20%) | 1.90s (-69%) | 1.98s | ~1.12s (-43%) |
| PG network wait | 3.15s | 3.30s | 1.63s | 1.61s | ~1.61s |
| JSON parsing | 1.60s | 0.54s (-66%) | 0s (deferred) | 0s | 0s |
The Phase 6 improvement comes from reduced Python-side overhead (lighter class hierarchy, fewer Acquisition wraps). PG network I/O remains the structural floor. This gap would narrow significantly with Unix domain sockets or a colocated PG instance.
Write path:
| Metric | Baseline (pre-OPT) | After OPT 1-4 | Phase 5 (BTree removal) | Phase 6 (clean break) |
|---|---|---|---|---|
| Content creation | 251 ms/doc | 175 ms/doc (-30%) | 68.5 ms/doc (-73%) | 65.4 ms/doc (-74%) |
| Content modification | — | 50.1 ms/doc | 14.0 ms/doc (-72%) | 13.9 ms/doc (-72%) |
| ZODB store() per doc | ~193 | ~193 | 40 (-79%) | 40 |
| vs RelStorage+ZCatalog | 3.4x slower | 2.2x slower | 1.13x faster | 1.18x faster |
The benchmark at 500 objects shows PGCatalog has significant per-query overhead from network I/O and JSON parsing. The architectural advantages emerge at larger scales:
ZCatalog with 15 indexes on 50k objects creates ~500k persistent BTree
and Bucket objects. With a typical cache-size of 5,000-30,000, most
BTree nodes are evicted and must be re-fetched from storage on every
query. Each fetch is a round-trip to PostgreSQL (via RelStorage) to
load and unpickle a single blob.
PGCatalog stores zero catalog objects in ZODB. The entire cache budget is available for content objects, improving traversal and rendering performance across the board.
ZCatalog intersects index result sets sequentially in Python, always in
registration order regardless of selectivity. It also computes the full
result set before applying b_size slicing — no early termination.
PGCatalog delegates to PostgreSQL's cost-based query planner, which
evaluates index selectivity, chooses optimal join strategies, and
can short-circuit via LIMIT pushdown.
Catalog metadata stored as JSONB is queryable from any PostgreSQL client — reporting tools, admin scripts, monitoring dashboards — without loading Zope or ZODB.
# Requires PostgreSQL on localhost:5433
docker run -d --name zodb-pgjsonb-dev \
-e POSTGRES_USER=zodb -e POSTGRES_PASSWORD=zodb \
-e POSTGRES_DB=zodb_test -p 5433:5432 postgres:17
# Run the benchmark (500 docs, ~4 minutes total)
.venv/bin/python sources/plone-pgcatalog/benchmarks/run_benchmarks.py \
level2 --docs 500 --iterations 50 --warmup 5 \
--cache-size 400 --output results.json
# Run with profiling (cProfile output to stderr)
.venv/bin/python sources/plone-pgcatalog/benchmarks/run_benchmarks.py \
level2 --docs 500 --iterations 50 --warmup 5 --profileusage: run_benchmarks.py level2 [-h] [--docs N] [--iterations N]
[--warmup N] [--rebuild N]
[--cache-size N] [--output FILE]
[--profile]
--docs N Number of documents to create (default 500)
--iterations N Query repetitions (default 50)
--warmup N Warmup iterations (default 5)
--cache-size N ZODB cache-size for ZCatalog backend (default 400)
--output FILE Export full results as JSON
--profile Run cProfile on query scenarios