Testing Guide

Comprehensive guide for testing PostgreSQL extensions in aza-pg, covering critical patterns, common pitfalls, functional testing strategies, and coverage metrics.

Smoke Test Pattern (Critical - Read This First!)
Regression Testing
Regression Test Suites
Session Isolation Pattern
Testing Extension Functionality
Common Pitfalls
Test Categories
Testing Strategy & Coverage
Running Tests

Smoke Test Pattern (Critical - Read This First!)

Why this matters: Tests that skip environment validation waste hours debugging infrastructure issues instead of finding real bugs.

The Problem

When writing tests for extensions or new features, it's tempting to immediately write comprehensive test cases. However, tests can fail for two completely different reasons:

Infrastructure failure: Container won't start, extension not preloaded, missing dependencies
Functional failure: The feature you're testing has a bug

Without smoke tests, you can't tell which is which.

Real-World Example

From recent work on extension update tests:

Original approach (failed):

Wrote 56 tests across 4 files in parallel
28/56 tests (50%) failed due to infrastructure issues:
- Missing POSTGRES_PASSWORD → container exit
- Missing waitForReady() → "connection refused" errors
- pg_cron not preloaded → container crash during init

Time wasted: 2+ hours debugging infrastructure, not testing extensions.

Correct approach (would have succeeded):

Write ONE smoke test first
Run it → discover POSTGRES_PASSWORD missing
Fix → discover waitForReady() missing
Fix → discover pg_cron issue
Total time: 15 minutes, then write comprehensive tests with confidence

The Smoke Test Pattern

Every test file MUST start with this structure:

describe("Feature X Tests", () => {
  let container: string;

  // PHASE 0: Environment Validation (MANDATORY)
  beforeAll(async () => {
    // Validate prerequisites BEFORE starting container
    // Example: Check image exists, required files present, etc.
  });

  // PHASE 1: Smoke Test (MANDATORY - RUN THIS FIRST)
  test("SMOKE: Container starts and PostgreSQL is ready", async () => {
    const start = Date.now();

    container = await harness.startContainer("feature-x-test", {
      POSTGRES_PASSWORD: "test", // ALWAYS required
      // Add other required env vars
    });

    await harness.waitForReady(container); // ALWAYS required

    const elapsed = Date.now() - start;
    expect(elapsed).toBeLessThan(120000); // 120s reasonable limit

    console.log(`✓ Smoke test passed (${elapsed}ms)`);
  });

  // PHASE 2: Extension/Feature Loading (if applicable)
  test("Extension loads successfully", async () => {
    await harness.runSQL(container, "CREATE EXTENSION IF NOT EXISTS my_ext;");

    const version = await harness.runSQL(
      container,
      "SELECT extversion FROM pg_extension WHERE extname = 'my_ext';"
    );

    expect(version).toBeTruthy();
  });

  // PHASE 3: Basic Functionality
  test("Basic feature works", async () => {
    // Test simplest use case
  });

  // PHASE 4: Edge Cases (only after Phases 1-3 pass)
  test("Edge case 1", async () => {
    /* ... */
  });
  test("Edge case 2", async () => {
    /* ... */
  });
});

Required Elements Checklist

Before writing ANY functional tests, your test file MUST include:

POSTGRES_PASSWORD in startContainer() environment
await harness.waitForReady(container) after startContainer()
Smoke test that verifies container starts (Phase 1)
Extension load test if testing an extension (Phase 2)
At least ONE basic functionality test before edge cases (Phase 3)

Common Pitfalls (Learn from These Mistakes)

Pitfall	Symptom	Fix
No POSTGRES_PASSWORD	Container exits immediately: "superuser password not specified"	Add `POSTGRES_PASSWORD: "test"` to env
No waitForReady()	"connection refused" or "No such file or directory" for socket	Add `await harness.waitForReady(container)`
Override preload without defaults	Extensions fail to load: "must be in shared_preload_libraries"	Don't set `POSTGRES_SHARED_PRELOAD_LIBRARIES` unless you understand implications
No smoke test	Spend hours debugging 50 test failures that are all infrastructure	Write smoke test FIRST
Parallel test creation	Multiple test files fail for same infrastructure reason	Write ONE test file, validate, THEN parallelize
Timeout cargo-culting	Increase timeout without understanding why it failed	Add logging, measure actual time, understand root cause

Anti-Patterns to Avoid

❌ Writing all tests first, running later

// DON'T DO THIS
test("feature A", ...);
test("feature B", ...);
test("feature C", ...);
// Run all at once → 3 failures, can't tell if infrastructure or logic

✅ Write → Run → Pass → Next

test("SMOKE: container starts", ...);  // Write & run
// ✓ Passes → confident infrastructure works

test("feature A", ...);  // Write & run
// ✓ Passes → add next test

test("feature B", ...);  // Write & run

❌ Assuming environment state

// DON'T DO THIS
test("extension works", async () => {
  // Assumes container started, PostgreSQL ready, extension loaded
  await runSQL("SELECT my_extension_function()");
});

✅ Explicit validation

test("extension works", async () => {
  // Explicitly verify prerequisites
  expect(container).toBeDefined();

  const loaded = await runSQL(
    "SELECT 1 FROM pg_extension WHERE extname = 'my_ext'"
  );
  expect(loaded).toBe("1");

  // NOW test the function
  await runSQL("SELECT my_extension_function()");
});

Progressive Complexity Model

Tests should progress through complexity levels:

Level 0: Environment validated (prerequisites exist)
         ↓
Level 1: Container starts and PostgreSQL ready (smoke test)
         ↓
Level 2: Extension/feature loads (if applicable)
         ↓
Level 3: Basic happy-path functionality works
         ↓
Level 4: Edge cases, error conditions, performance

Never skip to Level 4 before validating Levels 0-3.

When to Stop and Investigate

If you see these patterns, STOP writing tests and investigate:

Container takes >60s to start → Why? Add logging, measure phases
Multiple unrelated tests fail with same error → Infrastructure issue, not test logic
Error messages mention sockets, files, or "not running" → PostgreSQL not ready
Tests pass individually but fail in suite → Cleanup/isolation issue

Template for New Test Files

Save this as a template when creating new test files:

#!/usr/bin/env bun
/**
 * [Feature Name] Test Suite
 *
 * Tests [brief description of what this tests]
 *
 * Prerequisites:
 * - [List any image requirements]
 * - [List any preload requirements]
 */

import { TestHarness } from "./harness";

const harness = new TestHarness();
let container: string;

// MANDATORY: Smoke test FIRST
describe("Smoke Tests", () => {
  test("Container starts successfully", async () => {
    const start = Date.now();

    container = await harness.startContainer("my-test", {
      POSTGRES_PASSWORD: "test",
    });

    await harness.waitForReady(container);

    const elapsed = Date.now() - start;
    console.log(`✓ Container ready in ${elapsed}ms`);
    expect(elapsed).toBeLessThan(120000);
  });
});

// Feature tests (only after smoke test passes)
describe("[Feature Name] Functional Tests", () => {
  // Your tests here
});

// Cleanup
afterAll(async () => {
  await harness.cleanupAll();
});

Key Takeaway

Structure tests to make infrastructure failures impossible by design:

Explicit environment validation
Smoke test that proves infrastructure works
Progressive complexity (simple → complex)
Clear error messages when assumptions violated

This pattern adds 2 minutes to test writing but saves hours of debugging.

Regression Testing

→ See REGRESSION-TESTING.md for comprehensive regression testing documentation.

Quick reference:

bun test:regression:all          # All tiers (production mode)
bun test:regression:core         # Tier 1: PostgreSQL core (30 tests)
bun test:regression:extensions   # Tier 2: Extension tests (13 extensions)
TEST_MODE=regression bun test:regression:all  # Regression mode (all extensions)

Regression Test Suites

PostgreSQL Official Regression Tests (Tier 1)

Core PostgreSQL regression tests fetched from the official postgres/postgres repository.

Structure:

tests/regression/core/pg-official/
├── sql/          # SQL test files (fetched from src/test/regress/sql/)
├── expected/     # Expected output files (fetched from src/test/regress/expected/)
└── README.md     # Documentation

Fetching Tests:

Tests are fetched on-demand and cached locally. To fetch tests:

# Fetch all default tests (~30 core tests)
bun scripts/ci/fetch-pg-regression-tests.ts

# Fetch specific tests
bun scripts/ci/fetch-pg-regression-tests.ts --tests=boolean,int2,int4

# Force re-download (ignore cache)
bun scripts/ci/fetch-pg-regression-tests.ts --force

Source:

Repository: postgres/postgres
Branch: REL_18_STABLE
Path: src/test/regress/

Cache Policy:

Test files are not committed to the repository (see .gitignore).

They are:

Fetched automatically when running tests (if missing)
Cached locally for faster subsequent runs
Safe to delete (will be re-fetched as needed)

Default Test Set:

The default set includes ~30 critical PostgreSQL regression tests covering:

Data types: boolean, int2, int4, int8, float4, float8, numeric, text, varchar
Core operations: select, insert, update, delete, join, union, subselect
Essential features: constraints, triggers, create_index, create_table, transactions, aggregates, copy, prepare
Advanced features: json, jsonb, arrays, strings, numerology, btree_index

See scripts/ci/fetch-pg-regression-tests.ts for the complete list.

Extension Regression Tests (Tier 2)

Deterministic regression tests for PostgreSQL extensions using SQL + expected output comparison (pg_regress pattern).

Structure:

tests/regression/extensions/
├── {extension-name}/
│   ├── sql/
│   │   └── basic.sql      # SQL test commands
│   └── expected/
│       └── basic.out      # Expected psql output
└── README.md

Test Coverage:

Production Mode (Top 10 Extensions):

vector (pgvector) - Vector similarity search, distance operators
timescaledb - Hypertable creation, time-series data
pg_cron - Job scheduling infrastructure
pgsodium - Cryptographic functions, encryption
pgaudit - Security auditing configuration
pg_stat_monitor - Enhanced query metrics
hypopg - Hypothetical index creation
pg_trgm - Trigram similarity, fuzzy search
pgmq - Message queue operations
timescaledb_toolkit - Time-series hyperfunctions

Comprehensive Mode (Additional 3 Extensions):

postgis - Spatial data types and operations
pgrouting - Graph routing algorithms (Dijkstra)
pgq - High-performance queue operations

Generating Expected Outputs:

Expected output files must be generated from a known-good build:

# Build production image
bun run build

# Generate expected outputs for production extensions
bun scripts/test/test-extension-regression.ts --mode=production --generate-expected

# Generate expected outputs for comprehensive extensions (requires comprehensive build)
bun scripts/test/test-extension-regression.ts --mode=comprehensive --generate-expected

IMPORTANT: Expected outputs are deterministic and should be committed to the repository. They serve as the regression baseline for future test runs.

Running Tests:

Production Mode:

# Test top 10 production extensions
bun scripts/test/test-extension-regression.ts --mode=production

Comprehensive Mode:

# Test all extensions (requires comprehensive build with postgis, pgrouting, pgq enabled)
bun scripts/test/test-extension-regression.ts --mode=comprehensive

Specific Extensions:

# Test specific extensions only
bun scripts/test/test-extension-regression.ts --extensions=vector,timescaledb,pg_cron

Using Existing Container:

# Use existing running container
bun scripts/test/test-extension-regression.ts --container=my-postgres-container

Test Design Principles:

Simplicity: Each test is < 50 lines of SQL, focuses on core functionality
Determinism: No random values, timestamps use fixed dates, outputs are predictable
Self-contained: Tests create and clean up their own data
Basic coverage: Tests verify extension works, not comprehensive feature coverage

Adding New Extension Tests:

Create directory: tests/regression/extensions/{extension-name}/{sql,expected}/
Write SQL test: sql/basic.sql (see existing tests as templates)
Generate expected output: --generate-expected flag
Add extension to TOP_10_EXTENSIONS or COMPREHENSIVE_ONLY_EXTENSIONS in test-extension-regression.ts
Commit SQL + expected output files

Test Execution Details:

Uses psql -X -a -q for consistent output format
Output normalization handles psql formatting variations
Diff generation uses diff -c (context diff) for readability
Failed tests generate extension-regression.diffs file

Related Documentation:

Tier 1 Tests: tests/regression/core/ - PostgreSQL core regression tests
Functional Tests: scripts/test/test-all-extensions-functional.ts - 117+ extension tests
Regression Runner: scripts/test/lib/regression-runner.ts - Shared test infrastructure

Expected Output Generation Status

This tracks which extension regression tests have generated expected outputs.

Generation Instructions:

Expected output files (.out) must be generated from a clean, known-good build:

# 1. Build production image
bun run build

# 2. Generate expected outputs for production mode
bun scripts/test/test-extension-regression.ts --mode=production --generate-expected

# 3. For comprehensive mode (requires PostGIS/pgRouting/PgQ enabled)
# Edit scripts/extensions/manifest-data.ts to enable postgis, pgrouting, pgq
# bun run generate && bun run build
# bun scripts/test/test-extension-regression.ts --mode=comprehensive --generate-expected

Status:

Production Extensions (Top 10):

Comprehensive-Only Extensions:

postgis - expected/basic.out - NEEDS GENERATION (requires comprehensive build)
pgrouting - expected/basic.out - NEEDS GENERATION (requires comprehensive build)
pgq - expected/basic.out - NEEDS GENERATION (requires comprehensive build)
wrappers - expected/basic.out - GENERATED (6 assertions)

Notes:

Expected outputs are deterministic and should remain stable across builds
Non-deterministic values (random(), now()) avoided in tests
All SQL test files (sql/basic.sql) are ready and committed
Expected outputs will be generated during next production build validation

pgTAP Regression Tests (Tier 4)

SQL-based unit tests using pgTAP for PostgreSQL and extension functionality.

Overview:

pgTAP is a TAP-compliant testing framework for PostgreSQL. These tests verify core PostgreSQL functionality, extension behavior, and security features.

Test Files:

File	Description	Tests
`01_extensions_availability.sql`	Extension availability verification	15
`02_schema_and_objects.sql`	Schema, tables, functions, triggers	20
`03_vector_extension.sql`	pgvector functionality and similarity search	12
`04_timescaledb_extension.sql`	TimescaleDB hypertables and time-series	15
`05_security_and_permissions.sql`	Roles, RLS, permissions, audit	20

Total: 82 tests

Running Tests:

Using psql:

# Run all pgTAP tests
psql -U postgres -d test_db -f tests/regression/pgtap/01_extensions_availability.sql

# Run specific test file
psql -U postgres -d test_db -f tests/regression/pgtap/03_vector_extension.sql

Using pg_prove:

# Install pg_prove (if not already installed)
sudo cpan TAP::Parser::SourceHandler::pgTAP

# Run all tests
pg_prove -U postgres -d test_db tests/regression/pgtap/*.sql

# Run specific test
pg_prove -U postgres -d test_db tests/regression/pgtap/03_vector_extension.sql

# Verbose output
pg_prove -v -U postgres -d test_db tests/regression/pgtap/*.sql

In Docker Container:

# Start regression test image
docker run --name pg-regression -d -e POSTGRES_PASSWORD=postgres aza-pg:pg18-regression

# Run tests
docker exec pg-regression psql -U postgres -c "CREATE EXTENSION pgtap;"
docker exec pg-regression psql -U postgres -f /tests/regression/pgtap/01_extensions_availability.sql

Test Modes:

Production Mode: Tests enabled extensions only (derived from manifest runtime.defaultEnable=true)
Regression Mode: Tests all catalog entries including disabled extensions (comprehensive coverage)

Some tests automatically adapt based on available extensions (e.g., age, pgq, postgis are regression-only).

Test Structure:

Each test file follows this structure:

BEGIN;

SELECT plan(N);  -- Declare number of tests

-- Test assertions
SELECT has_extension('vector', 'pgvector should be available');
SELECT is(2 + 2, 4, 'arithmetic should work');
SELECT ok(condition, 'description');

SELECT * FROM finish();

ROLLBACK;

pgTAP Assertions Used:

Schema: has_schema(), schema_owner_is()
Tables: has_table(), has_column(), col_not_null(), col_is_unique()
Indexes: has_index(), has_pk()
Functions: has_function(), function_returns(), function_lang_is()
Triggers: has_trigger()
Extensions: has_extension()
Permissions: schema_privs_are(), table_privs_are()
Roles: has_role(), is_superuser(), isnt_superuser()
General: ok(), is(), isnt(), pass(), skip()

Requirements:

PostgreSQL 18.x
pgTAP extension (pre-installed in regression test image)
Enabled extensions (varies by test mode)

References:

Session Isolation Pattern

Critical Concept

PostgreSQL session-local state (LOAD, SET, hypothetical indexes) does not persist across separate SQL invocations. Each runSQL() call creates a new psql session.

The Problem

// ❌ WRONG - Session state lost between calls
await runSQL("LOAD 'auto_explain'");
await runSQL("SET auto_explain.log_min_duration = 0");
await runSQL("SELECT count(*) FROM test_table"); // auto_explain NOT active

Each runSQL() creates a new session. The LOAD and SET commands execute in one session, then that session closes. The final SELECT runs in a completely different session where auto_explain was never loaded.

The Solution

// ✅ CORRECT - Single session preserves state
await runSQL(`
  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = 0;
  SELECT count(*) FROM test_table;  -- auto_explain IS active
`);

Use multi-statement SQL blocks within a single runSQL() call. All commands execute in the same session, preserving state throughout.

Real-World Examples

Example 1: auto_explain Plan Logging

await test("auto_explain - Verify plan logging", "observability", async () => {
  // Execute query in same session where auto_explain is loaded
  const result = await runSQL(`
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;
    SELECT count(*) FROM test_vectors;
  `);
  assert(result.success, "Query execution with auto_explain failed");
});

Why: LOAD 'auto_explain' and SET commands must be in same session as the SELECT query they affect.

Example 2: pg_plan_filter Query Execution

await test(
  "pg_plan_filter - Execute queries with plan filter active",
  "safety",
  async () => {
    const result = await runSQL(`
    LOAD 'plan_filter';
    SELECT count(*) FROM pg_tables;
  `);
    const lines = result.stdout.split("\n").filter((l) => l.trim());
    const count = parseInt(lines[lines.length - 1]);
    assert(
      result.success && count > 0,
      "Query execution with pg_plan_filter failed"
    );
  }
);

Why: pg_plan_filter hook must be loaded in the same session where queries execute.

Example 3: HypoPG Hypothetical Indexes

await test(
  "hypopg - Create and verify hypothetical index",
  "performance",
  async () => {
    // Create and verify in same session (hypothetical indexes are session-local)
    const result = await runSQL(`
    SELECT * FROM hypopg_create_index('CREATE INDEX ON test_hypopg (val)');
    SELECT count(*) FROM hypopg_list_indexes;
  `);
    const lines = result.stdout.split("\n").filter((l) => l.trim());
    const count = parseInt(lines[lines.length - 1]);
    assert(result.success && count > 0, "Failed to create hypothetical index");
  }
);

await test(
  "hypopg - Verify planner uses hypothetical index",
  "performance",
  async () => {
    const result = await runSQL(`
    SELECT * FROM hypopg_create_index('CREATE INDEX ON test_hypopg (val)');
    EXPLAIN SELECT * FROM test_hypopg WHERE val = 500;
  `);
    assert(result.success, "EXPLAIN query failed with hypothetical index");
  }
);

Why: HypoPG indexes exist only in the current session. Creating an index in one runSQL() call means it's gone by the next call.

Pattern: Each test creates its own hypothetical index within the same session where it's used, since indexes don't persist.

Example 4: Session vs Persistent State

// ✅ Persistent state - can split across calls
await runSQL("CREATE TABLE test_table (id int)");
await runSQL("INSERT INTO test_table VALUES (1)");
await runSQL("SELECT * FROM test_table"); // Table persists

// ❌ Session-local state - MUST be in one call
await runSQL(`
  CREATE TEMP TABLE session_table (id int);
  INSERT INTO session_table VALUES (1);
  SELECT * FROM session_table;  -- Must query in same session
`);

Rule: If it's session-local (TEMP tables, LOAD, SET, HypoPG), keep it in one runSQL() block.

Testing Extension Functionality

Test Structure

await test("extension_name - What it tests", "category", async () => {
  // 1. Setup (if needed)
  await runSQL("CREATE TABLE IF NOT EXISTS test_data (...)");

  // 2. Execute functionality
  const result = await runSQL("SELECT extension_function(...)");

  // 3. Assert results
  assert(result.success, "Operation failed");
  assert(condition, "Expected behavior not met");
});

Functional Testing Checklist

For each extension, verify:

✅ CREATE EXTENSION succeeds
✅ Basic functionality works (function calls, queries)
✅ Data operations complete successfully
✅ Expected output matches documentation
✅ Session isolation handled correctly (if applicable)

Testing Extensions with Dependencies

Some extensions require other extensions to be created first:

// pgvector depends on base vector type
await runSQL("CREATE EXTENSION IF NOT EXISTS vector");
await runSQL("CREATE EXTENSION IF NOT EXISTS pgvector");

// supabase_vault depends on pgsodium
await runSQL("CREATE EXTENSION IF NOT EXISTS pgsodium");
await runSQL("CREATE EXTENSION IF NOT EXISTS supabase_vault");

Check docker/postgres/docker-entrypoint-initdb.d/01-extensions.sql for the canonical extension creation order.

Common Pitfalls

1. Session State Lost

Symptom: Extension works in manual testing but fails in automated tests.

Cause: Session-local state not preserved across runSQL() calls.

Fix: Use multi-statement SQL blocks in single runSQL() call.

2. HypoPG Indexes Disappear

Symptom: hypopg_list_indexes returns 0 rows after creating index.

Cause: Index created in different session than where it's queried.

Fix: Create and query hypothetical indexes in same runSQL() call.

// ❌ WRONG
await runSQL("SELECT * FROM hypopg_create_index('...')");
const list = await runSQL("SELECT * FROM hypopg_list_indexes"); // Empty!

// ✅ CORRECT
const result = await runSQL(`
  SELECT * FROM hypopg_create_index('...');
  SELECT * FROM hypopg_list_indexes;
`);

3. Output Parsing Errors

Symptom: Test fails with parseInt(NaN) or "count is not a number".

Cause: Unexpected output format from query.

Fix: Filter empty lines, handle headers, parse last line:

const lines = result.stdout.split("\n").filter((l) => l.trim());
const count = parseInt(lines[lines.length - 1]);
assert(!isNaN(count), "Failed to parse count");

4. LOAD Commands Not Active

Symptom: Hook-based extension (auto_explain, pg_plan_filter) has no effect.

Cause: LOAD executed in different session than query.

Fix: Load extension in same SQL block as query:

await runSQL(`
  LOAD 'auto_explain';
  SELECT * FROM data;
`);

5. Replication Slot Already Exists

Symptom: ERROR: replication slot "test_slot" already exists

Cause: Previous test didn't clean up replication slot.

Fix: Drop slot after test OR check existence before creating:

// Cleanup after test
await runSQL("SELECT pg_drop_replication_slot('test_wal2json_slot')");

// Or check before creating
await runSQL(`
  SELECT CASE
    WHEN NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'test_slot')
    THEN pg_create_logical_replication_slot('test_slot', 'wal2json')
  END
`);

6. Temp Tables Not Found

Symptom: ERROR: relation "temp_table" does not exist

Cause: Temp table created in different session.

Fix: Create and use temp tables in same runSQL() call:

await runSQL(`
  CREATE TEMP TABLE session_data (id int);
  INSERT INTO session_data VALUES (1);
  SELECT * FROM session_data;
`);

7. Hook Extensions Not Working

Symptom: pg_safeupdate doesn't block UPDATE without WHERE, or supautils GUC parameters not found.

Cause: Extension not preloaded via shared_preload_libraries or session_preload_libraries.

Fix: Load hook extensions at appropriate scope:

# pg_plan_filter requires shared_preload_libraries
POSTGRES_SHARED_PRELOAD_LIBRARIES="pg_stat_statements,auto_explain,pg_cron,pgaudit,pg_plan_filter"

# pg_safeupdate uses session_preload_libraries
psql -c "SET session_preload_libraries = 'pg_safeupdate'; UPDATE table SET col = 1;"

# supautils requires shared_preload_libraries for GUC parameters
POSTGRES_SHARED_PRELOAD_LIBRARIES="pg_stat_statements,auto_explain,pg_cron,pgaudit,supautils"

Note: Hook-based extensions don't use CREATE EXTENSION - they load via preload libraries.

8. Docker Credential Helper Not Found

Symptom: PgBouncer tests fail with error: docker-credential-osxkeychain: executable file not found in $PATH

Cause: System Docker config (~/.docker/config.json) references credential helper but the binary is not installed or not in PATH.

Fix Option 1 - Remove credential helper from Docker config (Quick):

# Edit ~/.docker/config.json and remove the "credsStore" line:
{
  "auths": {
    "ghcr.io": {}
  }
}

Fix Option 2 - Install credential helper (Permanent):

# macOS
brew install docker-credential-helper

# Linux (Ubuntu/Debian)
sudo apt-get install docker-credential-helpers

# Arch Linux
sudo pacman -S docker-credential-helpers

Automatic Fallback: Test scripts automatically detect missing credential helpers and create isolated test configurations. No manual intervention required unless you want to fix the system-level configuration.

9. Overriding Image Default Preload

Symptom: Extension tests fail with "extension requires preload" errors even though extension works in production.

Cause: Test script hardcodes POSTGRES_SHARED_PRELOAD_LIBRARIES with incomplete list.

Why This Happens: The image has a built-in DEFAULT_SHARED_PRELOAD_LIBRARIES in the entrypoint (auto-generated from manifest). When tests override this env var, they may miss required modules.

Image Default (auto-generated from manifest):

auto_explain,pg_cron,pg_net,pg_stat_monitor,pg_stat_statements,pgaudit,pgsodium,safeupdate,timescaledb

Fix: Don't override unless testing override behavior:

// ❌ WRONG - Hardcoded list gets out of sync
const exitCode = await dockerRunLive([
  "run",
  "-d",
  "-e",
  "POSTGRES_PASSWORD=test",
  "-e",
  "POSTGRES_SHARED_PRELOAD_LIBRARIES=auto_explain,pg_cron,pgaudit", // Missing modules!
  image,
]);

// ✅ CORRECT - Use image default
const exitCode = await dockerRunLive([
  "run",
  "-d",
  "-e",
  "POSTGRES_PASSWORD=test",
  // No POSTGRES_SHARED_PRELOAD_LIBRARIES - uses image built-in default
  image,
]);

Exception: Only override when explicitly testing preload behavior (e.g., testing what happens when a module is NOT loaded).

Test Categories

Core Extensions (5)

PostgreSQL builtins that should always work:

btree_gist, btree_gin, pg_trgm, fuzzystrmatch, unaccent

Note: uuid-ossp is intentionally NOT enabled. PostgreSQL 18 includes the superior built-in uuidv7() function for time-ordered UUIDs with better indexing performance.

Vector Search (2)

pgvector: Vector similarity search, distance functions
vectorscale: DiskANN indexing for large-scale vector search

Full-Text Search (1)

pgroonga: Multi-language full-text search with indexing

Spatial (2)

postgis: Geographic objects, spatial queries
pgrouting: Network routing algorithms

Time-Series (2)

timescaledb: Hypertables, continuous aggregates
timescaledb_toolkit: Time-series analytics functions

Observability (8)

pg_stat_statements: Query performance tracking
pg_stat_monitor: Enhanced query monitoring (1000-query buffer)
auto_explain: Automatic query plan logging
pg_stat_kcache: Kernel cache hit statistics
plpgsql_check: PL/pgSQL code validation
pg_top: Real-time activity monitoring
pgbadger: Log analyzer (tool)
pg_plan_filter: Query plan filtering

Security (4)

pgsodium: Libsodium encryption
supabase_vault: Encrypted secrets storage
pgaudit: Audit logging
set_user: Superuser privilege control

Performance (6)

index_advisor: Index recommendation
hypopg: Hypothetical indexes (session-local)
pg_qualstats: Predicate statistics
pg_wait_sampling: Wait event sampling
hll: HyperLogLog cardinality estimation
rum: Full-text search indexes

CDC (1)

wal2json: JSON output plugin for logical replication

Integration (1)

wrappers: Foreign data wrappers (Supabase) - 10 functional tests, 1 regression suite (6 assertions)

Safety (2)

pg_safeupdate: Prevent UPDATE/DELETE without WHERE
pg_plan_filter: Block queries by plan characteristics

Utilities (11)

pg_cron: Job scheduler
pg_partman: Partition management
pg_repack: Online table repacking
http: HTTP client for REST APIs
pg_hashids: Encode/decode hashids
pg_jsonschema: JSON Schema validation
pgmq: Message queue
supautils: Superuser utility functions
pg_net: Async HTTP (Supabase, requires worker)
pgjwt: JWT generation (Supabase, requires pg_net)
wal2json: Logical decoding (tool)

Testing Strategy & Coverage

Current State

Comprehensive CI Testing Coverage:

All enabled extensions have functional tests with 100% coverage across three dimensions (CREATE EXTENSION, functional test, metadata check).

Test Suite:

scripts/test/test-all-extensions-functional.ts - Comprehensive smoke tests for all enabled extensions
scripts/test/test-auto-config.ts - Auto-config detection across 4 memory scenarios
scripts/test/test-pgbouncer-healthcheck.ts - PgBouncer auth flow validation

Test Coverage Matrix

Category	Extensions	Tests	Coverage
AI/Vector	2	6	100%
Analytics	1	2	100%
CDC	1	3	100%
GIS	2	6	100%
Indexing	2	4	100%
Integration	2	13	100%
Language	1	4	100%
Maintenance	2	5	100%
Observability	4	8	100%
Operations	2	4	100%
Performance	2	5	100%
Quality	1	3	100%
Queueing	1	4	100%
Safety	3	6	100%
Search	3	6	100%
Security	4	10	100%
Timeseries	2	5	100%
Utilities	1	4	100%
Validation	1	4	100%
TOTAL	37	98	100%

Note: For current extension counts (enabled/disabled/total), see docs/.generated/docs-data.json.

Auto-Config Test Coverage

Comprehensive auto-config validation covers 10 memory scenarios from 256MB to 64GB:

Scenario	RAM	CPU	Detection	Config Injection	Status
Manual override	1536MB	-	✅ POSTGRES_MEMORY	shared_buffers, max_connections	Passing
Cgroup v2 limit	2GB	-	✅ cgroup v2	shared_buffers, max_connections	Passing
Minimum supported	512MB	-	✅ cgroup v2	shared_buffers, max_connections	Passing
Large node	64GB	-	✅ POSTGRES_MEMORY	shared_buffers, max_connections	Passing
CPU detection	2GB	2 cores	✅ nproc	worker processes	Passing
Below minimum	256MB	-	✅ Detection	FATAL error	Passing
Custom preload	1GB	-	✅ Override	shared_preload_libraries	Passing
Medium production	4GB	-	✅ cgroup v2	shared_buffers 1024MB, max_conn 200	Passing
Large production	8GB	-	✅ cgroup v2	shared_buffers 2048MB, max_conn 200	Passing
High-load	16GB	-	✅ cgroup v2	shared_buffers 3276MB, max_conn 200	Passing

Details:

Manual override (1536MB) - Respects POSTGRES_MEMORY env var, shared_buffers 25%, connection tier 120
2GB cgroup limit - Detects via cgroup v2, shared_buffers 512MB, connection tier 120
512MB minimum - Minimum supported deployment, shared_buffers 128MB, connection tier 80
64GB manual override - Large-node tuning, shared_buffers ~9830MB, connection tier 200
4GB tier - Medium production, shared_buffers 1024MB (25%), connection tier 200
8GB tier - Large production, shared_buffers 2048MB (25%), connection tier 200
16GB tier - High-load deployment, shared_buffers 3276MB (20%), connection tier 200

For comprehensive memory allocation table with all RAM tiers and formulas, see AGENTS.md Auto-Config section.

PgBouncer Test Coverage

Comprehensive PgBouncer auth flow validation:

.pgpass file existence - Verified at /tmp/.pgpass with 600 permissions
.pgpass entries - Configured for localhost:6432 and pgbouncer:6432
Authentication - Tested via localhost and hostname
SHOW POOLS - Verified pool status and database entries
Healthcheck - Validated healthcheck command execution
Connection pooling - Functional integration testing

See scripts/test/test-pgbouncer-healthcheck.ts for end-to-end stack testing.

Test Quality Metrics

Functional Test Coverage:

All enabled extensions tested across 3 dimensions: CREATE EXTENSION, functional test, metadata check
Comprehensive smoke test suite with assertions
100% extension coverage (no deferred testing)

Auto-Config Test Coverage:

10 test scenarios covering RAM/CPU detection (256MB to 64GB)
7 memory tiers (512MB, 1GB, 2GB, 4GB, 8GB, 16GB, 64GB)
Manual override, cgroup v2 detection, CPU scaling
Edge cases (below-minimum rejection, custom shared_preload_libraries)

PgBouncer Test Coverage:

Happy path (8 tests): .pgpass file management, authentication via localhost/hostname, SHOW POOLS, healthcheck
Failure scenarios (6 tests): wrong password, missing .pgpass, invalid listen address, PostgreSQL down, max connections, wrong permissions

Stack Deployment Test Coverage:

Primary stack: Auto-config, PgBouncer auth, postgres_exporter, pgbouncer_exporter
Replica stack (7 steps): Replication slot creation, standby mode verification, hot standby queries, WAL sync, postgres_exporter
Single stack (7 steps): Standalone mode, extension availability, connection limits, auto-config, no pooler verification

Hook Extension Test Coverage:

pg_plan_filter: Without/with preload, functional validation
pg_safeupdate: Session preload, functional query blocking
supautils: Without/with preload, GUC-based configuration
Multi-hook stability testing

Maintenance

When to update tests:

New extension added to manifest.json → add smoke test to test-all-extensions-functional.ts
Extension version upgraded → verify test still valid
Upstream API changes → update functional test
Auto-config logic changes → update test-auto-config.ts

Who maintains:

Developer adding extension writes smoke test
CI enforces all tests pass before merge

Running Tests

Testing Released Images

The test:image command provides comprehensive validation of released Docker images. Use this to verify a production image before deployment or to validate a release candidate.

Quick Reference:

# Full comprehensive validation (all 9 phases)
bun run test:image ghcr.io/fluxo-kt/aza-pg:18.1-202512012323-single-node

# Fast validation (skips heavy tests like regression)
bun run test:image ghcr.io/fluxo-kt/aza-pg:TAG --fast

Test Phases:

Phase	Description	Tests
1. Pre-flight	Static validation, linting, unit tests	10 checks, 198 tests
2. Image Pull	Pull and verify image exists	psql version check
3. Comprehensive	5-phase image verification	37 tests
4. Auto-Config	Memory tier and workload testing	36 scenarios
5. Extensions	All 5 extension test suites	117+ tests
6. Stack Deploy	Single-node deployment	Full stack test
7. Features	pgflow, pgmq, security	54+ tests
8. Regression	Tiers 1-3 PostgreSQL regression	Core + extensions
9. Negative	Error handling validation	10 scenarios

Usage Examples:

# Test image from GHCR
bun run test:image ghcr.io/fluxo-kt/aza-pg:18.1-202512012323-single-node

# Test local image
bun run test:image aza-pg:latest

# Fast mode (skip regression tests for quick validation)
bun run test:image ghcr.io/fluxo-kt/aza-pg:TAG --fast

Expected Results:

All phases should pass (exit code 0)
Vault tests (3 failures) are expected without pgsodium preload configuration
External service flakiness (httpbin.org) may cause occasional HTTP test failures

See RELEASE-VALIDATION.md for detailed validation results of published releases.

Comprehensive Test Suite (All Validations + Build + Functional)

The test-all.ts script orchestrates all validation checks, build tests, and functional tests:

# Run complete test suite (validation + build + functional)
bun run test:all
# OR: bun scripts/test-all.ts

# Fast mode - validation only (skips Docker build and functional tests)
bun run validate
# OR: bun scripts/test-all.ts --fast

# Skip build - run all tests except Docker build (useful if image exists)
bun scripts/test-all.ts --skip-build

# Show help
bun scripts/test-all.ts --help

Test Categories:

Validation Checks (run in parallel):
- Manifest validation
- TypeScript type checking
- Code linting (oxlint)
- Code formatting (prettier)
- Documentation consistency
- Smoke tests
- ShellCheck (shell script linting)
- Hadolint (Dockerfile linting)
- YAML linting
- Secret scanning
Build Tests (run sequentially):
- Docker image build (with 15min timeout)
- Extension binary size verification
- Extension count verification (dynamically derived from manifest)
Functional Tests (run sequentially):
- Basic extension loading (vector, pg_cron)
- Auto-tuning tests (512MB, 2GB, 4GB memory limits)
- Single-node stack deployment
- Replica/cluster stack deployment
- Comprehensive extension tests (all enabled extensions)

Environment Variables:

# Make optional checks non-critical (useful in environments without these tools)
ALLOW_MISSING_SHELLCHECK=1 bun scripts/test-all.ts
ALLOW_MISSING_HADOLINT=1 bun scripts/test-all.ts
ALLOW_MISSING_YAMLLINT=1 bun scripts/test-all.ts

# Use custom Docker image
POSTGRES_IMAGE=my-custom:tag bun scripts/test-all.ts

Exit Codes:

0: All critical tests passed (non-critical failures are warnings)
1: One or more critical tests failed

Output Format:

The script provides:

Real-time progress indicators for each check
Parallel execution for validation checks (faster)
Sequential execution for build/functional tests (safer)
Categorized summary at the end (Validation, Build, Functional)
Timing for each test
Highlighted failures with actionable error messages

Extension Functional Tests

# Run all 117+ functional tests
bun run scripts/test/test-all-extensions-functional.ts

# Expected output:
# ✓ 117+/117+ tests passing
# ⏱ Duration: ~8-10 seconds

Filter by Category

# Test only vector extensions
bun run scripts/test/test-all-extensions-functional.ts --category vector

# Test security extensions
bun run scripts/test/test-all-extensions-functional.ts --category security

Auto-Config Testing

# Validate RAM/CPU detection and configuration injection
bun scripts/test/test-auto-config.ts [image-tag]

# Tests:
# - Manual override (POSTGRES_MEMORY=1536)
# - 512MB limit (minimum viable)
# - 2GB limit (typical production)
# - 64GB limit (high-end deployment)
# - CPU detection (2 cores)
# - Below minimum rejection (256MB)
# - Custom preload libraries

PgBouncer Testing

# Validate PgBouncer auth flow and connection pooling
bun scripts/test/test-pgbouncer-healthcheck.ts [stack-dir]

# Tests:
# - .pgpass file existence and permissions
# - Authentication via localhost and hostname
# - SHOW POOLS verification
# - Healthcheck command execution
# - Connection pooling functional test

Hook Extension Testing

# Validate hook-based extensions that load via shared_preload_libraries
bun scripts/test/test-hook-extensions.ts [image-tag]

# Extensions tested:
# - pg_plan_filter (hook-based, requires shared_preload_libraries)
# - pg_safeupdate (hook-based, uses session_preload_libraries)
# - supautils (GUC-based, optional shared_preload_libraries)

# Test cases:
# 1. pg_plan_filter without preload (should have no effect)
# 2. pg_plan_filter with preload (verify hook active)
# 3. pg_safeupdate session preload (blocks UPDATE/DELETE without WHERE)
# 4. supautils without preload (GUC params unavailable)
# 5. supautils with preload (GUC params available)
# 6. Combined preload (multiple hooks active simultaneously)

Integration Testing

# Test extension combinations
bun run scripts/test/test-integration.ts

# Combinations tested:
# - timescaledb + pgvector (time-series + vector search)
# - postgis + pgroonga (spatial + full-text)
# - pgsodium + supabase_vault (encryption stack)

Comprehensive Image Testing

# Comprehensive Docker image test harness
bun scripts/docker/test-image.ts                    # Test aza-pg:latest (default)
bun scripts/docker/test-image.ts aza-pg:18.1-202511142330  # Test specific image
bun scripts/docker/test-image.ts ghcr.io/fluxo-kt/aza-pg:18-single-node  # Full registry path

# Options:
# --no-cleanup   Keep container running after tests (for debugging)
# --fast         Skip time-consuming functional tests

# Test Phases:
# Phase 1: Filesystem Verification
#   - Extension directory structure exists
#   - Manifest file present in image
#   - Version info files present (txt, json)
#   - Enabled PGDG extensions present
#   - Disabled PGDG extensions not present
#
# Phase 2: Runtime Verification
#   - Version info counts correct (txt, json)
#   - Preloaded extensions in shared_preload_libraries
#   - Enabled extensions can be created
#   - Disabled extensions cannot be created
#   - PostgreSQL configuration valid
#
# Phase 3: Tools Verification
#   - Tools present (pgbackrest, pgbadger, wal2json, etc.)
#   - pgBackRest functional (version command)
#   - pgBadger functional (version command)
#
# Phase 4: Auto-Configuration Tests
#   - Auto-config applied (shared_buffers, work_mem, etc.)
#
# Phase 5: Functional Tests (Sample)
#   - Basic extension functionality (pgvector, pg_trgm, hstore)
#   - Skipped in --fast mode

# Example output:
# ============================================================
#   Comprehensive Docker Image Test Harness
# ============================================================
# ✅ Docker daemon is running
# ✅ Manifest loaded
# ✅ Test container started
# ✅ PostgreSQL ready in 5.23s
#
# ============================================================
#   Phase 1: Filesystem Verification
# ============================================================
# ✅ Extension directory structure (125ms)
# ✅ Manifest file present (89ms)
# ✅ Version info files present (72ms)
# ✅ Enabled PGDG extensions present (25 verified) (1.45s)
# ✅ Disabled PGDG extensions not present (2 verified) (345ms)
#
# [... additional phases ...]
#
# ============================================================
#   Test Summary
# ============================================================
# ✅ All tests passed!
# Total: 18 | Passed: 18 | Failed: 0

CI Integration

Workflow Architecture

aza-pg uses a reusable workflow pattern for regression testing integration:

Key Workflows:

regression-tests.yml (Reusable Workflow)
- Can be called by other workflows via workflow_call
- Can be triggered manually via workflow_dispatch
- Runs Tier 1-3 regression tests (30 PostgreSQL + 13 extension + 14 interaction tests)
- Accepts image_ref input (full registry/repository:tag or @digest)
- Returns tests_passed output (boolean)
- Duration: ~10-15 minutes
publish.yml (Release Workflow)
- Calls regression-tests.yml as a blocking gate
- Tests run BEFORE promotion to production registry
- Passes testing image reference: testing-{sha}
- Release blocked if regression tests fail

Integration Pattern:

# In publish.yml
regression-tests:
  name: Regression Tests (Blocking Gate)
  uses: ./.github/workflows/regression-tests.yml
  needs: [prep, merge, test-smoke]
  with:
    image_ref: ${{ format('{0}/{1}:testing-{2}', env.REGISTRY, env.IMAGE_NAME_TESTING, github.sha) }}
  secrets: inherit

test-complete:
  needs:
    [
      test-smoke,
      test-extensions,
      test-features,
      test-security,
      regression-tests,
    ]
  # Production release blocked if any test fails

Benefits:

DRY: Single workflow definition used by publish AND manual runs
Blocking: Regression tests gate production releases
Flexible: Can be triggered manually for ad-hoc validation
Maintainable: Changes to regression tests automatically apply to all uses

Test Coverage by Workflow:

ci.yml: Fast tests only (4 Tier 1 tests, linting, validation)
publish.yml: Comprehensive test gate (smoke, extensions, features, security, regression)
regression-tests.yml: Tier 1-3 regression tests (production mode)

Test Environments

GitHub Actions workflows run all tests on:

Platform: linux/amd64, linux/arm64 (QEMU emulation for arm64 validation)
Extension kinds: compiled, pgdg, builtin
Auto-config scenarios: manual, cgroup, minimum, high-memory, 4GB, 8GB, 16GB, 64GB
Stack deployments: primary, replica, single
Failure scenarios: PgBouncer auth failures, connection limits, invalid configurations

Test Organization

Test Files:

scripts/test/
├── test-all-extensions-functional.ts  (all enabled extensions, comprehensive smoke tests)
├── test-auto-config.ts                (10 auto-config scenarios: 512MB-64GB)
├── test-pgbouncer-healthcheck.ts      (8 PgBouncer auth flow tests)
├── test-pgbouncer-failures.ts         (6 failure scenario tests)
├── test-hook-extensions.ts            (6 hook-based extension tests)
├── test-replica-stack.ts              (7-step replication validation)
├── test-single-stack.ts               (7-step standalone validation)
├── test-extensions.ts                 (legacy baseline tests)
├── test-extension-performance.ts      (performance benchmarks)
├── test-integration-extension-combinations.ts
└── test-pgq-functional.ts             (pgmq queue tests)

scripts/docker/
└── test-image.ts                      (comprehensive image test harness)

References

Extension tests: scripts/test/test-all-extensions-functional.ts
Auto-config tests: scripts/test/test-auto-config.ts (10 memory tier scenarios)
PgBouncer tests: scripts/test/test-pgbouncer-healthcheck.ts (happy path)
PgBouncer failure tests: scripts/test/test-pgbouncer-failures.ts (6 failure scenarios)
Hook extension tests: scripts/test/test-hook-extensions.ts
Replica stack tests: scripts/test/test-replica-stack.ts (replication validation)
Single stack tests: scripts/test/test-single-stack.ts (standalone validation)
Image test harness: scripts/docker/test-image.ts (comprehensive image validation)
Extension manifest: docker/postgres/extensions.manifest.json
Auto-config entrypoint: docker/postgres/docker-auto-config-entrypoint.sh
Commit 89de009: Test restoration with session isolation fixes (4 tests restored, 1 added)
Commit 11c4d56: pgsodium TCE security fix (enabled shared_preload_libraries)
Session Isolation: See scripts/test/test-all-extensions-functional.ts lines 450-520 for examples
Init Order: docker/postgres/docker-entrypoint-initdb.d/ for extension creation sequence
Hook Extensions: See AGENTS.md "Hook-Based Extensions & Tools" section for manifest patterns

Key Takeaway: When testing session-local PostgreSQL features (LOAD, SET, HypoPG, TEMP tables), always use multi-statement SQL blocks within a single runSQL() call. This preserves session state and prevents "feature not active" or "object not found" errors.

Status: Regression testing implemented. 100% extension coverage with functional smoke tests. All critical paths tested (extensions, auto-config, PgBouncer auth).

FilesExpand file tree

TESTING.md

Latest commit

History

TESTING.md

File metadata and controls

Testing Guide

Table of Contents

Smoke Test Pattern (Critical - Read This First!)

The Problem

Real-World Example

The Smoke Test Pattern

Required Elements Checklist

Common Pitfalls (Learn from These Mistakes)

Anti-Patterns to Avoid

Progressive Complexity Model

When to Stop and Investigate

Template for New Test Files

Key Takeaway

Regression Testing

Regression Test Suites

PostgreSQL Official Regression Tests (Tier 1)

Extension Regression Tests (Tier 2)

Expected Output Generation Status

pgTAP Regression Tests (Tier 4)

Session Isolation Pattern

Critical Concept

The Problem

The Solution

Real-World Examples

Example 1: auto_explain Plan Logging

Example 2: pg_plan_filter Query Execution

Example 3: HypoPG Hypothetical Indexes

Example 4: Session vs Persistent State

Testing Extension Functionality

Test Structure

Categories

Functional Testing Checklist

Testing Extensions with Dependencies

Common Pitfalls

1. Session State Lost

2. HypoPG Indexes Disappear

3. Output Parsing Errors

4. LOAD Commands Not Active

5. Replication Slot Already Exists

6. Temp Tables Not Found

7. Hook Extensions Not Working

8. Docker Credential Helper Not Found

9. Overriding Image Default Preload

Test Categories

Core Extensions (5)

Vector Search (2)

Full-Text Search (1)

Spatial (2)

Time-Series (2)

Observability (8)

Security (4)

Performance (6)

CDC (1)

Integration (1)

Safety (2)

Utilities (11)

Testing Strategy & Coverage

Current State

Test Coverage Matrix

Auto-Config Test Coverage

PgBouncer Test Coverage

Test Quality Metrics

Maintenance

Running Tests

Testing Released Images

Comprehensive Test Suite (All Validations + Build + Functional)

Extension Functional Tests

Filter by Category

Auto-Config Testing

PgBouncer Testing

Hook Extension Testing

Integration Testing

Comprehensive Image Testing

CI Integration