Skip to content

ENG-3566: Refactor engine creation to use SQLAlchemy creator pattern#8148

Merged
erosselli merged 8 commits into
mainfrom
erosselli/ENG-3566-engine-refactor
May 11, 2026
Merged

ENG-3566: Refactor engine creation to use SQLAlchemy creator pattern#8148
erosselli merged 8 commits into
mainfrom
erosselli/ENG-3566-engine-refactor

Conversation

@erosselli
Copy link
Copy Markdown
Contributor

@erosselli erosselli commented May 8, 2026

Part of ENG-3566

Description Of Changes

Refactors all 6 database engines to use SQLAlchemy's creator pattern instead of baking credentials into connection URIs. The creator callable is invoked by the pool on every new connection, resolving credentials at connect time rather than engine construction time. This is the structural seam required for dynamic credential rotation via AWS Secrets Manager (design doc: #8016).

Today the creators read from static config (CONFIG.database). In follow-up work, the credential helpers will be swapped to call provider.get_secret() — no further engine changes needed.

Also updates the design doc to remove the lazy factory requirement, since the creator closure captures a provider reference (not credentials), making it independent of when the engine is constructed.

Code Changes

  • database_settings.py — Add raw_password / raw_readonly_password properties that reverse quote_plus escaping for direct psycopg2.connect() / asyncpg.connect() use
  • engine_creators.py (new) — make_sync_creator / make_async_creator factories that capture per-engine config (keepalives, SSL) in closures while resolving credentials per-connection. Includes credential helpers and asyncpg param/SSL conversion
  • session.py — Add optional creator parameter to get_db_engine(). When provided, uses dialect-only URL and skips connect_args (creator handles them)
  • ctl_session.py — Switch all 3 engines (async, readonly async, sync) from URI-based to creator-based. SSL context and asyncpg param handling moved into engine_creators.py
  • session_management.py — Switch API and readonly sync engines to use make_sync_creator with keepalive connect_args
  • tasks/__init__.py — Switch Celery task engine to use make_sync_creator with keepalive connect_args
  • dynamic-database-credentials.md — Remove lazy factory requirement from design doc

Steps to Confirm

  1. Verify all existing database connections work: hit GET /health/database and confirm all pools are healthy
  2. Run existing test suites to confirm no regressions (public API surface is unchanged)
  3. Verify that the async creator correctly wraps connections using SA internals (AsyncAdapt_asyncpg_connection)

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • Followup issues created
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

Switch all 6 database engines from URI-based connection creation to the
creator pattern. The creator callable is invoked by the pool on every new
connection, resolving credentials at connect time rather than engine
construction time. This is the structural seam that enables dynamic
credential rotation via AWS Secrets Manager in follow-up work.

Changes:
- Add raw_password/raw_readonly_password properties to DatabaseSettings
  for unescaped passwords (needed by psycopg2/asyncpg direct connect)
- New engine_creators.py with make_sync_creator/make_async_creator
  factories and credential helpers
- Add creator parameter to get_db_engine() in session.py
- Switch ctl_session.py async/sync engines to creator pattern
- Switch session_management.py sync engines to creator pattern
- Switch tasks/__init__.py task engine to creator pattern
- Update design doc to remove lazy factory requirement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview May 11, 2026 9:17pm
fides-privacy-center Ignored Ignored May 11, 2026 9:17pm

Request Review

@erosselli erosselli added the run unsafe ci checks Runs fides-related CI checks that require sensitive credentials label May 8, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 93.68421% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.45%. Comparing base (26cf81d) to head (37ce205).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/fides/common/engine_creators.py 89.83% 3 Missing and 3 partials ⚠️

❌ Your patch check has failed because the patch coverage (93.68%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8148      +/-   ##
==========================================
+ Coverage   85.40%   85.45%   +0.04%     
==========================================
  Files         649      650       +1     
  Lines       42283    42363      +80     
  Branches     4960     4971      +11     
==========================================
+ Hits        36112    36201      +89     
+ Misses       5063     5054       -9     
  Partials     1108     1108              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

erosselli and others added 2 commits May 8, 2026 17:08
25 tests covering credential helpers, raw_password round-trip,
asyncpg param conversion, SSL context building, and end-to-end
sync/async engine creation via creator pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix: async_params.pop("ssl") when ssl_context exists, preventing
  kw.update(async_params) from replacing the SSLContext with a raw string
- Add ValueError when creator and keepalives params are both passed to
  get_db_engine()
- Add SSL success-path test with self-signed cert
- Add get_db_engine creator= path and keepalives error tests
- Remove unused psycopg2 import from test file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erosselli erosselli marked this pull request as ready for review May 11, 2026 14:44
@erosselli erosselli requested a review from a team as a code owner May 11, 2026 14:44
@erosselli erosselli requested review from adamsachs and removed request for a team May 11, 2026 14:44
Copy link
Copy Markdown
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, this looks good to me! nice job carving out a clean abstraction/hook for this.

a couple of nits and then my claude review does have one callout that i think is worth looking into, will post that as a quick followup

Comment thread src/fides/common/engine_creators.py
Comment on lines +2 to +11
SQLAlchemy engine ``creator`` callables for dynamic credential resolution.

The ``creator`` pattern passes a callable to ``create_engine`` /
``create_async_engine`` instead of a connection URI. SQLAlchemy calls the
creator every time the pool needs a new connection, so credentials are
resolved at **connection time** rather than engine construction time.

Today the credential helpers read from static config (``CONFIG.database``).
A future secret-provider integration will swap them to call
``provider.get_secret()`` — the rest of the engine code stays the same.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice documentation here.

nit: maybe worth mentioning that they must stay lightweight? (or at least i assume so 😄)

Comment on lines +279 to +288
def raw_password(self) -> str:
"""Return password unescaped for direct driver use (psycopg2/asyncpg)."""
return unquote_plus(self.password)

@property
def raw_readonly_password(self) -> Optional[str]:
"""Return readonly password unescaped for direct driver use."""
if self.readonly_password:
return unquote_plus(self.readonly_password)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, claude is calling something out here and i think it's valid? in that we should probably at least call this out as a change to how we expect encoded passwords to be handled...

will post directly from my claude review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude]: escape_password runs quote_plus unconditionally, and raw_password reverses it with unquote_plus. For passwords containing literal @, #, %, , etc., this round-trips correctly — the parametrized tests confirm that.

But the previous URI-based code path treated password as the already-escaped value to be embedded in a URI. If a deployment has historically set FIDES__DATABASE__PASSWORD=foo%40bar (i.e., the user manually URL-encoded @ as %40), the old URI path used foo%40bar in the URI, which psycopg2/asyncpg decode to foo@bar at parse time — correct.

With the new creator path, that same value flows through:

  1. escape_password runs quote_plus("foo%40bar")"foo%2540bar" (the % itself gets escaped).
  2. raw_password runs unquote_plus("foo%2540bar")"foo%40bar" (the literal string).
  3. psycopg2.connect(password="foo%40bar") sends foo%40bar as the password, not foo@bar.

This is a silent behavior change for any user who was already pre-encoding their password in env vars or config files. It will only manifest as auth failures in production.

Options:

  • Document this clearly as a breaking change for pre-encoded passwords, OR
  • Make escape_password idempotent (e.g., detect already-encoded values), OR
  • Add a release note + migration step.

At minimum, add a test that documents the round-trip behavior for a password literally containing %XX sequences so the contract is explicit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a behavior change. Both the old URI path and the new creator path produce the same result for all cases, including the pre-encoded foo%40bar scenario.

escape_password has always run quote_plus unconditionally on the raw config value. PostgresDsn.build() does not re-encode, so the old URI path was:

  1. escape_password("foo%40bar")"foo%2540bar" (stored)
  2. PostgresDsn.build(password="foo%2540bar") → embeds as-is in URI
  3. psycopg2 parses URI, URL-decodes → "foo%40bar" sent to Postgres

The new creator path:

  1. Same escape_password"foo%2540bar" (stored)
  2. raw_passwordunquote_plus("foo%2540bar")"foo%40bar"
  3. psycopg2.connect(password="foo%40bar")"foo%40bar" sent to Postgres

Same result in both paths. The pre-encoding case was already "broken" (or rather: passwords are expected to be raw, not pre-encoded). Will add a test documenting this contract.

…t, mutual exclusion guard

- Add note to engine_creators.py docstring that creators must stay lightweight
- Add test documenting that pre-encoded passwords are treated as literals
- Add ValueError when creator is passed with database_uri or config
- Add tests for creator + database_uri/config mutual exclusion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erosselli erosselli added this pull request to the merge queue May 11, 2026
Merged via the queue into main with commit 545fd9a May 11, 2026
67 of 69 checks passed
@erosselli erosselli deleted the erosselli/ENG-3566-engine-refactor branch May 11, 2026 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run unsafe ci checks Runs fides-related CI checks that require sensitive credentials

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants