Commit 2e548c1

codybrom and burmecia authored
feat(openapi): upgrade OpenAPI FDW to v0.2.0 with modular architecture (#573)
* feat(openapi): upgrade to v0.2.0 with modular architecture

  Replace v0.1.4 monolithic codebase with v0.2.0 refactored modules: config, request, response, pagination, column_matching, spec, schema.

  New features: POST-for-read endpoints, spec_json inline specs, LIMIT-to-page_size pushdown, api_key_location (query/cookie), debug mode, max_pages/max_response_bytes safety limits, OpenAPI 3.1 support.

  Includes 518 unit tests, benchmarks, 5 real-world examples (NWS, CarAPI, PokeAPI, GitHub, Threads), Docker-based integration test infrastructure with 113 assertions, and performance analysis docs.

* docs(openapi): rewrite README and fix catalog docs

  Rewrite README with clearer features list, honest performance section comparing FDW vs pg_http (DX tradeoff with SQL examples and end-to-end benchmarks), and move limitations up for visibility. Consolidate PERFORMANCE.md into README. Update benchmark script to measure full read-to-write lifecycle (INSERT INTO) instead of PERFORM. Fix tabbed content indentation in catalog docs for pymdownx.tabbed rendering.

* chore(openapi): revert workspace release profile and update README

  Remove [profile.release] (strip, lto) from the shared wasm-wrappers workspace Cargo.toml; these affect all wasm FDWs, not just openapi. Revert Cargo.lock to match main. Minor README updates.

* chore(openapi): remove criterion micro-benchmarks

  Remove benches/fdw_benchmarks.rs and the criterion dev-dependency. These benchmarks tested re-implemented copies of FDW logic rather than actual code, added ~38 transitive dependencies, and caused build errors on wasm targets. The SQL-level benchmark script (test/benchmark.sh) provides meaningful end-to-end performance analysis.

* fix(openapi): use native target for test and clippy in Makefile

  Unit tests and clippy can't run on wasm32-unknown-unknown since there's no runtime to execute the binary. Auto-detect the host target via rustc so make test and make clippy work out of the box on any platform.

* feat(openapi): add YAML spec support, review fixes, and example improvements

  Add YAML spec parsing via serde_yaml_ng so spec_url accepts both JSON and YAML OpenAPI specs. Many APIs only publish YAML, so this makes the FDW work out of the box with more APIs.

  Also addresses PR review items:
  - Replace deprecated serde_yaml with serde_yaml_ng
  - debug_assert! -> assert! in this_mut() for release safety
  - Header deduplication prevents duplicate content-type/authorization
  - Empty/whitespace credentials filtered with warning
  - Retry on 502/503 in addition to 429, with status-specific hints
  - RowsOut stats now count rows consumed by PG, not just fetched
  - Validate max_pages >= 1
  - base_url validation for spec-derived server URLs
  - Improved error messages (show both JSON and YAML parse errors)

  Example updates:
  - All 5 examples get IMPORT FOREIGN SCHEMA as section 1
  - New import servers with spec_url (or spec_json for Threads)
  - Threads example shows CREATE SERVER with inline spec_json
  - PokeAPI highlights YAML spec support

* fix(openapi): address PR #573 review feedback from Copilot and CodeRabbit

  - Fix pagination URL resolution for parameterized endpoints by storing resolved_endpoint after path param substitution
  - Fix absolute-path pagination to use origin-only base to avoid duplicating path prefixes (e.g. /v1/v1/items)
  - Map time and byte/binary formats to text (WIT TypeOid has no time/bytea variants)
  - Fix .env.example copy instructions and README test counts
  - Replace {checksum} placeholder with descriptive text in catalog docs

* docs(openapi): consolidate feature tables and add missing import server blocks

  Move per-example feature tables into a single Feature Coverage comparison on the main examples README. Add the missing CREATE SERVER blocks for spec_url/spec_json import servers to carapi, pokeapi, github, and nws examples. Rename NWS references to Weather.gov.

* docs(openapi): fix attrs column description to match actual behavior

  The attrs column returns the full JSON response object, not just unmapped fields. Updated all example READMEs to accurately describe this, consistent with the catalog docs and every other wasm FDW.

* minor code improvements

* add default user agent header

---------

Co-authored-by: Cody Bromley <codybrom@users.noreply.github.com>
Co-authored-by: Bo Lu <lv.patrick@gmail.com>
1 parent 57e2c42 commit 2e548c1

42 files changed: +21963 -1455 lines changed

docs/catalog/openapi.md

Lines changed: 119 additions & 14 deletions
@@ -12,13 +12,14 @@ tags:

 [OpenAPI](https://www.openapis.org/) is a specification for describing HTTP APIs. The OpenAPI Wrapper is a generic WebAssembly (Wasm) foreign data wrapper that can connect to any REST API with an OpenAPI 3.0+ specification.

-This wrapper allows you to query any REST API endpoint as a PostgreSQL foreign table, with support for path parameters, pagination, and automatic schema import.
+This wrapper allows you to query any REST API endpoint as a PostgreSQL foreign table, with support for path parameters, pagination, POST-for-read endpoints, and automatic schema import.

 ## Available Versions

 | Version | Wasm Package URL | Checksum | Required Wrappers Version |
 | ------- | ---------------- | -------- | ------------------------- |
-| 0.1.4 | `https://github.com/supabase/wrappers/releases/download/wasm_openapi_fdw_v0.1.4/openapi_fdw.wasm` | `dd434f8565b060b181d1e69e1e4d5c8b9c3ac5ca444056d3c2fb939038d308fe` | >=0.5.0 |
+| 0.2.0 | `https://github.com/supabase/wrappers/releases/download/wasm_openapi_fdw_v0.2.0/openapi_fdw.wasm` | _published on release_ | >=0.5.0 |
+| 0.1.4 | `https://github.com/supabase/wrappers/releases/download/wasm_openapi_fdw_v0.1.4/openapi_fdw.wasm` | `dd434f8565b060b181d1e69e1e4d5c8b9c3ac5ca444056d3c2fb939038d308fe` | >=0.5.0 |

 ## Preparation

@@ -94,12 +95,14 @@ We need to provide Postgres with the credentials to access the API and any addit
 | Option | Required | Description |
 | ------ | :------: | ----------- |
 | `fdw_package_*` | Yes | Standard Wasm FDW package metadata. See [Available Versions](#available-versions). |
-| `base_url` | Yes* | Base URL for the API (e.g., `https://api.example.com/v1`). *Optional if `spec_url` provides servers. |
-| `spec_url` | No | URL to the OpenAPI specification JSON. Required for `IMPORT FOREIGN SCHEMA`. |
+| `base_url` | Yes* | Base URL for the API (e.g., `https://api.example.com/v1`). *Optional if `spec_url` or `spec_json` provides servers. |
+| `spec_url` | No | URL to the OpenAPI specification (JSON or YAML). Required for `IMPORT FOREIGN SCHEMA`. Mutually exclusive with `spec_json`. |
+| `spec_json` | No | Inline OpenAPI 3.0+ JSON spec for `IMPORT FOREIGN SCHEMA`. Mutually exclusive with `spec_url`. Useful when the API doesn't publish a spec URL. |
 | `api_key` | No | API key for authentication. |
 | `api_key_id` | No | Vault secret key ID storing the API key. Use instead of `api_key`. |
 | `api_key_header` | No | Header name for API key (default: `Authorization`). |
 | `api_key_prefix` | No | Prefix for API key value (default: `Bearer` for Authorization header). |
+| `api_key_location` | No | Where to send the API key: `header` (default), `query`, or `cookie`. |
 | `bearer_token` | No | Bearer token for authentication (alternative to `api_key`). |
 | `bearer_token_id` | No | Vault secret key ID storing the bearer token. |
 | `user_agent` | No | Custom User-Agent header value. |
@@ -109,6 +112,9 @@ We need to provide Postgres with the credentials to access the API and any addit
 | `page_size` | No | Default page size for pagination (0 = no automatic limit). |
 | `page_size_param` | No | Query parameter name for page size (default: `limit`). |
 | `cursor_param` | No | Query parameter name for pagination cursor (default: `after`). |
+| `max_pages` | No | Maximum pages per scan to prevent infinite pagination loops (default: `1000`). |
+| `max_response_bytes` | No | Maximum response body size in bytes (default: `52428800` / 50 MiB). |
+| `debug` | No | Emit HTTP request details and scan stats via PostgreSQL INFO messages when set to `'true'` or `'1'`. |
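
To make the new `spec_json` option concrete, here is a minimal sketch of a server that carries its spec inline and is then used for schema import. The server name `inline_spec_api`, the `/items` endpoint, and the spec contents are hypothetical; the `fdw_package_*` placeholders follow the convention used elsewhere in this file and must be filled in before the statement will run. The remote schema name in `import foreign schema` is an assumption and should be checked against the Automatic Schema Import section.

```sql
-- Hypothetical server that uses an inline spec instead of spec_url.
create server inline_spec_api
  foreign data wrapper wasm_wrapper
  options (
    fdw_package_name 'supabase:openapi-fdw',
    fdw_package_url '{See: "Available Versions"}',
    fdw_package_checksum '{See: "Available Versions"}',
    fdw_package_version '{See: "Available Versions"}',
    -- base_url is omitted here because the inline spec lists a server URL
    spec_json '{
      "openapi": "3.0.3",
      "info": {"title": "Example API", "version": "1.0.0"},
      "servers": [{"url": "https://api.example.com/v1"}],
      "paths": {
        "/items": {
          "get": {
            "responses": {
              "200": {
                "description": "OK",
                "content": {"application/json": {"schema": {
                  "type": "array",
                  "items": {
                    "type": "object",
                    "properties": {
                      "id": {"type": "string"},
                      "name": {"type": "string"}
                    }
                  }
                }}}
              }
            }
          }
        }
      }
    }'
  );

-- Assumes the local schema `openapi` already exists.
import foreign schema openapi
  from server inline_spec_api
  into openapi;
```

Because `spec_url` and `spec_json` are documented as mutually exclusive, a server like this would set only one of them.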

 ### Create a schema

@@ -151,10 +157,12 @@ options (
 | `cursor_param` | No | Override server-level cursor parameter name. |
 | `page_size_param` | No | Override server-level page size parameter name. |
 | `page_size` | No | Override server-level page size. |
+| `method` | No | HTTP method for this endpoint. Use `POST` for read-via-POST endpoints (default: `GET`). |
+| `request_body` | No | Request body string for POST endpoints. |

 ### Automatic Schema Import

-If you provide a `spec_url` in the server options, you can automatically import table definitions:
+If you provide a `spec_url` or `spec_json` in the server options, you can automatically import table definitions:

 ```sql
 -- Import all endpoints
@@ -244,12 +252,67 @@ select * from openapi.users where status = 'active';

 Columns used as query or path parameters always return the value from the WHERE clause, even if the API response contains the same field with different casing. This ensures PostgreSQL's post-filter always passes.

+### LIMIT Pushdown
+
+When your query includes a `LIMIT`, the FDW uses it as the `page_size` for the first API request, reducing unnecessary data transfer:
+
+```sql
+-- Sends GET /users?limit=5 (uses LIMIT as page_size)
+select * from openapi.users limit 5;
+```
+
+## POST-for-Read Endpoints
+
+Some APIs use POST requests for read operations (e.g., search or query endpoints). Use the `method` and `request_body` table options:
+
+```sql
+create foreign table openapi.search_results (
+  id text,
+  title text,
+  score real,
+  attrs jsonb
+)
+  server my_api_server
+  options (
+    endpoint '/search',
+    method 'POST',
+    request_body '{"query": "openapi", "limit": 50}'
+  );
+
+select id, title, score from openapi.search_results;
+```
+
+## Debug Mode
+
+Enable debug mode to see HTTP request details and scan statistics in PostgreSQL INFO messages:
+
+```sql
+create server debug_api
+  foreign data wrapper wasm_wrapper
+  options (
+    fdw_package_name 'supabase:openapi-fdw',
+    fdw_package_url '{See: "Available Versions"}',
+    fdw_package_checksum '{See: "Available Versions"}',
+    fdw_package_version '{See: "Available Versions"}',
+    base_url 'https://api.example.com',
+    debug 'true'
+  );
+```
+
+Debug output includes:
+
+- HTTP method and URL for each request
+- Response status code and body size
+- Total rows fetched and pages retrieved
+- Pagination details
+
 ## Pagination

 The FDW automatically handles pagination. It supports:

 1. **Cursor-based pagination** - Uses `cursor_param` and `cursor_path`
 2. **URL-based pagination** - Follows `next` links in response
+3. **Offset-based pagination** - Auto-detected from common patterns
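
Whichever pagination style the API uses, the server options documented in this change can bound how much a single scan fetches. The sketch below is illustrative only: the server name `paged_api` and every value are assumptions, and only the option names come from the option tables in this file.

```sql
-- Hypothetical server with explicit pagination bounds.
create server paged_api
  foreign data wrapper wasm_wrapper
  options (
    fdw_package_name 'supabase:openapi-fdw',
    fdw_package_url '{See: "Available Versions"}',
    fdw_package_checksum '{See: "Available Versions"}',
    fdw_package_version '{See: "Available Versions"}',
    base_url 'https://api.example.com',
    page_size '100',               -- ask for 100 rows per page
    page_size_param 'limit',       -- documented default, shown for clarity
    cursor_param 'after',          -- documented default, shown for clarity
    max_pages '20',                -- cap a scan at 20 pages (default: 1000)
    max_response_bytes '10485760'  -- cap each body at 10 MiB (default: 50 MiB)
  );
```

With these values a single scan would read at most about 2,000 rows, assuming the API honors the requested page size.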

 ### Configuring Pagination

@@ -306,18 +369,20 @@ options (
 | ------------- | --------- |
 | text | string |
 | boolean | boolean |
-| smallint | number |
+| smallint* | number |
 | integer | number |
 | bigint | number |
 | real | number |
 | double precision | number |
-| numeric | number |
+| numeric* | number |
 | date | string (ISO 8601) |
-| timestamp | string (ISO 8601) |
+| timestamp* | string (ISO 8601) |
 | timestamptz | string (ISO 8601) |
 | jsonb | object/array |
 | uuid | string |

+\* Types marked with an asterisk work when you define tables manually, but `IMPORT FOREIGN SCHEMA` won't generate columns with these types automatically.
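
As an illustration of the footnote, a manually defined table can declare the asterisked types directly. The table name `openapi.orders`, its columns, and the `/orders` endpoint are hypothetical; `my_api_server` is the server name used in the other examples in this file.

```sql
-- Hypothetical table that uses types IMPORT FOREIGN SCHEMA does not generate.
create foreign table openapi.orders (
  id text,
  quantity smallint,     -- asterisked type: declare manually
  total numeric,         -- asterisked type: declare manually
  created_at timestamp,  -- asterisked type: declare manually
  attrs jsonb
)
  server my_api_server
  options (
    endpoint '/orders'
  );
```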
+
 ### The `attrs` Column

 Any foreign table can include an `attrs` column of type `jsonb` to capture the entire raw JSON response for each row:
@@ -339,29 +404,31 @@ options (endpoint '/users');
 - **Authentication**: Currently supports API Key and Bearer Token authentication. OAuth flows are not supported.
 - **OpenAPI version**: Only OpenAPI 3.0+ specifications are supported (not Swagger 2.0).

-## Rate Limiting
+## Automatic Retries

-The FDW automatically handles rate limiting:
+The FDW automatically retries transient HTTP errors up to 3 times:

-- **HTTP 429 responses**: Automatically retries up to 3 times
+- **HTTP 429** (Rate Limit), **502** (Bad Gateway), **503** (Service Unavailable)
 - **Retry-After header**: Respects server-specified delay when provided
 - **Exponential backoff**: Falls back to 1s, 2s, 4s delays when no Retry-After header is present

 For APIs with very strict rate limits, consider using materialized views to cache results.

352417
## Examples
353418

419+
For additional real-world examples with multiple tables, pagination, and advanced features, see the **[examples directory on GitHub](https://github.com/supabase/wrappers/tree/main/wasm-wrappers/fdw/openapi_fdw/examples)**. There are step-by-step walkthroughs for querying the NWS Weather API, PokéAPI, CarAPI, GitHub, and Threads.
420+
354421
### Basic Query
355422

356423
```sql
357424
-- Create a foreign server connecting to the Weather.gov API
358425
create server openapi_server
359426
foreign data wrapper wasm_wrapper
360427
options (
361-
fdw_package_url 'https://github.com/supabase/wrappers/releases/download/wasm_openapi_fdw_v0.1.4/openapi_fdw.wasm',
428+
fdw_package_url 'https://github.com/supabase/wrappers/releases/download/wasm_openapi_fdw_v0.2.0/openapi_fdw.wasm',
362429
fdw_package_name 'supabase:openapi-fdw',
363-
fdw_package_version '0.1.4',
364-
fdw_package_checksum 'dd434f8565b060b181d1e69e1e4d5c8b9c3ac5ca444056d3c2fb939038d308fe',
430+
fdw_package_version '0.2.0',
431+
fdw_package_checksum '{See: "Available Versions"}',
365432
base_url 'https://api.weather.gov',
366433
spec_url 'https://api.weather.gov/openapi.json'
367434
);
@@ -395,6 +462,26 @@ options (
 select id, type from openapi.zone_stations where zone_id = 'AKZ317';
 ```

+### POST-for-Read
+
+```sql
+-- Query a search API that uses POST for read operations
+create foreign table openapi.search_results (
+  id text,
+  title text,
+  score real,
+  attrs jsonb
+)
+  server my_api_server
+  options (
+    endpoint '/search',
+    method 'POST',
+    request_body '{"query": "postgresql", "limit": 25}'
+  );
+
+select id, title, score from openapi.search_results;
+```
+
 ### Custom Headers

 ```sql
@@ -413,6 +500,24 @@ create server custom_api
   );
 ```

+### API Key Location
+
+By default, the API key is sent as a header. Use `api_key_location` to send it as a query parameter or cookie instead:
+
+```sql
+create server query_auth_api
+  foreign data wrapper wasm_wrapper
+  options (
+    fdw_package_name 'supabase:openapi-fdw',
+    fdw_package_url '{See: "Available Versions"}',
+    fdw_package_checksum '{See: "Available Versions"}',
+    fdw_package_version '{See: "Available Versions"}',
+    base_url 'https://api.example.com',
+    api_key 'sk-your-api-key',
+    api_key_location 'query'  -- sends as ?api_key=sk-... (uses api_key_header as param name)
+  );
+```
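
Going by the inline comment in the example above, the query parameter name is taken from `api_key_header`. Under that assumption, a server for an API that expects the key as `?token=...` might look like the sketch below. The server name `token_param_api` and the parameter name `token` are hypothetical, and the exact request shape (including whether `api_key_prefix` applies in query mode) is not stated in this diff, so it is worth confirming with `debug 'true'`.

```sql
-- Hypothetical server that sends the key under a custom query parameter name.
create server token_param_api
  foreign data wrapper wasm_wrapper
  options (
    fdw_package_name 'supabase:openapi-fdw',
    fdw_package_url '{See: "Available Versions"}',
    fdw_package_checksum '{See: "Available Versions"}',
    fdw_package_version '{See: "Available Versions"}',
    base_url 'https://api.example.com',
    api_key 'sk-your-api-key',
    api_key_header 'token',    -- assumed to become the query parameter name
    api_key_location 'query',  -- send the key in the URL rather than a header
    debug 'true'               -- log the request URL to verify the parameter
  );
```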
+
 ### Response Path Extraction

 For APIs that wrap data in a container object:
