feat: add API client, auth, and backend abstraction layer #127

Open
lewisjared wants to merge 13 commits into main from remove-s3

@lewisjared lewisjared commented Feb 6, 2026

Description

Adds REST API support to the bookshelf consumer package as the foundation for migrating away from direct S3 access. This PR includes:

API Client & Authentication

  • BookshelfAPIClient HTTP client wrapping the bookshelf-platform REST API
  • WorkOS OAuth authentication (Authorization Code + PKCE and Device Code flows)
  • Credential storage with automatic token refresh
  • CLI commands for auth management (login, logout, status) and API browsing (volumes list/show, books list/show)
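The Authorization Code + PKCE flow listed above depends on a verifier/challenge pair. A minimal sketch of that step, following the S256 method from RFC 7636 and using only the standard library; the function name `make_pkce_pair` is hypothetical and not taken from this PR's oauth.py:

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method."""
    # 32 random bytes encode to a 43-character URL-safe verifier
    # (RFC 7636 allows 43 to 128 characters).
    verifier = secrets.token_urlsafe(32)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # The challenge is the unpadded base64url encoding of the SHA-256 digest.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


verifier, challenge = make_pkce_pair()
```

The challenge goes into the authorize request and the verifier is sent later with the token exchange, so the verifier never leaves the client until the code is redeemed.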

Pydantic v2 Migration (Phase 0.5)

  • Migrated all 9 schema models from pydantic v1 compatibility mode to pydantic v2 native
  • Updated .dict() to .model_dump() and .json() to .model_dump_json() across core library and notebooks
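For reviewers less familiar with the v2 API, a minimal sketch of the two renamed calls and of the explicit `= None` default that v2 needs on optional fields. It assumes pydantic >= 2.0 is installed; the models `VolumeMeta` and `StrictMeta` are hypothetical and are not the schema models in this PR:

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class VolumeMeta(BaseModel):  # hypothetical model, for illustration only
    name: str
    # Explicit default: in v2 an Optional field without one is required.
    url: Optional[str] = None


class StrictMeta(BaseModel):  # hypothetical model, for illustration only
    doi: Optional[str]  # no default, so v2 requires the caller to pass it


meta = VolumeMeta(name="example")
as_dict = meta.model_dump()        # v1 spelling: meta.dict()
as_json = meta.model_dump_json()   # v1 spelling: meta.json()

try:
    StrictMeta()
    doi_is_required = False
except ValidationError:
    doi_is_required = True
```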

Backend Abstraction Layer (Phase 1)

  • BookshelfBackend protocol with 5 semantic operations (resolve_version, list_versions, fetch_datapackage, download_resource, list_volumes)
  • S3Backend extracting current S3 logic from shelf.py
  • APIBackend wrapping BookshelfAPIClient with response mapping
  • ResourceSummary extended with optional download fields for future API-driven resource access
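A rough sketch of how a protocol with these five operations could look. Only the method names come from this PR; the argument lists, return types, and the toy `InMemoryBackend` are assumptions made for illustration:

```python
from __future__ import annotations

from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class BookshelfBackend(Protocol):
    """The five semantic operations; signatures here are assumed."""

    def resolve_version(self, name: str, version: Optional[str] = None) -> str: ...
    def list_versions(self, name: str) -> list[str]: ...
    def fetch_datapackage(self, name: str, version: str) -> dict: ...
    def download_resource(self, url: str, dest: str) -> str: ...
    def list_volumes(self) -> list[str]: ...


class InMemoryBackend:
    """Toy backend (hypothetical); it satisfies the protocol structurally."""

    def __init__(self, volumes: dict[str, list[str]]) -> None:
        self._volumes = volumes

    def resolve_version(self, name, version=None):
        versions = self.list_versions(name)
        if version is None:
            # Assumption: the last listed version is the latest one.
            return versions[-1]
        if version not in versions:
            raise KeyError(f"{name}@{version} not found")
        return version

    def list_versions(self, name):
        return list(self._volumes[name])

    def fetch_datapackage(self, name, version):
        resolved = self.resolve_version(name, version)
        return {"name": name, "version": resolved, "resources": []}

    def download_resource(self, url, dest):
        raise NotImplementedError("the toy backend holds no files")

    def list_volumes(self):
        return sorted(self._volumes)


backend: BookshelfBackend = InMemoryBackend({"example-volume": ["v0.1.0", "v0.2.0"]})
```

Because the protocol is structural, a backend does not need to inherit from `BookshelfBackend`; any object with matching methods can be swapped in, which is what makes S3 and API implementations interchangeable.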

Wire Backend into BookShelf (Phase 2)

  • BookShelf now delegates all data operations to self._backend (S3Backend by default)
  • LocalBook supports download_url for direct resource downloads via API backend
  • shelf.py refactored: removed inline S3 logic, all access goes through backend protocol
  • BOOKSHELF_BACKEND=api env var auto-selects APIBackend (with BOOKSHELF_API_URL and BOOKSHELF_TOKEN)
  • Cache-first gate: zero network calls when version+edition are pinned and data is cached locally
  • Offline fallback: graceful degradation to cached data on ConnectionError/OSError, with OfflineError when no cache exists
  • _find_cached_versions() scans local cache for fallback candidates
  • 29 backend unit tests (10 S3Backend + 19 APIBackend)
  • 12 shelf tests (env var selection, offline fallback, cache helpers)
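The cache-first gate and offline fallback described above can be sketched as follows. This is a simplified model, not the code in shelf.py: `load_book`, the `cache` set of `(name, version, edition)` tuples, and `backend.resolve` are hypothetical stand-ins for the real cache directory and backend calls, and "newest" is assumed to mean the largest tuple when sorted.

```python
class OfflineError(Exception):
    """Raised when the backend is unreachable and nothing usable is cached."""


def load_book(name, backend, cache, version=None, edition=None):
    """Return the (version, edition) to open, using the network only if needed."""
    # Cache-first gate: a fully pinned request that is already cached
    # never reaches the backend.
    if version is not None and edition is not None:
        if (name, version, edition) in cache:
            return version, edition

    try:
        return backend.resolve(name, version, edition)
    except (ConnectionError, OSError) as exc:
        # Offline fallback: choose among cached entries that match the request.
        candidates = sorted(
            (v, e)
            for (n, v, e) in cache
            if n == name
            and (version is None or v == version)
            and (edition is None or e == edition)
        )
        if not candidates:
            raise OfflineError(f"{name}: backend unreachable and no cached copy") from exc
        return candidates[-1]


class CountingBackend:
    """Hypothetical backend that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def resolve(self, name, version, edition):
        self.calls += 1
        return version or "v1", edition or 1


class DownBackend:
    """Hypothetical backend that simulates a network outage."""

    def resolve(self, name, version, edition):
        raise ConnectionError("network unreachable")


cache = {("vol", "v1", 1), ("vol", "v1", 2)}
counting = CountingBackend()
pinned = load_book("vol", counting, cache, version="v1", edition=1)
fallback = load_book("vol", DownBackend(), cache)
```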

Test Fixes

  • Fixed 3 login tests to mock WorkOS OAuth flow instead of deprecated /auth/token endpoint

All existing tests pass with zero regressions (203 passed).

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Add comprehensive support for the new bookshelf-platform API:

- API client module (bookshelf.api):
  - BookshelfAPIClient for volumes, books, and authentication
  - Pydantic schemas for request/response validation
  - Custom error types (APIError, AuthenticationError, NotFoundError, ServerError)

- Authentication module (bookshelf.auth):
  - Credential storage using platformdirs
  - Token expiry management
  - Environment variable overrides (BOOKSHELF_API_URL, BOOKSHELF_TOKEN)

- CLI tool (bookshelf-client):
  - auth commands: login, logout, status
  - volumes commands: list, show
  - books commands: list, show
  - Rich terminal output with tables and panels

- 76 unit tests covering all new functionality
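As an illustration of the token expiry management and environment variable overrides listed above, a minimal sketch. The field names, the 30-second leeway, and the `resolve_token` helper are assumptions and are not copied from bookshelf.auth:

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Credentials:
    """Stored token plus its expiry (field names are assumed)."""

    access_token: str
    expires_at: float  # Unix timestamp in seconds

    def is_expired(self, now: Optional[float] = None, leeway: float = 30.0) -> bool:
        # Treat the token as expired slightly early so that it does not
        # lapse in the middle of a request.
        now = time.time() if now is None else now
        return now >= self.expires_at - leeway


def resolve_token(
    stored: Optional[Credentials],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the token to send: the env var wins, then unexpired stored credentials."""
    env_token = environ.get("BOOKSHELF_TOKEN")
    if env_token:
        return env_token
    if stored is not None and not stored.is_expired():
        return stored.access_token
    return None


creds = Credentials(access_token="abc", expires_at=1000.0)
```

Returning `None` rather than raising fits the later change in this PR that lets volumes and books commands run unauthenticated against public datasets.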

Dependencies added: httpx, click, rich, respx (dev)

Remove authentication requirement from volumes and books CLI commands
to allow unauthenticated access to public datasets. Auth is still
available and will be used if credentials exist.

Also fix uv deprecation warning by migrating from tool.uv.dev-dependencies
to dependency-groups.dev in all pyproject.toml files.

- Set default API URL to https://api.staging.climateresource.com.au/bookshelf/v1
- Remove /api prefix from endpoint paths (base URL includes full path)
- Update all tests to match new URL structure

Migrate CLI authentication from username/password (POST /auth/token)
to WorkOS OAuth with PKCE and Device Code flows. The old endpoint
no longer exists after the platform migrated to WorkOS.

- Add oauth.py with PKCE, Device Code, and token refresh flows
- Extend Credentials with refresh_token, add auto-refresh on expiry
- Replace login command: --device-code flag for headless environments
- Add on_token_refresh callback to API client with 401 retry
- Add refresh_token to TokenResponse schema
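The 401 retry with an on_token_refresh callback could work roughly like this. The sketch is transport-agnostic: `send`, `refresh`, and `request_with_refresh` are hypothetical names, and the real client's method signatures may differ.

```python
class AuthenticationError(Exception):
    """Raised when a request is still rejected after a token refresh."""


def request_with_refresh(send, token, refresh, on_token_refresh=None):
    """Send once; on a 401, refresh the token and retry exactly once.

    send(token) returns (status_code, body); refresh() returns a new token.
    """
    status, body = send(token)
    if status != 401:
        return status, body

    new_token = refresh()
    if on_token_refresh is not None:
        # Let the caller persist the new token, e.g. to credential storage.
        on_token_refresh(new_token)

    status, body = send(new_token)
    if status == 401:
        raise AuthenticationError("still unauthorized after token refresh")
    return status, body


seen = []


def fake_send(token):
    """Hypothetical transport: only the token 'new' is accepted."""
    seen.append(token)
    return (200, "ok") if token == "new" else (401, "expired")


saved = []
result = request_with_refresh(fake_send, "old", lambda: "new", saved.append)
```

Retrying only once avoids a refresh loop when the refresh token itself has been revoked.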
WorkOS requires a connection selector (connection_id, organization_id,
or provider) in the authorize request. Without it, the flow fails with
invalid_connection_selector.

Upgrade pydantic dependency from >=1.10.17 to >=2.0 and migrate all
schema code to use native v2 APIs. This eliminates the pydantic.v1
compatibility layer and prepares the codebase for the API backend
abstraction layer.

Changes:
- Update pydantic import from pydantic.v1 to pydantic in schema.py
- Add explicit `= None` defaults to optional fields (url, doi,
  description) that were implicitly optional in v1 but required in v2
- Replace .dict() with .model_dump() across notebook.py, 15 notebooks,
  and docs
- Replace .json() with .model_dump_json() in actions.py
- Update test imports to use pydantic v2 ValidationError

…rces

Introduce BookshelfBackend protocol with semantic operations
(resolve_version, list_versions, fetch_datapackage, download_resource,
list_volumes) and two implementations:

- S3Backend: extracts existing S3 logic from shelf.py
- APIBackend: wraps BookshelfAPIClient with response mapping

Also extends ResourceSummary with optional download fields
(download_url, hash, filename, timeseries_name, shape, content_hash)
for future API-driven resource access.

The login command now uses WorkOS OAuth (authorization_code_flow)
instead of username/password POST to /auth/token. Update all three
login tests to mock the OAuth flow via monkeypatch instead of
mocking HTTP endpoints with respx.

…nd tests

BookShelf now delegates to self._backend (defaults to S3Backend for
backward compatibility). shelf.py no longer imports requests.exceptions
directly - error handling lives inside each backend. LocalBook.timeseries()
and get_long_format_data() prefer download_url from resource descriptor,
falling back to S3 URL construction.

Adds 29 unit tests for S3Backend (10) and APIBackend (19) covering
resolve_version, list_versions, fetch_datapackage, download_resource,
list_volumes, private derivation from EditionInfo.status, and error mapping.

…e fallback

Complete Phase 2 TODOs 2.7-2.10:
- BOOKSHELF_BACKEND=api env var auto-selects APIBackend
- OfflineError exception for clear offline diagnostics
- Graceful fallback to cached data on ConnectionError/OSError
- _find_cached_versions() scans local cache for fallback candidates
- 12 new tests covering env var selection and offline behavior
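A sketch of how the BOOKSHELF_BACKEND switch might be read. The default URL is the one set earlier in this PR; the `select_backend` helper, its dictionary return value, and the decision to reject unknown values are assumptions, and the real code constructs S3Backend or APIBackend instances instead.

```python
import os

DEFAULT_API_URL = "https://api.staging.climateresource.com.au/bookshelf/v1"


def select_backend(environ=os.environ):
    """Return a small config dict describing which backend to build."""
    kind = environ.get("BOOKSHELF_BACKEND", "s3").strip().lower()
    if kind == "s3":
        # S3 stays the default for backward compatibility.
        return {"backend": "s3"}
    if kind == "api":
        return {
            "backend": "api",
            "url": environ.get("BOOKSHELF_API_URL", DEFAULT_API_URL),
            # The token may be absent: public datasets need no authentication.
            "token": environ.get("BOOKSHELF_TOKEN"),
        }
    raise ValueError(f"unknown BOOKSHELF_BACKEND value: {kind!r}")


default_choice = select_backend({})
api_choice = select_backend({"BOOKSHELF_BACKEND": "api", "BOOKSHELF_TOKEN": "t0k3n"})
```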