
Commit bcaf064

Add comprehensive loader implementation guide (#36)
- Create NEW_LOADER_GUIDE.md with step-by-step instructions, code templates, type mapping reference, and validation checklist
- Update CLAUDE.md with quick reference, correct method names, and links to the detailed guide
- Document all 7 existing loaders with their characteristics

This enables agents to implement new loaders without reverse-engineering existing implementations.
1 parent aa05164 commit bcaf064

2 files changed: +674 -10 lines changed

CLAUDE.md

Lines changed: 61 additions & 10 deletions
@@ -59,17 +59,68 @@ uv run ruff check . --fix
 
 ### Data Loader System
 The core architecture follows a plugin-based loader system with:
-- Abstract `DataLoader` base class in `src/amp/loaders/base.py`
-- Auto-discovery mechanism for loaders via `__init_subclass__`
+- Generic `DataLoader[TConfig]` base class in `src/amp/loaders/base.py`
+- Auto-discovery via registry in `src/amp/loaders/registry.py`
 - Zero-copy operations using PyArrow for performance
-- Connection management with named connections and environment variables
-
-When implementing new loaders:
-1. Inherit from `DataLoader` base class
-2. Implement required methods: `connect()`, `load_table()`, `close()`
-3. Define configuration schema using dataclasses
-4. Register supported data types in class attributes
-5. Follow existing patterns from PostgreSQL and Redis loaders
+- Built-in resilience (retry, backpressure), state management, and reorg handling
+
+**For detailed implementation instructions, see `src/amp/loaders/NEW_LOADER_GUIDE.md`**
+
+#### Quick Reference: Implementing New Loaders
+
+**Files to create:**
+1. `src/amp/loaders/implementations/xxx_loader.py` - Main implementation
+2. `tests/integration/loaders/backends/test_xxx.py` - Integration tests
+
+**Files to modify:**
+1. `src/amp/loaders/implementations/__init__.py` - Add import
+2. `tests/conftest.py` - Add testcontainer and config fixtures
+
+**Required implementation:**
+```python
+@dataclass
+class XxxConfig:
+    host: str = 'localhost'
+    port: int = 1234
+    # ... connection settings
+
+class XxxLoader(DataLoader[XxxConfig]):
+    SUPPORTED_MODES = {LoadMode.APPEND}
+    SUPPORTS_TRANSACTIONS = False
+
+    def connect(self) -> None: ...
+    def disconnect(self) -> None: ...
+    def _load_batch_impl(self, batch, table_name, **kwargs) -> int: ...
+    def _create_table_from_schema(self, schema, table_name) -> None: ...
+    def table_exists(self, table_name) -> bool: ...
+```
+
+**Test implementation:**
+```python
+class XxxTestConfig(LoaderTestConfig):
+    loader_class = XxxLoader
+    config_fixture_name = 'xxx_test_config'
+
+    def get_row_count(self, loader, table_name) -> int: ...
+    def query_rows(self, loader, table_name, where, order_by) -> List[Dict]: ...
+    def cleanup_table(self, loader, table_name) -> None: ...
+    def get_column_names(self, loader, table_name) -> List[str]: ...
+
+class TestXxxCore(BaseLoaderTests):
+    config = XxxTestConfig()  # Inherits 6 generalized tests
+
+class TestXxxStreaming(BaseStreamingTests):
+    config = XxxTestConfig()  # Inherits 5 streaming tests
+```
+
+#### Existing Loaders (for reference)
+- **ClickHouse**: OLAP, columnar, no transactions - `clickhouse_loader.py`
+- **PostgreSQL**: OLTP, connection pooling, transactions - `postgresql_loader.py`
+- **Redis**: Key-value, multiple data structures - `redis_loader.py`
+- **Snowflake**: Cloud warehouse - `snowflake_loader.py`
+- **DeltaLake**: File-based, ACID transactions - `deltalake_loader.py`
+- **Iceberg**: Catalog-based, partitioned tables - `iceberg_loader.py`
+- **LMDB**: Embedded, memory-mapped - `lmdb_loader.py`
 
 ### Testing Strategy
 - **Unit tests**: Test pure logic and data structures WITHOUT mocking. Unit tests should be simple, fast, and test isolated components (dataclasses, utility functions, partitioning logic, etc.). Do NOT add tests that require mocking to `tests/unit/`.
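To make the "Required implementation" skeleton in the CLAUDE.md diff concrete, here is a minimal, self-contained sketch of how a generic `DataLoader[TConfig]` base class, a name-based registry, and one backend could fit together. Everything in it is an assumption for illustration: `LoadMode`, `LOADER_REGISTRY`, `register_loader`, the public `load_batch` wrapper, and the toy `MemoryConfig`/`MemoryLoader` are hypothetical stand-ins, not the code in `src/amp/loaders/`. Batches are plain lists of dicts rather than PyArrow record batches, and a "schema" is just a list of column names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, ClassVar, Dict, Generic, List, Set, TypeVar

# Hypothetical stand-ins for amp's base types; NOT the real src/amp/loaders API.
class LoadMode(Enum):
    APPEND = 'append'
    OVERWRITE = 'overwrite'

TConfig = TypeVar('TConfig')

# Assumed registry shape: a name -> loader class mapping filled by a decorator.
LOADER_REGISTRY: Dict[str, type] = {}

def register_loader(name: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        LOADER_REGISTRY[name] = cls
        return cls
    return decorator

class DataLoader(Generic[TConfig]):
    SUPPORTED_MODES: ClassVar[Set[LoadMode]] = {LoadMode.APPEND}
    SUPPORTS_TRANSACTIONS: ClassVar[bool] = False

    def __init__(self, config: TConfig) -> None:
        self.config = config

    def load_batch(self, rows: List[Dict[str, Any]], table_name: str,
                   mode: LoadMode = LoadMode.APPEND) -> int:
        # Shared template: check the mode, create the table on first use,
        # then hand the backend-specific write to _load_batch_impl.
        if mode not in self.SUPPORTED_MODES:
            raise ValueError(f'{type(self).__name__} does not support {mode.value}')
        if not self.table_exists(table_name):
            self._create_table_from_schema(list(rows[0]) if rows else [], table_name)
        return self._load_batch_impl(rows, table_name)

    # Backend hooks, named after the methods in the quick reference.
    def connect(self) -> None:
        raise NotImplementedError

    def disconnect(self) -> None:
        raise NotImplementedError

    def table_exists(self, table_name: str) -> bool:
        raise NotImplementedError

    def _create_table_from_schema(self, schema: List[str], table_name: str) -> None:
        raise NotImplementedError

    def _load_batch_impl(self, rows: List[Dict[str, Any]], table_name: str, **kwargs: Any) -> int:
        raise NotImplementedError

# Toy backend following the XxxConfig / XxxLoader template (hypothetical).
@dataclass
class MemoryConfig:
    namespace: str = 'default'  # stands in for host/port connection settings

@register_loader('memory')
class MemoryLoader(DataLoader[MemoryConfig]):
    SUPPORTED_MODES = {LoadMode.APPEND}
    SUPPORTS_TRANSACTIONS = False

    def connect(self) -> None:
        self.tables: Dict[str, List[Dict[str, Any]]] = {}
        self.schemas: Dict[str, List[str]] = {}

    def disconnect(self) -> None:
        self.tables.clear()
        self.schemas.clear()

    def table_exists(self, table_name: str) -> bool:
        return table_name in self.tables

    def _create_table_from_schema(self, schema: List[str], table_name: str) -> None:
        self.tables[table_name] = []
        self.schemas[table_name] = schema

    def _load_batch_impl(self, rows: List[Dict[str, Any]], table_name: str, **kwargs: Any) -> int:
        self.tables[table_name].extend(rows)
        return len(rows)

# Usage: resolve the loader by name, the way a registry-driven factory might.
loader = LOADER_REGISTRY['memory'](MemoryConfig())
loader.connect()
count = loader.load_batch([{'block': 1, 'hash': '0xaa'}, {'block': 2, 'hash': '0xbb'}], 'blocks')
print(count)  # 2
```

Keeping mode validation and lazy table creation in the base class means each backend only fills in the storage-specific hooks. The `_impl` suffix in the quick reference hints at a split like this, but how amp's real base class divides the work is an assumption here; `NEW_LOADER_GUIDE.md` is the authoritative contract.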

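The "Test implementation" part of the CLAUDE.md quick reference separates backend-specific hooks (`LoaderTestConfig` subclasses) from generic tests (`BaseLoaderTests`) that every backend inherits. A minimal stdlib-only sketch of that idea follows. It assumes a hypothetical `DictLoader` with a single `load` method and implements only three of the hooks; the real base classes, their pytest fixtures (such as the one named by `config_fixture_name`), and their six core tests are not reproduced here.

```python
from typing import Any, ClassVar, Dict, List

# Hypothetical toy backend so the pattern runs on its own; not an amp loader.
class DictLoader:
    def __init__(self) -> None:
        self.tables: Dict[str, List[Dict[str, Any]]] = {}

    def load(self, rows: List[Dict[str, Any]], table_name: str) -> int:
        self.tables.setdefault(table_name, []).extend(rows)
        return len(rows)

class LoaderTestConfig:
    """Backend-specific hooks; the generic tests only talk to these methods."""
    loader_class: ClassVar[type]

    def get_row_count(self, loader: Any, table_name: str) -> int:
        raise NotImplementedError

    def get_column_names(self, loader: Any, table_name: str) -> List[str]:
        raise NotImplementedError

    def cleanup_table(self, loader: Any, table_name: str) -> None:
        raise NotImplementedError

class DictTestConfig(LoaderTestConfig):
    loader_class = DictLoader

    def get_row_count(self, loader: DictLoader, table_name: str) -> int:
        return len(loader.tables.get(table_name, []))

    def get_column_names(self, loader: DictLoader, table_name: str) -> List[str]:
        rows = loader.tables.get(table_name, [])
        return list(rows[0]) if rows else []

    def cleanup_table(self, loader: DictLoader, table_name: str) -> None:
        loader.tables.pop(table_name, None)

class BaseLoaderTests:
    """Generic tests written once; each backend subclass inherits them."""
    config: ClassVar[LoaderTestConfig]

    def test_load_reports_row_count(self) -> None:
        loader = self.config.loader_class()
        loaded = loader.load([{'id': 1}, {'id': 2}], 't')
        assert loaded == self.config.get_row_count(loader, 't') == 2

    def test_columns_match_input(self) -> None:
        loader = self.config.loader_class()
        loader.load([{'id': 1, 'name': 'a'}], 't')
        assert self.config.get_column_names(loader, 't') == ['id', 'name']

    def test_cleanup_removes_rows(self) -> None:
        loader = self.config.loader_class()
        loader.load([{'id': 1}], 't')
        self.config.cleanup_table(loader, 't')
        assert self.config.get_row_count(loader, 't') == 0

# One line per backend: the subclass only supplies its config object.
class TestDictCore(BaseLoaderTests):
    config = DictTestConfig()
```

Under pytest's default discovery, `TestDictCore` is collected because its name starts with `Test`, while `BaseLoaderTests` is not, so the inherited generic tests run once per backend subclass rather than against the abstract base.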