Setup and local development guide for the CLP Metastore Service.

Prerequisites:

- Docker and Docker Compose
- Go 1.25+
```bash
./docker/start.sh -d
```

This builds the CLP core package if needed, then starts all services (MariaDB, Kafka, MinIO, coordinator nodes).
Verify services are healthy:

```bash
docker compose -f docker/docker-compose.yml ps
```

All services should report healthy status.
```bash
go build ./cmd/metalog
```

This produces a `metalog` binary in the current directory.
The node is the main entry point: it hosts coordinators and workers in a single process.
```bash
# Run with default config path (/etc/clp/node.yaml)
./metalog serve

# Or specify a config file
./metalog serve --config config/node.yaml
```

Expected output (zap structured logging):

```json
{"level":"info","msg":"starting server","config":"/etc/clp/node.yaml"}
{"level":"info","msg":"database pool created","poolSize":5,"minIdle":2}
{"level":"info","msg":"storage registry created","defaultBackend":"minio"}
{"level":"info","msg":"node started","coordinators":1,"workers":4}
```
For a read-only query API node, configure only `database.replica` and enable `grpc`:

```bash
./metalog serve --config config/apiserver.yaml
```

Here `apiserver.yaml` sets `database.replica` (no primary) and enables `grpc.query: true` / `grpc.metadata: true`. See Deployment for details.
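As a rough sketch of such a file (field names inferred from the description above and from the primary-database block later in this guide; consult the Configuration Reference for the exact schema):

```yaml
database:
  replica:                  # no primary: this node is read-only
    host: replica.example.internal
    port: 3306
    database: metalog_metastore
    user: readonly
    password: ${DB_PASSWORD:-password}
grpc:
  query: true               # serve the query API
  metadata: true            # serve the metadata API
```

The replica host and user here are placeholders; substitute your own.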
Tables are registered via the admin gRPC API. Register a table from the command line:

```bash
./metalog admin register-table \
  --addr localhost:9090 \
  --table clp_spark \
  --display-name "Spark Logs" \
  --kafka-topic spark-ir \
  --kafka-bootstrap-servers localhost:9092
```

This calls the coordinator's AdminService gRPC endpoint to UPSERT the table. Only the fields you specify are updated; omitted fields keep their database defaults.

Expected output:

```
table "clp_spark" created
```
Use `docker/start.sh` — it ensures the CLP binary package is built before starting.

```bash
# Start everything (detached)
./docker/start.sh -d

# Force a fresh CLP build, then start
./docker/start.sh --rebuild-clp -d

# Scale coordinator nodes
./docker/start.sh -d --scale coordinator-node=3

# View logs
docker compose -f docker/docker-compose.yml logs -f coordinator-node
```

```bash
# Run all tests
go test ./...

# Run a specific test package
go test ./internal/coordinator/consolidation/...

# Run integration tests only (requires Docker)
go test -tags=integration ./internal/...
```

The node reads the following environment variables:

| Variable | Default | Description |
|---|---|---|
| `CONFIG_PATH` | `node.yaml` | Path to the YAML config file (or ConfigMap directory) |
| `HOSTNAME` | (OS hostname) | Used as `node_id` for table assignment (configurable via `coordinator.nodeIdEnvVar`) |
All other settings (database, Kafka, storage) are configured in the YAML file. The YAML supports `${VAR:-default}` expansion for injecting secrets from the environment (see Configuration Reference).
Settings are organized by role: database (primary + optional replica), storage, grpc, health, coordinator, and worker. Per-table configuration (Kafka routing, feature flags) is managed via the admin gRPC API and stored in the database:
```yaml
database:
  primary:
    host: localhost
    port: 3306
    database: metalog_metastore
    user: root
    password: password
    poolSize: 5

storage:
  defaultBackend: minio
  backends:
    minio:
      endpoint: http://localhost:9000
      accessKey: minioadmin
      secretKey: minioadmin
      bucket: logs
      forcePathStyle: true

health:
  enabled: true
  port: 8081

coordinator:
  enabled: true
  nodeIdEnvVar: HOSTNAME  # env var for _table_assignment.node_id
  # Tables are registered via admin gRPC API (AdminService/RegisterTable).
  # The coordinator discovers assigned tables from the DB on startup and via
  # periodic reconciliation. See docs/guides/configure-tables.md.

# Shared worker pool (claims tasks from all tables)
worker:
  concurrency: 4  # 0 = workers disabled
```

See Configuration Reference for full details.
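As one example of combining these role flags, a dedicated worker node might disable its coordinator entirely (a sketch, assuming `coordinator.enabled` and `worker.concurrency` behave as documented above):

```yaml
coordinator:
  enabled: false   # this node runs no coordinators
worker:
  concurrency: 16  # claims tasks from all tables
```

Conversely, setting `worker.concurrency: 0` yields a coordinator-only node.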
Confirm the system is operational:

```bash
# Check database tables
docker compose -f docker/docker-compose.yml exec mariadb mariadb -uroot -ppassword metalog_metastore \
  -e "SHOW TABLES;"

# Check Kafka topics
docker compose -f docker/docker-compose.yml exec kafka kafka-topics \
  --bootstrap-server localhost:9092 --list

# Check MinIO buckets
docker compose -f docker/docker-compose.yml exec minio mc ls local/
```

For a thorough automated check — HA fight-for-master, Kafka ingestion, and single-owner enforcement — use the E2E validation script:

```bash
./integration-tests/functional/coordinator/validate-e2e.sh
```

What it tests:
- Two coordinator nodes start and each gets a unique node ID (via `HOSTNAME`)
- Unassigned tables are claimed within seconds; no double-claims occur
- Kafka messages are ingested into the `clp_spark` metadata table
- Only one coordinator runs per table (verified via logs)
- Reconciliation: a table added after startup is picked up within seconds
Prerequisites: Docker, a built binary (`go build ./cmd/metalog`), and port 3307 free (or set `DB_PORT`).
Services not starting:

```bash
docker compose -f docker/docker-compose.yml logs mariadb kafka minio
```

Connection refused errors:

- Ensure infrastructure is healthy: `docker compose -f docker/docker-compose.yml ps`
- Check ports are not in use: `lsof -i :3306 -i :9092 -i :9000`
Tests failing:
- Ensure Docker is running (tests use testcontainers-go)
- Check for port conflicts with running infrastructure
- Tutorial: End-to-End Ingestion — Walk through the full Kafka ingestion pipeline
- Architecture Overview — Component design and data flow
- Configuration Reference — Full list of YAML settings and environment variables
- Scale Workers — Tuning the shared worker pool