-
Notifications
You must be signed in to change notification settings - Fork 19
Description
Problem
When a CDC consumer is started for the first time against a table (empty or with very few records), it takes a very long time to begin delivering changes and generates excessive load on the Scylla cluster. Users report this even for tables with only ~20 rows.
Root Cause Analysis
The root cause is that on first startup (no saved state), the library begins reading from the very first CDC generation start timestamp rather than from "now" or a recent point in time. This forces it to iterate through every 30-second time window from the generation start to the present, issuing a CQL query for each window — even when all results are empty.
Detailed breakdown
1. Initial state always starts from generation beginning
TaskState.createInitialFor() (TaskState.java:92-95) sets the read position to the generation start:
public static TaskState createInitialFor(GenerationId generation, long windowSizeMs) {
Timestamp generationStart = generation.getGenerationStart();
return new TaskState(generationStart, generationStart.plus(windowSizeMs, ChronoUnit.MILLIS), Optional.empty());
}2. First generation fetched from epoch
BaseMasterCQL.fetchFirstGenerationId() (BaseMasterCQL.java:29-31) queries for the earliest generation after Date(0) (Unix epoch):
public CompletableFuture<Optional<GenerationId>> fetchFirstGenerationId() {
return fetchSmallestGenerationAfter(new Date(0))
.thenApply(opt -> opt.map(t -> new GenerationId(new Timestamp(t))));
}If the cluster has existed for a long time, the first generation could be weeks or months old — even if the table the user cares about was created recently.
3. Massive number of empty window queries
With the default 30-second window size (WorkerConfiguration.DEFAULT_QUERY_TIME_WINDOW_SIZE_MS = 30000), the number of windows to scan is:
| Generation age | Windows per task |
|---|---|
| 1 day | 2,880 |
| 7 days | 20,160 |
| 30 days | 86,400 |
Each window requires a CQL query to the CDC log table. This is multiplied by the number of vnodes (streams) in the generation, which can be 128-256+ on a multi-node cluster, times the number of tables being consumed.
Example: A 3-node cluster with a generation 7 days old and 128 vnodes consuming 1 table:
20,160 windows × 128 tasks = ~2.6 million queriesjust to catch up, all returning empty results.
4. TTL-based trimming is insufficient
The library does attempt to trim based on table TTL (Worker.java:69-76), but:
- If no TTL is set on the CDC log table,
minimumWindowStartfalls back toDate(0)— no trimming at all - Even with the default 24-hour CDC TTL, that's still 2,880 windows × number-of-tasks
- The TTL check only trims if the entire current window is before
(now - TTL), so partial overlap keeps the old start
5. Multiple generations compound the problem
If the cluster has been through topology changes (node additions/removals), there will be multiple generations. The master loop (GenerationBasedCDCMetadataModel.java:123-140) processes each generation sequentially. For each, it fetches all stream metadata, creates tasks, and waits for workers to fully consume them before moving to the next.
The generationTTLExpired() check (GenerationBasedCDCMetadataModel.java:59-93) can skip old generations, but only if all tables have a TTL set. One table without TTL prevents skipping.
Impact
- High cluster load: Burst of millions of empty CQL queries hitting the cluster
- Long startup time: Minutes to hours before the consumer reaches "live" data
- Wasted resources: All queries return empty results for empty/new tables
- Poor first-time experience: Users setting up CDC for the first time see unexplained delays
Suggested Improvements
-
Add a "start from now" option: Allow users to configure the consumer to start from the current timestamp instead of the generation start. This is the most common use case for first-time consumers.
-
Add a configurable start timestamp: Allow
CDCConsumer.builder().withStartTimestamp(Timestamp)so users can control where reading begins. -
Larger catch-up windows: When processing historical (already-past) windows, use larger window sizes to reduce query count, then switch to normal window size when approaching the present.
-
Skip empty generations faster: If a generation is fully in the past and the table has no data in it, detect this quickly (e.g., with a single bounded query) rather than scanning every window.
-
Default to recent start: Consider defaulting to
now - confidenceWindowSizefor first-time consumers rather than generation start, with an opt-in flag for full history replay.
Environment
- scylla-cdc-java library (all versions)
- Any Scylla cluster with CDC enabled
- Particularly noticeable on clusters that have been running for days/weeks before CDC consumer is first started