Skip to content

First-time CDC consumer startup causes excessive cluster load by scanning from generation start #172

@dkropachev

Description

@dkropachev

Problem

When a CDC consumer is started for the first time against a table (empty or with very few records), it takes a very long time to begin delivering changes and generates excessive load on the Scylla cluster. Users report this even for tables with only ~20 rows.

Root Cause Analysis

The root cause is that on first startup (no saved state), the library begins reading from the very first CDC generation start timestamp rather than from "now" or a recent point in time. This forces it to iterate through every 30-second time window from the generation start to the present, issuing a CQL query for each window — even when all results are empty.

Detailed breakdown

1. Initial state always starts from generation beginning

TaskState.createInitialFor() (TaskState.java:92-95) sets the read position to the generation start:

public static TaskState createInitialFor(GenerationId generation, long windowSizeMs) {
    Timestamp generationStart = generation.getGenerationStart();
    return new TaskState(generationStart, generationStart.plus(windowSizeMs, ChronoUnit.MILLIS), Optional.empty());
}

2. First generation fetched from epoch

BaseMasterCQL.fetchFirstGenerationId() (BaseMasterCQL.java:29-31) queries for the earliest generation after Date(0) (Unix epoch):

public CompletableFuture<Optional<GenerationId>> fetchFirstGenerationId() {
    return fetchSmallestGenerationAfter(new Date(0))
            .thenApply(opt -> opt.map(t -> new GenerationId(new Timestamp(t))));
}

If the cluster has existed for a long time, the first generation could be weeks or months old — even if the table the user cares about was created recently.

3. Massive number of empty window queries

With the default 30-second window size (WorkerConfiguration.DEFAULT_QUERY_TIME_WINDOW_SIZE_MS = 30000), the number of windows to scan is:

Generation age Windows per task
1 day 2,880
7 days 20,160
30 days 86,400

Each window requires a CQL query to the CDC log table. This is multiplied by the number of vnodes (streams) in the generation, which can be 128-256+ on a multi-node cluster, times the number of tables being consumed.

Example: A 3-node cluster with a generation 7 days old and 128 vnodes consuming 1 table:

  • 20,160 windows × 128 tasks = ~2.6 million queries just to catch up, all returning empty results.

4. TTL-based trimming is insufficient

The library does attempt to trim based on table TTL (Worker.java:69-76), but:

  • If no TTL is set on the CDC log table, minimumWindowStart falls back to Date(0) — no trimming at all
  • Even with the default 24-hour CDC TTL, that's still 2,880 windows × number-of-tasks
  • The TTL check only trims if the entire current window is before (now - TTL), so partial overlap keeps the old start

5. Multiple generations compound the problem

If the cluster has been through topology changes (node additions/removals), there will be multiple generations. The master loop (GenerationBasedCDCMetadataModel.java:123-140) processes each generation sequentially. For each, it fetches all stream metadata, creates tasks, and waits for workers to fully consume them before moving to the next.

The generationTTLExpired() check (GenerationBasedCDCMetadataModel.java:59-93) can skip old generations, but only if all tables have a TTL set. One table without TTL prevents skipping.

Impact

  • High cluster load: Burst of millions of empty CQL queries hitting the cluster
  • Long startup time: Minutes to hours before the consumer reaches "live" data
  • Wasted resources: All queries return empty results for empty/new tables
  • Poor first-time experience: Users setting up CDC for the first time see unexplained delays

Suggested Improvements

  1. Add a "start from now" option: Allow users to configure the consumer to start from the current timestamp instead of the generation start. This is the most common use case for first-time consumers.

  2. Add a configurable start timestamp: Allow CDCConsumer.builder().withStartTimestamp(Timestamp) so users can control where reading begins.

  3. Larger catch-up windows: When processing historical (already-past) windows, use larger window sizes to reduce query count, then switch to normal window size when approaching the present.

  4. Skip empty generations faster: If a generation is fully in the past and the table has no data in it, detect this quickly (e.g., with a single bounded query) rather than scanning every window.

  5. Default to recent start: Consider defaulting to now - confidenceWindowSize for first-time consumers rather than generation start, with an opt-in flag for full history replay.

Environment

  • scylla-cdc-java library (all versions)
  • Any Scylla cluster with CDC enabled
  • Particularly noticeable on clusters that have been running for days/weeks before CDC consumer is first started

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions