
Cagg try#9629

Draft
gayyappan wants to merge 23 commits into timescale:main from gayyappan:cagg_backfill

Conversation

@gayyappan
Member

No description provided.

@gayyappan gayyappan force-pushed the cagg_backfill branch 3 times, most recently from 3705055 to 85adf5a on April 21, 2026 22:44
@gayyappan gayyappan force-pushed the cagg_backfill branch 3 times, most recently from 9263fd4 to 09047a5 on April 27, 2026 13:16
gayyappan and others added 22 commits April 29, 2026 11:19
Add a new catalog table to track which devices have backfilled data
into old chunks. This will be used by the cagg refresh to only
re-materialize data for devices that actually backfilled, rather
than refreshing the entire time range.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a nullable tenant_column_name column to the continuous_agg
catalog table. When set, it identifies the column used for
backfill-aware refresh. NULL means tracking is disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for setting the tenant column on a continuous aggregate
via ALTER MATERIALIZED VIEW ... SET (timescaledb.tenant_column).

Validates that the column exists on the raw hypertable, that all
sibling caggs agree on the same tenant column, and errors if a
tenant column is already set on the cagg.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store the chunk's time range end on ChunkInsertState during
chunk routing. We will use this to compare against the watermark
and detect late-arriving data for a tenant.

1) Instrument the INSERT and COPY paths to detect when data is inserted
into old chunks (below the low watermark). When backfill is detected,
the device value and time range are buffered in a transaction-local
hash table and flushed to the backfill_tracker catalog table at commit.

The watermark is derived automatically: now - max(chunk_interval, 1 day).
Per-row cost for non-backfill inserts is a single bool check (cached
per chunk). Backfill rows pay slot_getattr + hash lookup + min/max
comparisons. Text conversion only happens at flush time.

2) Cache GetCurrentTimestamp once per transaction for backfill watermark

Avoid calling GetCurrentTimestamp on every new chunk seen during
backfill detection. The timestamp is now computed once in
backfill_tracker_init and reused for all watermark checks within
the transaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
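The detection path in parts 1) and 2) can be sketched in plain C under the stated watermark rule. All names here (`backfill_watermark`, `chunk_is_backfill`, `BackfillEntry`, `entry_merge`) are illustrative, not the actual TimescaleDB symbols; timestamps are PostgreSQL-style microsecond TimestampTz values.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz; /* microseconds, PostgreSQL-style */

#define ONE_DAY_USEC (INT64_C(86400) * 1000000)

/* Watermark rule from the commit message: now - max(chunk_interval, 1 day).
 * "now" is cached once per transaction by the caller. */
static TimestampTz
backfill_watermark(TimestampTz now, int64_t chunk_interval_usec)
{
	int64_t lag = chunk_interval_usec > ONE_DAY_USEC ? chunk_interval_usec
													 : ONE_DAY_USEC;
	return now - lag;
}

/* A chunk counts as a backfill target when its time range ends at or
 * below the watermark; the result is cached per chunk, so non-backfill
 * inserts pay only this single bool check per row. */
static bool
chunk_is_backfill(TimestampTz chunk_range_end, TimestampTz watermark)
{
	return chunk_range_end <= watermark;
}

/* Per-tenant entry in the transaction-local hash table: only the
 * min/max of the backfilled timestamps is kept, widened per row. */
typedef struct
{
	TimestampTz lowest;
	TimestampTz greatest;
} BackfillEntry;

static void
entry_merge(BackfillEntry *e, TimestampTz row_ts)
{
	if (row_ts < e->lowest)
		e->lowest = row_ts;
	if (row_ts > e->greatest)
		e->greatest = row_ts;
}
```

At commit, each entry would be flushed as one row into the backfill tracker catalog table; text conversion of the tenant value happens only at that point.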
continuous_agg_backfill_check now returns bool — true means the row's
invalidation has been recorded in the per-device backfill tracker and the
coarse hypertable_invalidation_log entry is not needed.
The INSERT and COPY callers are re-ordered to invoke the tracker check first and only
fall through to continuous_agg_dml_invalidate when it returns false
(recent chunks, or hypertables without a configured tenant column).

This ensures backfill-chunk inserts with a tenant column produce exactly
one invalidation record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
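A minimal sketch of that re-ordering, with the two functions stubbed out as counters. Names and signatures are simplified assumptions for illustration, not the real TimescaleDB prototypes.

```c
#include <stdbool.h>

/* Simplified stand-in for a row's routing context. */
typedef struct
{
	bool in_old_chunk;      /* chunk range end below the backfill watermark */
	bool has_tenant_column; /* cagg configured with a tenant column */
} Row;

static int tracker_records = 0;
static int coarse_records = 0;

/* Stand-in for continuous_agg_backfill_check: returns true when the
 * row's invalidation was recorded in the per-tenant backfill tracker. */
static bool
backfill_check(const Row *row)
{
	if (!row->in_old_chunk || !row->has_tenant_column)
		return false;
	tracker_records++;
	return true;
}

/* Stand-in for continuous_agg_dml_invalidate: coarse
 * hypertable_invalidation_log entry. */
static void
dml_invalidate(const Row *row)
{
	(void) row;
	coarse_records++;
}

static void
on_insert(const Row *row)
{
	/* Tracker first; fall through to the coarse log only when the
	 * tracker declines the row. Result: exactly one invalidation
	 * record per row. */
	if (!backfill_check(row))
		dml_invalidate(row);
}
```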
We had equality (`=`) earlier; we need to compare against a list of tenants.
Convert equality with a single value into equality with ANY (array of values).
Group backfill invalidations by bucket + tenant.
Manual refresh now consumes the continuous_aggs_backfill_tracker in
addition to the coarse cagg invalidation log. Entries in the window
are bucket-grouped and re-materialised per-tenant via DELETE+INSERT
scoped by tenant = ANY($3), bypassing a full time-range rewrite of
buckets that only one tenant backfilled into.

Adds two pieces plumbed into the existing T3 of
continuous_agg_refresh_internal:

- collect_and_delete_tracker_entries_in_window (invalidation.c)
  expands each tracker row's [lowest, greatest] range under the cagg's
  bucket function into (bucket, tenant) pairs, sorts + dedups,
  and deletes the source rows.
- continuous_agg_refresh_with_tracker (refresh.c) walks the resulting
  groups and calls continuous_agg_update_materialization_for_tenant
  once per bucket with a constructed tenant ArrayType.

process_cagg_invalidations_and_refresh now returns true when either
the cagg log or the tracker produced work, so the "already up-to-date"
notice is correctly suppressed after a tracker-only refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
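The expansion step can be illustrated for a fixed-width bucket function. Fixed width is an assumption for the sketch (caggs also support variable-width buckets), and the function names are illustrative, not the real ones from invalidation.c.

```c
#include <stdint.h>

/* Start of the bucket containing ts, for a fixed bucket width.
 * Floor division so negative timestamps bucket correctly. */
static int64_t
bucket_start(int64_t ts, int64_t width)
{
	int64_t b = ts / width;
	if (ts < 0 && ts % width != 0)
		b--;
	return b * width;
}

/* Expand one tracker row's [lowest, greatest] range into the starts of
 * every bucket it overlaps; returns the number of buckets written to
 * out[]. The real code additionally pairs each bucket with the row's
 * tenant, then sorts and dedups across all tracker rows. */
static int
expand_range_to_buckets(int64_t lowest, int64_t greatest, int64_t width,
						int64_t *out, int max_out)
{
	int n = 0;

	for (int64_t b = bucket_start(lowest, width);
		 b <= greatest && n < max_out;
		 b += width)
		out[n++] = b;
	return n;
}
```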
Cover backfill that spans multiple buckets for one tenant and
backfill that touches multiple tenants across buckets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r region

NOT handled well today; needs a fix.
We need this for the case where multiple caggs are defined on
the hypertable. Fix this later, and add it only when there are
multiple caggs.
