dbt-bigquery adapter: snapshot fix for count mismatch and invalid casting #1781
Open
antoniabadarau wants to merge 2 commits into dbt-labs:main from …array-deletion-records
Pull request overview
This PR fixes BigQuery snapshot failures when using hard_deletes='new_record', specifically addressing UNION ALL column-count mismatches caused by STRUCT flattening and type mismatches caused by untyped NULL placeholders.
Changes:
- Add a BigQuery-specific `snapshot_staging_table` override to avoid STRUCT flattening and preserve types for “new” columns in `deletion_records`.
- Add a functional regression test covering STRUCT/ARRAY columns and the “new column + hard delete” scenario.
- Add an unreleased changelog entry documenting the fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `dbt-bigquery/src/dbt/include/bigquery/macros/materializations/snapshot.sql` | Introduces `bigquery__snapshot_staging_table` override using `get_columns_in_query()` and `source_data.<col>` to keep UNION ALL compatible on BigQuery. |
| `dbt-bigquery/tests/functional/adapter/test_snapshot_struct_array.py` | Adds regression coverage for hard deletes with STRUCT/ARRAY columns and for a newly-added STRUCT column during deletes. |
| `dbt-bigquery/.changes/unreleased/Fixes-20260318-120000.yaml` | Adds a release note for the snapshot fix. |
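
To make the column-count mismatch from the overview concrete, here is a rough Python model of it. This is only a sketch: the `schema` dict and the helpers `top_level_columns` and `flattened_columns` are hypothetical and are not part of dbt or the BigQuery adapter. It assumes a simplified schema in which a STRUCT is a nested dict, and compares the column list a `select *` branch would expose with the list obtained after flattening the STRUCT.

```python
# Hypothetical, simplified schema: a STRUCT column is modelled as a nested dict.
schema = {
    "id": "INT64",
    "address": {"city": "STRING", "zip": "STRING"},  # STRUCT with two fields
    "tags": "ARRAY",
}


def top_level_columns(schema):
    """Column names as a `select *` branch exposes them: one per top-level column."""
    return list(schema)


def flattened_columns(schema, prefix=""):
    """Column names after STRUCT flattening: one entry per leaf field."""
    names = []
    for name, dtype in schema.items():
        path = f"{prefix}{name}"
        if isinstance(dtype, dict):
            # Recurse into the STRUCT and emit its fields instead of the STRUCT itself.
            names.extend(flattened_columns(dtype, prefix=f"{path}."))
        else:
            names.append(path)
    return names


star_branch = top_level_columns(schema)
flattened_branch = flattened_columns(schema)

print(len(star_branch), star_branch)
print(len(flattened_branch), flattened_branch)
```

In this model the flattened list has one more entry than the top-level list, which is the kind of difference that makes the branches of a UNION ALL disagree on column count.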
Resolves #1702
Please note that this PR isolates the fix to the BigQuery adapter instead of modifying the default behaviour. If maintainers consider this feasible as the default behaviour, I am happy to iterate and make these changes.
Problem
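Resolves the following issues with snapshots in BigQuery:
- Queries in UNION ALL have mismatched column count
- UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64

Both of these are caused by the default behaviour of the `snapshot_staging_table` macro within the `deletion_records` CTE: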
1. Queries in UNION ALL have mismatched column count
- The `deletion_records` CTE loops over the flattened columns returned by `get_column_schema_from_query` and adds them to the select list:

```sql
{% set source_sql_cols = get_column_schema_from_query(source_sql) %}
...
select
    'insert' as dbt_change_type,
    {%- for col in source_sql_cols -%}
    {%- if col.name in snapshotted_cols -%}
    snapshotted_data.{{ adapter.quote(col.column) }},
    {%- else -%}
    NULL as {{ adapter.quote(col.column) }},
    {%- endif -%}
    {% endfor -%}
```

- The other CTEs select `source_data.*`, which keeps the original structure of the columns without the flattening
- `deletion_records` therefore has more columns than the others

2. UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64
```sql
{%- if col.name in snapshotted_cols -%}
snapshotted_data.{{ adapter.quote(col.column) }},
{%- else -%}
NULL as {{ adapter.quote(col.column) }},  # invalid NULL
{%- endif -%}
```

Solution
1. Queries in UNION ALL have mismatched column count
- In the `deletion_records` CTE, make use of `get_columns_in_query` rather than `get_column_schema_from_query`:

```sql
{% set source_col_names = get_columns_in_query(source_sql) %}
{%- for col_name in source_col_names -%}
snapshotted_data.{{ adapter.quote(col_name) }}
...
{% endfor -%}
```

2. UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64
- Use `source_data.col` from the existing LEFT JOIN with `deletes_source_data` instead of a bare `NULL`
- The join returns the `deletes_source_data` columns as NULL, but BQ will preserve the schema and infer the correct data type

```sql
{%- else -%}
source_data.{{ adapter.quote(col_name) }},
{%- endif -%}
```

Checklist