dbt-bigquery adapter: snapshot fix for count mismatch and invalid casting#1781

Open
antoniabadarau wants to merge 2 commits into dbt-labs:main from antoniabadarau:antoniabadarau/fix/bigquery-snapshot-struct-array-deletion-records

Conversation

@antoniabadarau antoniabadarau commented Mar 19, 2026

Resolves #1702

Please note that this PR isolates the fix to the BigQuery adapter instead of modifying the default behaviour. If the maintainers consider this feasible as the default behaviour, I am happy to iterate and make these changes.

Problem

Resolves the following issues with snapshots in BigQuery:

  1. Queries in UNION ALL have mismatched column count
  2. UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64

Both of these are caused by the default behaviour of the snapshot_staging_table macro within the deletion_records CTE:

1. Queries in UNION ALL have mismatched column count

  • get_column_schema_from_query flattens RECORD columns into their subfields, causing the unexpected increase in the column count
    • the deletion_records CTE then loops over the flattened columns and adds them to the select list
      {% set source_sql_cols = get_column_schema_from_query(source_sql) %}
      ...
      select
          'insert' as dbt_change_type,
          {%- for col in source_sql_cols -%}
              {%- if col.name in snapshotted_cols -%}
                  snapshotted_data.{{ adapter.quote(col.column) }},
              {%- else -%}
                  NULL as {{ adapter.quote(col.column) }},
              {%- endif -%}
          {% endfor -%}
  • looking at all the other CTEs (insertions, updates, deletes), these simply use source_data.*, which keeps the original structure of the columns without the flattening
     insertions as (
      select
          'insert' as dbt_change_type,
          source_data.*
      ...
      ),
      updates as (
            select
                'update' as dbt_change_type,
                source_data.*,
            ...
      ),
      deletes as (
          select
              'delete' as dbt_change_type,
              source_data.*,
              ...
      ),
      ...
  • this then causes the subsequent UNION operation to fail, given that deletion_records has more columns than the others
    select * from insertions
     union all
     select * from updates
     union all
     select * from deletes
     union all
     select * from deletion_records
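
The count mismatch can be reproduced in isolation with a minimal, hypothetical two-branch union (the column names are made up for illustration): selecting a STRUCT wholesale yields one column, while selecting its flattened subfields yields several, so the branches disagree:

```sql
-- one column (the STRUCT itself) ...
select struct('a' as x, 'b' as y) as s
union all
-- ... versus two columns (the flattened subfields):
-- fails with "Queries in UNION ALL have mismatched column count"
select 'a' as x, 'b' as y
```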

2. UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64

  • when a column doesn't yet exist in the snapshot table (a new column), the macro emits a bare NULL; in BigQuery, an uncast NULL is inferred as INT64, which leads to the errors we’re seeing
  {%- if col.name in snapshotted_cols -%}
      snapshotted_data.{{ adapter.quote(col.column) }},
  {%- else -%}
      NULL as {{ adapter.quote(col.column) }},  -- bare NULL, inferred as INT64
  {%- endif -%}
  • when the UNION ALL combines this with the same column from the other CTEs (insertions, updates, deletes), where the column has its actual type, BigQuery fails because INT64 is incompatible with ARRAY
Queries in UNION ALL have mismatched column count; query 1 has 14 columns, query 4 has 15 columns at [307:5]
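
The type mismatch can likewise be reproduced with a hypothetical two-branch union (column name made up): the typed branch yields ARRAY&lt;INT64&gt;, while the bare NULL branch is inferred as INT64:

```sql
select [1, 2] as tags   -- ARRAY<INT64>
union all
select null as tags     -- bare NULL inferred as INT64 -> incompatible types
```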

Solution

1. Queries in UNION ALL have mismatched column count

  • replace get_column_schema_from_query with get_columns_in_query, which returns the top-level column names without flattening RECORD columns

    {% set source_col_names = get_columns_in_query(source_sql) %}
    {%- for col_name in source_col_names -%}
        snapshotted_data.{{ adapter.quote(col_name) }}
        ...
    {% endfor -%}

2. UNION ALL has incompatible types: ARRAY, ARRAY, ARRAY, INT64

  • instead of emitting a bare NULL, reference source_data.col from the existing LEFT JOIN with deletes_source_data
  • given that the join filters on dbt_unique_key being NULL, all deletes_source_data columns are NULL, but BigQuery preserves the schema and infers the correct data type
{%- else -%}
  source_data.{{ adapter.quote(col_name) }},
{%- endif -%}
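
Why this works: in the default macro, deletion_records already left-joins snapshotted_data to deletes_source_data (aliased source_data) and keeps only unmatched rows, so every source_data column is NULL yet retains its declared type. A simplified sketch of that join shape (condensed for illustration; not the exact macro code):

```sql
-- unmatched rows: source_data.<col> is NULL but typed per the source schema,
-- so the UNION ALL branches stay type-compatible
from snapshotted_data
left join deletes_source_data as source_data
    on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
where source_data.dbt_unique_key is null
```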

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@cla-bot cla-bot bot added the cla:yes The PR author has signed the CLA label Mar 19, 2026
@antoniabadarau antoniabadarau marked this pull request as ready for review March 19, 2026 14:14
@antoniabadarau antoniabadarau requested a review from a team as a code owner March 19, 2026 14:14
Copilot AI review requested due to automatic review settings March 19, 2026 14:14
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes BigQuery snapshot failures when using hard_deletes='new_record', specifically addressing UNION ALL column-count mismatches caused by STRUCT flattening and type mismatches caused by untyped NULL placeholders.

Changes:

  • Add a BigQuery-specific snapshot_staging_table override to avoid STRUCT flattening and preserve types for “new” columns in deletion_records.
  • Add a functional regression test covering STRUCT/ARRAY columns and the “new column + hard delete” scenario.
  • Add an unreleased changelog entry documenting the fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File / Description:
  • dbt-bigquery/src/dbt/include/bigquery/macros/materializations/snapshot.sql — Introduces the bigquery__snapshot_staging_table override using get_columns_in_query() and source_data.<col> to keep the UNION ALL compatible on BigQuery.
  • dbt-bigquery/tests/functional/adapter/test_snapshot_struct_array.py — Adds regression coverage for hard deletes with STRUCT/ARRAY columns and for a newly-added STRUCT column during deletes.
  • dbt-bigquery/.changes/unreleased/Fixes-20260318-120000.yaml — Adds a release note for the snapshot fix.



Labels

cla:yes The PR author has signed the CLA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Snapshot with hard_deletes:new_record erroring with new column (incompatible types)

2 participants