Componentize dagster_gcp resources #33354

Open
michalcabir-ui wants to merge 4 commits into dagster-io:master from michalcabir-ui:componentizeGcpResources

@michalcabir-ui michalcabir-ui commented Jan 23, 2026

Summary & Motivation

This PR implements the foundational infrastructure for dagster-gcp Components, enabling YAML-based configuration for Google Cloud resources in Dagster projects.

I used explicit field duplication rather than dynamic field copying. This ensures strict type safety, clear documentation, and no runtime surprises (based on the approach I took in the AWS PR).

Key Design Decisions:

Explicit Pydantic Models: All components (BigQueryResourceComponent, GCSResourceComponent, etc.) explicitly define their fields using pydantic.Field.

GCS: GCSResourceComponent, GCSFileManagerResourceComponent
IO managers: Left out of this PR per review; may be added later.

Dataproc: DataprocResourceComponent (marked as Beta; strictly enforces required fields such as project_id, region, and cluster_name to match the underlying resource).
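
To illustrate the explicit-field approach and the required-field enforcement described above, here is a minimal sketch. It uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without dependencies; it is not the PR's code. The class name `DataprocComponentSketch` and the `cluster_config` field name are hypothetical. Only `project_id`, `region`, and `cluster_name` come from the description.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class DataprocComponentSketch:
    """Hypothetical stand-in: every field is written out by hand,
    not copied dynamically from the underlying resource."""

    # Required fields (no default), named as in the PR description.
    project_id: str
    region: str
    cluster_name: str
    # Optional nested configuration; this field name is an assumption.
    cluster_config: Optional[Dict[str, Any]] = None


def missing_required_is_rejected() -> bool:
    """Return True if omitting a required field raises TypeError."""
    try:
        # cluster_name is deliberately left out here.
        DataprocComponentSketch(project_id="my-project", region="us-central1")  # type: ignore[call-arg]
    except TypeError:
        return True
    return False


component = DataprocComponentSketch(
    project_id="my-project",
    region="us-central1",
    cluster_name="my-cluster",
    cluster_config={"worker_config": {"num_instances": 2}},
)
# Because the fields are declared explicitly, they can be listed statically.
field_names = [f.name for f in fields(component)]
```

With a real Pydantic model the same omission would surface as a validation error instead of a `TypeError`, but the design point is the same: required fields are visible in the class body and checked at construction time.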

How I Tested These Changes

I verified the implementation using a comprehensive test suite in dagster_gcp_tests/component_tests/test_gcp_components.py:

Sandbox Integration: Validated full YAML-to-Resource lifecycles using the create_defs_folder_sandbox pattern for all implemented components.

Field Synchronization: Implemented automated tests to ensure Component fields remain a superset of the underlying Resource fields. If a Resource gains a field in the future, the test fails, reminding us to update the Component.
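
A superset check of this kind could look roughly like the sketch below. It is an illustration, not the test from this PR: the classes are hypothetical dataclass stand-ins (the real components and resources are Pydantic-based), and all field names are invented for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional, Set


@dataclass
class ResourceStandIn:
    """Hypothetical stand-in for an underlying resource."""
    project: Optional[str] = None
    location: Optional[str] = None


@dataclass
class ComponentStandIn:
    """Hypothetical stand-in for the component wrapping the resource."""
    project: Optional[str] = None
    location: Optional[str] = None
    resource_key: Optional[str] = None  # component-only extra field


@dataclass
class DriftedResourceStandIn(ResourceStandIn):
    """Simulates a resource that later gained a new field."""
    new_option: Optional[str] = None


def missing_component_fields(resource_cls: type, component_cls: type) -> Set[str]:
    """Return resource field names that the component does not declare."""
    resource_fields = {f.name for f in fields(resource_cls)}
    component_fields = {f.name for f in fields(component_cls)}
    return resource_fields - component_fields


def test_component_fields_are_superset() -> None:
    missing = missing_component_fields(ResourceStandIn, ComponentStandIn)
    assert not missing, f"Component is missing resource fields: {sorted(missing)}"
```

Calling `missing_component_fields(DriftedResourceStandIn, ComponentStandIn)` returns the newly added field name, which is the situation in which such a test would fail and prompt an update to the component.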

Complex Configuration: Verified that DataprocResourceComponent correctly handles complex nested configurations (dictionaries) and required fields.

Changelog

Added foundational Component infrastructure and registry entry points.
Implemented BigQueryResourceComponent, GCSResourceComponent, and GCSFileManagerResourceComponent.
Implemented DataprocResourceComponent (Beta) with support for cluster config dictionaries.
docs: Added a comprehensive Markdown guide for GCP Components in docs/docs/integrations/libraries/gcp/component.md and updated docs/sphinx/sections/integrations/libraries/gcp/dagster-gcp.rst.

@michalcabir-ui michalcabir-ui marked this pull request as ready for review January 23, 2026 15:09
@michalcabir-ui michalcabir-ui requested a review from a team as a code owner January 23, 2026 15:09
@xionon xionon requested a review from OwenKephart January 28, 2026 19:45

@OwenKephart OwenKephart left a comment

This looks quite close!

Just had a few smaller comments, and a couple of small updates to make to the tests.

You can just get rid of the IOManager component. I think it's unclear if/how we'll want IOManagers represented in the components system, so for now we'll ignore them.
