feat: add multi-catalog and multi-schema support for Stitch integration #45
Merged
punit-naik-amp merged 1 commit into main on Nov 27, 2025
This commit implements the ability to scan and configure Stitch across
multiple Databricks catalogs and schemas in a single operation, while
maintaining full backward compatibility with single-location usage.
Key changes:
Core functionality (stitch_tools.py):
- Add validate_multi_location_access() to pre-validate access permissions
- Add _helper_prepare_multi_location_stitch_config() for aggregating
  PII scans across multiple locations
- Refactor _helper_prepare_stitch_config() to support both single and
  multi-location modes with a unified interface
- Gracefully handle partial failures when some locations are inaccessible
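The partial-failure behaviour described above can be sketched roughly as follows. This is not the code from stitch_tools.py: scan_locations, fake_scan, and the shape of the returned dictionary are hypothetical names chosen for illustration, and the real helper may catch narrower exceptions.

```python
from typing import Callable, Dict, List, Tuple

Location = Tuple[str, str]  # (catalog, schema)


def scan_locations(
    locations: List[Location],
    scan_one: Callable[[str, str], List[str]],
) -> Dict[str, object]:
    """Aggregate per-location scan results, recording failures instead of aborting."""
    tables: List[str] = []
    failed: List[Dict[str, str]] = []
    for catalog, schema in locations:
        try:
            tables.extend(scan_one(catalog, schema))
        except Exception as exc:  # assumption: real code may catch specific errors
            failed.append({"location": f"{catalog}.{schema}", "error": str(exc)})
    # The run counts as usable if at least one location produced tables.
    return {"tables": tables, "failed": failed, "ok": bool(tables)}


def fake_scan(catalog: str, schema: str) -> List[str]:
    """Hypothetical scanner: one location is inaccessible."""
    if (catalog, schema) == ("analytics", "customers"):
        raise PermissionError("access denied")
    return [f"{catalog}.{schema}.users"]


result = scan_locations([("prod", "crm"), ("analytics", "customers")], fake_scan)
```

In this sketch the inaccessible location ends up in result["failed"] while the accessible one still contributes its tables, which is one way to read "graceful degradation".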
Command interface (setup_stitch.py):
- Add support for 'targets' parameter accepting list of catalog.schema pairs
- Add 'output_catalog' parameter to specify where outputs should be stored
- Enhance _display_config_preview() to show scan results for multiple locations
- Update command definition with new parameters and usage examples
- Fix pylint issues: remove unnecessary elif after return statements
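As an illustration of the 'targets' parameter, a comma-separated string of catalog.schema pairs could be split as shown here. parse_targets is a hypothetical helper, not a function from setup_stitch.py, and the validation rules (exactly one dot, non-empty parts) are assumptions.

```python
from typing import List, Tuple


def parse_targets(raw: str) -> List[Tuple[str, str]]:
    """Split 'cat1.schema1,cat2.schema2' into (catalog, schema) pairs."""
    pairs: List[Tuple[str, str]] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray whitespace
        catalog, sep, schema = item.partition(".")
        # Assumption: each target has exactly one dot and two non-empty parts.
        if not sep or not catalog or not schema or "." in schema:
            raise ValueError(f"expected catalog.schema, got {item!r}")
        pairs.append((catalog, schema))
    return pairs


targets = parse_targets("prod.crm,prod.ecommerce,analytics.customers")
```

Rejecting malformed entries up front keeps a typo such as a bare catalog name from being silently treated as a schema.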
Type safety (url_utils.py):
- Update normalize_workspace_url() to accept Optional[str]
- Update detect_cloud_provider() to accept Optional[str]
- Ensures proper handling of None/empty workspace URLs
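A minimal sketch of what accepting Optional[str] can look like is given below. The function name normalize_workspace_url_sketch and its normalization rules (strip the scheme and trailing slash, return an empty string for None or blank input) are assumptions for illustration only; the actual behaviour of normalize_workspace_url() in url_utils.py is not shown in this PR description.

```python
from typing import Optional


def normalize_workspace_url_sketch(url: Optional[str]) -> str:
    """Return a bare host for a workspace URL, or '' when nothing usable is given."""
    if url is None or not url.strip():
        return ""  # assumption: None and blank input normalize to an empty string
    cleaned = url.strip()
    for prefix in ("https://", "http://"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned.rstrip("/")
```

Handling None at the top of the function lets callers pass an unset configuration value directly, without guarding every call site.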
Testing (test_stitch_tools.py):
- Add 10 new tests in TestMultiCatalogSupport class covering:
  * Access validation for all/partial/missing locations
  * Successful multi-location configuration
  * Partial failure handling with graceful degradation
  * No PII found across locations
  * Backward compatibility verification
- Fix 12 existing tests to add required schema validation setup
- All 27 tests passing
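The all/partial/missing access cases listed above might be exercised along these lines. This is a sketch only: check_access, the get_schema client method, and TestAccessSketch are hypothetical stand-ins and do not reproduce the tests in test_stitch_tools.py.

```python
import unittest
from unittest.mock import MagicMock


def check_access(client, locations):
    """Hypothetical stand-in: split locations by whether the schema lookup succeeds."""
    accessible, inaccessible = [], []
    for catalog, schema in locations:
        try:
            client.get_schema(catalog, schema)
            accessible.append((catalog, schema))
        except Exception:
            inaccessible.append((catalog, schema))
    return accessible, inaccessible


class TestAccessSketch(unittest.TestCase):
    LOCATIONS = [("prod", "crm"), ("analytics", "customers")]

    def test_all_accessible(self):
        client = MagicMock()  # every lookup succeeds by default
        ok, bad = check_access(client, self.LOCATIONS)
        self.assertEqual(ok, self.LOCATIONS)
        self.assertEqual(bad, [])

    def test_partial_access(self):
        client = MagicMock()
        # First lookup succeeds, second raises.
        client.get_schema.side_effect = [None, PermissionError("denied")]
        ok, bad = check_access(client, self.LOCATIONS)
        self.assertEqual(ok, [("prod", "crm")])
        self.assertEqual(bad, [("analytics", "customers")])

    def test_no_access(self):
        client = MagicMock()
        client.get_schema.side_effect = PermissionError("denied")
        ok, bad = check_access(client, self.LOCATIONS)
        self.assertEqual(ok, [])
        self.assertEqual(bad, self.LOCATIONS)
```

Mocking the client keeps such tests independent of a live Databricks workspace.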
Usage examples:
Single location (backward compatible):
/setup-stitch --catalog_name prod --schema_name crm
Multiple locations:
/setup-stitch --targets prod.crm,prod.ecommerce,analytics.customers --output_catalog prod
Demo:
NOTE: The columns of the relevant tables were already tagged with PII information before the demo was run.
Databricks notebook screenshots after successful multi-catalog multi-schema stitch operation: