feat: support conditional 'dc/base/' prefixing controlled by an ENV #516
SandeepTuniki wants to merge 3 commits into
…y environment variable

Introduce the 'IS_BASE_DC' environment variable to conditionally control 'dc/base/' prefixing in the ingestion pipelines. When it is set to false, the prefix is stripped from database entities, relationships, and metadata graphs during import, so Custom Data Commons (DCP) ingestion stays prefix-free.

Centralize the prefixing and prefix-stripping logic in 'ProvenanceUtils.java', inside the low-level 'pipeline/data' module, to avoid circular dependencies while keeping performance high.

Add unit tests in both GraphReaderTest and CacheReaderTest that assert prefix-free execution for Custom DC.
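A minimal sketch of what centralized add/strip handling for the prefix could look like, assuming IDs are plain strings and the prefix is exactly `dc/base/`. The class `ProvenancePrefixes` and its methods `applyPrefix` and `stripPrefix` are hypothetical names for illustration, not the actual API in `ProvenanceUtils.java`.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not the PR's actual ProvenanceUtils API) that keeps
 * all 'dc/base/' handling in one place.
 */
final class ProvenancePrefixes {
  /** Prefix used for provenance DCIDs in a base Data Commons run. */
  static final String BASE_DC_PREFIX = "dc/base/";

  private ProvenancePrefixes() {}

  /**
   * Normalizes an ID for the current run: base DC runs get exactly one
   * 'dc/base/' prefix, non-base (custom DC) runs get none.
   */
  static String applyPrefix(String id, boolean isBaseDc) {
    Objects.requireNonNull(id, "id");
    // Strip first so the operation is idempotent for already-prefixed IDs.
    String bare = stripPrefix(id);
    return isBaseDc ? BASE_DC_PREFIX + bare : bare;
  }

  /** Removes a single leading 'dc/base/' if present; other IDs pass through. */
  static String stripPrefix(String id) {
    Objects.requireNonNull(id, "id");
    return id.startsWith(BASE_DC_PREFIX)
        ? id.substring(BASE_DC_PREFIX.length())
        : id;
  }
}
```

Stripping before re-adding means callers do not need to know whether an ID arrived already prefixed, which keeps the behavior the same whether the ID came from an entity, a relationship, or a metadata graph.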
Code Review
This pull request introduces support for non-base Data Commons ingestion runs by adding an isBaseDc flag across the pipeline, allowing the omission of the dc/base/ prefix in provenance DCIDs. Feedback on the changes highlights that the isBaseDc environment variable check unconditionally overwrites command-line arguments even when the variable is absent, and that the new isBaseDc field in the Observation class needs to be included in its equals and hashCode implementations.
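To illustrate the second review point, here is a sketch of an `equals`/`hashCode` pair that includes the new flag. The fields other than `isBaseDc` (`variableMeasured`, `observationAbout`, `value`) are hypothetical stand-ins; the real `Observation` class's fields are not shown in this PR.

```java
import java.util.Objects;

/** Hypothetical, trimmed-down Observation used only to show the equals/hashCode pattern. */
final class Observation {
  private final String variableMeasured; // hypothetical field
  private final String observationAbout; // hypothetical field
  private final double value;            // hypothetical field
  private final boolean isBaseDc;        // the flag the review says must participate

  Observation(String variableMeasured, String observationAbout, double value, boolean isBaseDc) {
    this.variableMeasured = variableMeasured;
    this.observationAbout = observationAbout;
    this.value = value;
    this.isBaseDc = isBaseDc;
  }

  boolean isBaseDc() {
    return isBaseDc;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Observation)) {
      return false;
    }
    Observation other = (Observation) o;
    // Double.compare matches Double.equals semantics, consistent with Objects.hash below.
    return Double.compare(value, other.value) == 0
        && isBaseDc == other.isBaseDc
        && Objects.equals(variableMeasured, other.variableMeasured)
        && Objects.equals(observationAbout, other.observationAbout);
  }

  @Override
  public int hashCode() {
    // Every field compared in equals must also feed the hash.
    return Objects.hash(variableMeasured, observationAbout, value, isBaseDc);
  }
}
```

If `isBaseDc` is left out of both methods, two observations that differ only in the flag would collapse into one entry in a `HashSet` or `HashMap` key; adding it to `equals` but not `hashCode` would break the `Object` contract outright.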
Not up to standards ⛔

🔴 Issues

| Category | Results |
|---|---|
| Complexity | 1 medium |

🟢 Metrics

| Metric | Results |
|---|---|
| Complexity | 10 |
| Duplication | 12 |
The failing Codacy check flags a function longer than 50 lines. I'm leaving it as-is for now: the check fails on a test file, and that test function is long mostly because it constructs the test fixture. I'm not sure this particular Codacy check is appropriate for test functions.
This PR introduces the `IS_BASE_DC` env variable to conditionally control `dc/base/` prefixing in the ingestion pipelines. When it is set to `false` (applicable for DCP), the prefix is dynamically stripped during import.
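The review above notes that the environment check should not overwrite a command-line value when `IS_BASE_DC` is unset. A minimal sketch of that precedence rule, assuming the command-line value has already been parsed into a boolean; `BaseDcConfig` and `resolveIsBaseDc` are hypothetical names, and rejecting values other than `true`/`false` is an assumed policy, not necessarily what the PR does.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical resolver: the env var overrides the CLI flag only when it is actually set. */
final class BaseDcConfig {
  static final String ENV_VAR = "IS_BASE_DC";

  private BaseDcConfig() {}

  /**
   * @param cliValue value already parsed from the command line (or its default)
   * @param env      environment map, e.g. System.getenv(); injectable for tests
   */
  static boolean resolveIsBaseDc(boolean cliValue, Map<String, String> env) {
    String raw = env.get(ENV_VAR);
    if (raw == null || raw.isBlank()) {
      // Env var absent or empty: keep whatever the command line said.
      return cliValue;
    }
    switch (raw.trim().toLowerCase(Locale.ROOT)) {
      case "true":
        return true;
      case "false":
        return false;
      default:
        // Assumed policy: fail fast instead of silently treating typos as false.
        throw new IllegalArgumentException(
            ENV_VAR + " must be 'true' or 'false', got: " + raw);
    }
  }
}
```

In a pipeline entry point this would be called as `BaseDcConfig.resolveIsBaseDc(cliFlag, System.getenv())`. Passing the environment as a `Map` keeps the rule unit-testable without mutating real process environment variables.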