
feat: support conditional 'dc/base/' prefixing controlled by an ENV #516

Open: SandeepTuniki wants to merge 3 commits into master from dynamic-base-prefix

@SandeepTuniki (Contributor) commented Jun 1, 2026

This PR introduces the IS_BASE_DC environment variable to control dc/base/ prefixing in the ingestion pipelines. When it is set to false (the case for DCP), the pipeline strips this prefix during import.

…y environment variable

Introduce the 'IS_BASE_DC' environment variable to conditionally control 'dc/base/' prefixing in the ingestion pipelines. When set to false, it dynamically strips this prefix from the database entities, relationships, and metadata graphs during import, ensuring clean prefix-free ingestion for Custom Data Commons (DCP).

Centralize the prefixing and prefix-stripping logic in 'ProvenanceUtils.java', inside the low-level 'pipeline/data' module, to prevent circular dependencies while retaining high performance. Add unit tests in both GraphReaderTest and CacheReaderTest to assert prefix-free execution for Custom DC.
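
The conditional prefixing described in this commit message could look roughly like the sketch below. It assumes nothing about the real `ProvenanceUtils` API: the class and method names (`ProvenancePrefixSketch`, `toProvenanceDcid`, `stripBasePrefix`) are hypothetical and only illustrate gating the `dc/base/` prefix on a boolean flag.

```java
/** Hypothetical sketch; not the actual ProvenanceUtils implementation. */
final class ProvenancePrefixSketch {
    static final String BASE_PREFIX = "dc/base/";

    private ProvenancePrefixSketch() {}

    /** Builds a provenance DCID, adding the prefix only for base DC runs. */
    static String toProvenanceDcid(String name, boolean isBaseDc) {
        return isBaseDc ? BASE_PREFIX + name : name;
    }

    /**
     * Strips a leading "dc/base/" when the run is not a base DC run.
     * Values without the prefix, and all values in base DC runs, pass through unchanged.
     */
    static String stripBasePrefix(String dcid, boolean isBaseDc) {
        if (isBaseDc || dcid == null || !dcid.startsWith(BASE_PREFIX)) {
            return dcid;
        }
        return dcid.substring(BASE_PREFIX.length());
    }
}
```

Keeping both helpers in one small, dependency-free class is one way to let higher-level modules share the logic without importing each other, which matches the stated goal of avoiding circular dependencies.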
@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces support for non-base Data Commons ingestion runs by adding an isBaseDc flag across the pipeline, allowing the omission of the dc/base/ prefix in provenance DCIDs. Feedback on the changes highlights that the isBaseDc environment variable check unconditionally overwrites command-line arguments even when the variable is absent, and that the new isBaseDc field in the Observation class needs to be included in its equals and hashCode implementations.
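
The two points raised in this review can be illustrated with a small sketch. It is not the PR's code: `IsBaseDcSketch`, `resolveIsBaseDc`, and the simplified `Observation` with a single `variableMeasured` field are hypothetical, and letting the environment variable win when it is present is an assumed precedence rule, not one confirmed by the PR.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of the review feedback; not the PR's actual classes. */
final class IsBaseDcSketch {
    private IsBaseDcSketch() {}

    /**
     * Returns the value of IS_BASE_DC when the variable is present and non-blank;
     * otherwise keeps the value that came from the command line.
     */
    static boolean resolveIsBaseDc(Map<String, String> env, boolean cliValue) {
        String raw = env.get("IS_BASE_DC");
        if (raw == null || raw.isBlank()) {
            return cliValue; // variable absent: do not overwrite the CLI argument
        }
        return Boolean.parseBoolean(raw.trim());
    }

    /** Simplified stand-in for Observation; the real class has more fields. */
    static final class Observation {
        private final String variableMeasured;
        private final boolean isBaseDc;

        Observation(String variableMeasured, boolean isBaseDc) {
            this.variableMeasured = variableMeasured;
            this.isBaseDc = isBaseDc;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Observation)) return false;
            Observation other = (Observation) o;
            // isBaseDc takes part in equality alongside the other fields.
            return isBaseDc == other.isBaseDc
                    && Objects.equals(variableMeasured, other.variableMeasured);
        }

        @Override
        public int hashCode() {
            // Hash the same fields that equals compares.
            return Objects.hash(variableMeasured, isBaseDc);
        }
    }
}
```

In a pipeline entry point, the first helper would typically be called with `System.getenv()` and the flag parsed from the command line, so an unset variable leaves the command-line value in effect.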

@codacy-production (Bot) commented Jun 1, 2026

Not up to standards ⛔

🔴 Issues 1 medium

Alerts:
⚠ 1 issue (≤ 0 issues of at least minor severity)

Results:
1 new issue

Category Results
Complexity 1 medium


🟢 Metrics 10 complexity · 12 duplication

Metric Results
Complexity 10
Duplication 12


@SandeepTuniki SandeepTuniki changed the title feat(pipeline): support conditional 'dc/base/' prefixing controlled by environment variable feat: support conditional 'dc/base/' prefixing controlled by environment variable Jun 1, 2026
@SandeepTuniki SandeepTuniki changed the title feat: support conditional 'dc/base/' prefixing controlled by environment variable feat: support conditional 'dc/base/' prefixing controlled by an ENV Jun 1, 2026
@SandeepTuniki SandeepTuniki marked this pull request as ready for review June 1, 2026 06:54
@SandeepTuniki (Contributor, Author) commented

The failing Codacy check flags a function as too long (more than 50 lines). I'm leaving it as-is for now because the check fails on a test file, and that test function's length comes mostly from constructing the test fixture. I'm not sure this specific Codacy check is appropriate for a test function.

@SandeepTuniki SandeepTuniki requested review from gmechali and vish-cs June 1, 2026 08:33