Skip to content

Add filtering for PRs to match issue filtering behavior#418

Open
Bala-Sakabattula wants to merge 2 commits intorelease-engineering:mainfrom
Bala-Sakabattula:pr-issue-creation-filtering
Open

Add filtering for PRs to match issue filtering behavior#418
Bala-Sakabattula wants to merge 2 commits intorelease-engineering:mainfrom
Bala-Sakabattula:pr-issue-creation-filtering

Conversation

@Bala-Sakabattula
Copy link
Collaborator

Summary

This PR adds filtering (and other filter types) to PR handling in upstream_pr.py to ensure consistent filtering behavior between issues and PRs, regardless of which handler path they take.

Problem

Previously, PRs were filtered inconsistently depending on which handler path they used:

  • Issue handler path (u_issue.handle_github_message() with is_pr=True): PRs were filtered by labels, milestone, and other configured filters ✅
  • PR handler path (u_pr.handle_github_message()): PRs were not filtered at all

This inconsistency meant that PRs coming through the PR handler path would bypass filtering and could create JIRA issues even when they didn't match the configured filter criteria (e.g., missing required labels).

Solution

Added the same filtering logic used in upstream_issue.py to upstream_pr.py's handle_github_message() function. The filtering now supports:

  1. Label filtering: PRs must have at least one of the configured labels
  2. Milestone filtering: PRs must match the configured milestone number
  3. Other field filtering: PRs must match any other configured filter fields (e.g., custom fields)

The same filter configuration (config["sync2jira"]["filters"]["github"][upstream]) is used for both issues and PRs, ensuring consistent behavior.

Changes

  • sync2jira/upstream_pr.py: Added filtering logic (lines 52-80) that mirrors the filtering in upstream_issue.py
  • tests/test_upstream_pr.py: Added comprehensive test cases to verify all three filtering conditions work correctly

Testing

Added test cases:

  • test_handle_github_message_filtering: Tests that PRs are correctly filtered out when they don't match labels, milestone, or other field criteria
  • test_handle_github_message_filtering_passes: Tests that PRs pass through when all filters match

@qodo-code-review
Copy link

Review Summary by Qodo

Add PR filtering to match issue filtering behavior

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add filtering logic to PR handler matching issue filtering behavior
• Support label, milestone, and custom field filtering for PRs
• Ensure consistent filtering across both issue and PR handler paths
• Add comprehensive test coverage for all filtering conditions
Diagram
flowchart LR
  A["PR Handler"] --> B["Extract Filters Config"]
  B --> C["Check Labels Filter"]
  C --> D["Check Milestone Filter"]
  D --> E["Check Other Fields Filter"]
  E --> F{All Filters Pass?}
  F -->|No| G["Return None"]
  F -->|Yes| H["Process PR"]
Loading

Grey Divider

File Changes

1. sync2jira/upstream_pr.py ✨ Enhancement +28/-0

Add filtering logic to PR handler

• Added filter extraction from config for the upstream repository
• Implemented label filtering with set intersection check
• Implemented milestone filtering with number comparison
• Implemented generic field filtering with direct value comparison
• All filters return None if conditions not met, preventing PR processing

sync2jira/upstream_pr.py


2. tests/test_upstream_pr.py 🧪 Tests +91/-0

Add comprehensive PR filtering test coverage

• Added test_handle_github_message_filtering to test all three filter types
• Tests label filtering rejection when PR has wrong labels
• Tests milestone filtering rejection when PR has wrong milestone number
• Tests custom field filtering rejection when PR has wrong field value
• Added test_handle_github_message_filtering_passes to verify filters pass correctly
• Verifies PR.from_github is called when all filters match

tests/test_upstream_pr.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link

qodo-code-review bot commented Feb 4, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (2) 📎 Requirement gaps (0)

Grey Divider


Action required

1. label["name"] can KeyError 📘 Rule violation ⛯ Reliability
Suggestion Impact:The inline GitHub filter loop (including the vulnerable label["name"] set-comprehension) was removed and replaced with a centralized u_issue._apply_github_filters(...) call, likely to encapsulate safer/validated filter handling instead of direct indexing in this function.

code diff:

-    for key, expected in _filter.items():
-        if key == "labels":
-            # special handling for label: we look for it in the list of PR labels
-            actual = {label["name"] for label in pr.get("labels", [])}
-            if actual.isdisjoint(expected):
-                log.debug("Labels %s not found on PR: %s", expected, upstream)
-                return None
-        elif key == "milestone":
-            # special handling for milestone: use the number
-            milestone = pr.get(key) or {}  # Key might exist with value `None`
-            actual = milestone.get("number")
-            if expected != actual:
-                log.debug("Milestone %s not set on PR: %s", expected, upstream)
-                return None
-        else:
-            # direct comparison
-            actual = pr.get(key)
-            if actual != expected:
-                log.debug(
-                    "Actual %r %r != expected %r on PR %s",
-                    key,
-                    actual,
-                    expected,
-                    upstream,
-                )
-                return None
+    if not u_issue._apply_github_filters(pr, _filter, upstream, item_type="PR"):
+        return None

Description
• The new label-filtering logic assumes every entry in pr.get("labels", []) is a dict containing a
  name key, using label["name"].
• Since the PR payload is an external input, a missing/malformed label element would raise
  KeyError and crash processing instead of failing gracefully (filtering or logging and continuing).
• This is an unhandled edge case and missing input validation at a new failure point introduced by
  the PR.
Code

sync2jira/upstream_pr.py[R55-61]

+    for key, expected in _filter.items():
+        if key == "labels":
+            # special handling for label: we look for it in the list of PR labels
+            actual = {label["name"] for label in pr.get("labels", [])}
+            if actual.isdisjoint(expected):
+                log.debug("Labels %s not found on PR: %s", expected, upstream)
+                return None
Evidence
Compliance requires handling edge cases and validating external inputs. The added set-comprehension
indexes label["name"] directly, which will raise on unexpected/malformed GitHub payloads, and
there is no try/guard around it.

Rule 3: Generic: Robust Error Handling and Edge Case Management
Rule 6: Generic: Security-First Input Validation and Data Handling
sync2jira/upstream_pr.py[55-61]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The label filtering uses `label["name"]` on external GitHub payload data, which can raise `KeyError` if any label object is missing `name` or is malformed.

## Issue Context
`body`/`pr` come from GitHub webhook/API payloads (external input). The filter path should not crash the handler on unexpected payload shapes.

## Fix Focus Areas
- sync2jira/upstream_pr.py[55-61]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Logs raw filter mismatch values 📘 Rule violation ⛨ Security
Suggestion Impact:The commit removed the inlined filter-evaluation code that logged raw `actual` and `expected` values (including the highlighted debug log) and replaced it with a call to `_apply_github_filters`, eliminating the sensitive log output from this function.

code diff:

-    for key, expected in _filter.items():
-        if key == "labels":
-            # special handling for label: we look for it in the list of PR labels
-            actual = {label["name"] for label in pr.get("labels", [])}
-            if actual.isdisjoint(expected):
-                log.debug("Labels %s not found on PR: %s", expected, upstream)
-                return None
-        elif key == "milestone":
-            # special handling for milestone: use the number
-            milestone = pr.get(key) or {}  # Key might exist with value `None`
-            actual = milestone.get("number")
-            if expected != actual:
-                log.debug("Milestone %s not set on PR: %s", expected, upstream)
-                return None
-        else:
-            # direct comparison
-            actual = pr.get(key)
-            if actual != expected:
-                log.debug(
-                    "Actual %r %r != expected %r on PR %s",
-                    key,
-                    actual,
-                    expected,
-                    upstream,
-                )
-                return None
+    if not u_issue._apply_github_filters(pr, _filter, upstream, item_type="PR"):
+        return None

Description
• The new debug logging prints actual and expected values for arbitrary filter keys, which may
  include sensitive content from the PR payload depending on configured filters.
• This increases the chance of leaking sensitive data into logs and does not produce structured logs
  as required for secure auditing/monitoring.
Code

sync2jira/upstream_pr.py[R73-79]

+                log.debug(
+                    "Actual %r %r != expected %r on PR %s",
+                    key,
+                    actual,
+                    expected,
+                    upstream,
+                )
Evidence
The secure logging rule requires avoiding sensitive data in logs and favors structured logging. The
new log line emits raw actual values taken from the PR payload and expected values from config,
without redaction/structuring safeguards.

Rule 5: Generic: Secure Logging Practices
sync2jira/upstream_pr.py[73-79]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Debug logs in the new filtering code may leak sensitive PR payload values by logging `actual` and `expected` directly.

## Issue Context
Filter keys are configurable and may reference fields that can contain sensitive content. Logging should be structured and avoid raw payload values unless explicitly redacted.

## Fix Focus Areas
- sync2jira/upstream_pr.py[60-79]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

_filter = config["sync2jira"].get("filters", {}).get("github", {}).get(upstream, {})

pr = body["pull_request"]
for key, expected in _filter.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent is this whole section a copy/paste of the same code in upstream_issue.py?

Can it be refactored into a shared function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be ..

Copy link
Collaborator

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bala-Sakabattula, congratulations the perfect testing coverage for this change! 🏆

However, rather than adding this code here, I think you should refactor the body of upstream_issue.handle_github_message() into a subroutine which we can call from here.

Comment on lines 40 to 80
expected,
upstream,
)
return None
Copy link
Collaborator

@webbnh webbnh Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code in upstream_pr.py is now basically identical to the analogous code in upstream_issue.py. There are two significant differences, but the code in upstream_issue.py already makes provisions to handle one of them, and the other is that the Issue information is found in body["issue"] while the PR information is found in body["pull_request"], but that shouldn't be hard to code around (e.g., see how upstream_issue.py handles the analogous problem of accessing the "sync" field in the configuration; or, alternatively, the respective expressions could be specified as arguments to a common subroutine).

It's probably not worth trying to make a single handle_github_message() function which handles both PRs and Issues (because the outputs are different and the inputs are different currently), but the bulk of the body (e.g., lines 40-80) should be able to be refactored into a common subroutine used by both functions.

Comment on lines 81 to 82
github_client = Github(config["sync2jira"]["github_token"])
reformat_github_pr(pr, upstream, github_client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like upstream_issue.py has some additional logic around creating the GitHub client which should either be reproduced here or refactored into a helper routine called from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants