fix(source-wordpress): add lookback window to incremental streams and fix tab character in pages stream #76063
Draft
devin-ai-integration[bot] wants to merge 3 commits into master
… fix tab character in pages stream

- Add lookback_window: PT1H to editor_blocks, comments, pages, and media streams to prevent data loss during DST fall-back transitions
- Fix tab character bug in pages stream field_name (modified_after\t -> modified_after)
- Bump version from 0.0.48 to 0.0.49

Co-Authored-By: bot_apk <apk@cognition.ai>
Deploy preview for airbyte-docs ready! ✅ Preview built with commit aec4d9e.
What
Resolves https://github.com/airbytehq/oncall/issues/11859:
The WordPress connector's incremental streams (editor_blocks, pages, media, comments) use local-timezone cursor fields (modified/date), which are non-monotonic during DST fall-back transitions. This risks data loss when the clock repeats an hour.

Additionally, the pages stream had an embedded tab character in its field_name: "modified_after\t". This likely caused the WordPress API to not recognize the modified_after parameter, meaning server-side filtering was silently broken for pages.

How
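As a rough illustration of the mechanism, the sketch below reproduces a local-time cursor going backwards across a fall-back transition and shows a one-hour lookback recovering the record that would otherwise be skipped. The timezone, the dates, and the `fetched` helper are assumptions chosen for the example; the connector's real filtering is done by the WordPress API and the CDK, not by this code.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed site timezone, for illustration only. Requires system tz data.
SITE_TZ = ZoneInfo("America/New_York")
LOOKBACK = timedelta(hours=1)  # mirrors lookback_window: PT1H


def local_naive(utc_dt):
    """Wall-clock time a site in SITE_TZ would record, with the offset dropped."""
    return utc_dt.astimezone(SITE_TZ).replace(tzinfo=None)


# Two edits 45 minutes apart in real time, straddling the 2024-11-03 fall-back.
first_edit = local_naive(datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc))   # 01:30 EDT
second_edit = local_naive(datetime(2024, 11, 3, 6, 15, tzinfo=timezone.utc))  # 01:15 EST

# State saved after a sync that saw only the first edit.
cursor = first_edit


def fetched(modified, cursor_value, lookback=timedelta(0)):
    """Simplified stand-in for a modified_after filter (hypothetical helper)."""
    return modified > cursor_value - lookback


missed_without_lookback = not fetched(second_edit, cursor)
caught_with_lookback = fetched(second_edit, cursor, LOOKBACK)
```

The later edit carries an earlier local timestamp than the saved cursor, so a plain "after cursor" filter drops it, while widening the lower bound by one hour picks it up again at the cost of re-reading some records.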
- Added lookback_window: PT1H to the DatetimeBasedCursor on all four incremental streams (editor_blocks, comments, pages, media). This re-fetches 1 hour of data before the cursor on each sync, covering the maximum 1-hour DST shift. Duplicates are handled by destination deduplication.
- Fixed the tab character in the pages stream's start_time_option.field_name ("modified_after\t" → modified_after).
- Bumped the version 0.0.48 → 0.0.49.

Declarative-First Evaluation
Used the built-in lookback_window property of DatetimeBasedCursor, a one-line manifest addition per stream. No custom Python components needed. Prior art: source-convertkit, source-mantle, and others use lookback_window: PT1H for the same pattern.

Test Coverage
Created unit_tests/test_manifest.py with 6 tests covering:

- lookback_window: PT1H is present on all 4 incremental streams
- the field_name tab character is gone from the pages stream
- no field_name contains a tab

This connector has no integration test secrets (all acceptance tests are bypassed), so manifest-level validation is the appropriate testing approach. No live API testing was performed.
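For readers who want a feel for what manifest-level checks of this kind can look like, here is a minimal sketch. It is not the contents of unit_tests/test_manifest.py: the `MANIFEST` dict is a hypothetical, heavily trimmed stand-in for the parsed manifest.yaml, and the helper and test names are invented for the example.

```python
# Hypothetical, trimmed stand-in for the parsed manifest.yaml. The real file
# has many more keys, and its nesting and field names may differ per stream.
MANIFEST = {
    "streams": [
        {
            "name": name,
            "incremental_sync": {
                "type": "DatetimeBasedCursor",
                "lookback_window": "PT1H",
                "start_time_option": {
                    "type": "RequestOption",
                    "field_name": "modified_after",
                    "inject_into": "request_parameter",
                },
            },
        }
        for name in ("editor_blocks", "comments", "pages", "media")
    ]
}

INCREMENTAL_STREAMS = {"editor_blocks", "comments", "pages", "media"}


def cursors_by_stream(manifest):
    """Map stream name to its DatetimeBasedCursor block, skipping other streams."""
    return {
        stream["name"]: stream["incremental_sync"]
        for stream in manifest.get("streams", [])
        if stream.get("incremental_sync", {}).get("type") == "DatetimeBasedCursor"
    }


def test_lookback_window_on_all_incremental_streams():
    cursors = cursors_by_stream(MANIFEST)
    assert set(cursors) == INCREMENTAL_STREAMS
    for name, cursor in cursors.items():
        assert cursor.get("lookback_window") == "PT1H", name


def test_no_whitespace_in_request_field_names():
    for name, cursor in cursors_by_stream(MANIFEST).items():
        field_name = cursor["start_time_option"]["field_name"]
        # Catches stray tabs, spaces, and newlines around the parameter name.
        assert field_name == field_name.strip(), f"{name}: {field_name!r}"
        assert "\t" not in field_name, name
```

In a real test module the dict would presumably be replaced by loading manifest.yaml from disk; the assertions stay the same, which is what makes this kind of check cheap to run without API credentials.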
Review guide
- airbyte-integrations/connectors/source-wordpress/manifest.yaml: the core fix. Four one-line lookback_window: PT1H additions and the tab removal on pages.
- airbyte-integrations/connectors/source-wordpress/unit_tests/test_manifest.py: new tests
- airbyte-integrations/connectors/source-wordpress/metadata.yaml: version bump
- docs/integrations/sources/wordpress.md: changelog entry

Key things to verify:
- The tab fix on pages (field_name: modified_after without quotes, line ~289 in diff)
- lookback_window: PT1H is present on all 4 incremental streams, not just a subset
- The corrected field_name means server-side filtering will now actually work. This is a positive behavior change but may alter data volumes for pages syncs (previously the API may have returned unfiltered results)

User Impact
- editor_blocks, pages, media, and comments will now re-fetch 1 extra hour of data per sync, preventing data loss during DST transitions. This may produce some duplicate records, which are handled by destination dedup.
- The pages stream will now correctly send the modified_after parameter to the WordPress API, enabling proper server-side filtering that was previously broken by the tab character.

Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/f007d5e6b4024e15aca2f6e791014ca1