Skip to content

fix(source-s3): increase memory for Check Connection to prevent OOM on large buckets#75968

Open
devin-ai-integration[bot] wants to merge 2 commits intomasterfrom
devin/1775035929-source-s3-oom-check-connection
Open

fix(source-s3): increase memory for Check Connection to prevent OOM on large buckets#75968
devin-ai-integration[bot] wants to merge 2 commits intomasterfrom
devin/1775035929-source-s3-oom-check-connection

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot commented Apr 1, 2026

What

Resolves https://github.com/airbytehq/oncall/issues/11839:

The source-s3 connector (v4.15.2) hits OOM (exit code 137) during Check Connection for customers with large S3 buckets. The root cause is that metadata.yaml lacks a resourceRequirements section, so Check Connection runs with the platform default of 800Mi memory — insufficient for buckets with many large files (e.g. 16 Parquet files totaling ~10.8GB).

How

Adds a resourceRequirements.jobSpecific entry for check_connection with memory_limit: 4096Mi and memory_request: 4096Mi. This matches the established pattern used by sibling file-based connectors:

  • source-azure-blob-storage4096Mi for check_connection
  • source-sftp-bulk4096Mi for check_connection

No code changes — metadata-only fix. Version bumped from 4.15.24.15.3 (patch).

Review guide

  1. airbyte-integrations/connectors/source-s3/metadata.yaml — the core fix (resourceRequirements block) and version bump
  2. airbyte-integrations/connectors/source-s3/pyproject.toml — version bump
  3. docs/integrations/sources/s3.md — changelog entry

Human review checklist

  • Verify the resourceRequirements YAML block is correctly indented under data: and placed before releases:
  • Confirm 4096Mi is the right value (matches source-azure-blob-storage and source-sftp-bulk)
  • Verify version 4.15.3 is consistent across metadata.yaml and pyproject.toml
  • Confirm 4096Mi is sufficient for the reported case (16 Parquet files, ~10.8GB total) — platform allocates this ceiling only for check_connection, not sync

Test Coverage

No unit tests added. This is a metadata-only change — no connector code was modified. The resourceRequirements field is consumed by the platform scheduler, not by the connector runtime, so there is no connector-level code path to test.

User Impact

Customers with large S3 buckets will no longer experience OOM failures during Check Connection. No negative side effects — this only increases the memory ceiling for the check job type.

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin session: https://app.devin.ai/sessions/10726618e5e94f188f58ebb3d2dba5ad

…n large buckets

Add resourceRequirements.jobSpecific.check_connection with 4096Mi memory
to metadata.yaml, matching the pattern used by source-azure-blob-storage
and source-sftp-bulk (both cdk:python-file-based connectors).

This resolves OOM exit code 137 during Check Connection for customers
with large S3 buckets (e.g. 16 Parquet files, ~10.8GB total).

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
      • Bump types: patch (default), minor, major, major_rc, rc, promote.
      • The rc type is a smart default: applies minor_rc if stable, or bumps the RC number if already RC.
      • The promote type strips the RC suffix to finalize a release.
      • Example: /bump-version type=rc or /bump-version type=minor
    • /bump-progressive-rollout-version - Alias for /bump-version type=rc. Bumps with an RC suffix and enables progressive rollout.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@@ -356,6 +356,7 @@ This connector utilizes the open source [Unstructured](https://unstructured-io.g

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change

Co-Authored-By: bot_apk <apk@cognition.ai>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-32bo3oygc-airbyte-growth.vercel.app

Built with commit 696d431.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

source-s3 Connector Test Results

208 tests   185 ✅  8m 30s ⏱️
  3 suites   23 💤
  3 files      0 ❌

Results for commit 696d431.

♻️ This comment has been updated with latest results.

@vai-airbyte
Copy link
Copy Markdown
Contributor

Vai Ignatavicius (vai-airbyte) commented Apr 1, 2026

/publish-connectors-prerelease

Pre-release Connector Publish Started

Publishing pre-release build for connector source-s3.
PR: #75968

Pre-release versions will be tagged as {version}-preview.696d431
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-s3:4.15.3-preview.696d431

Docker Hub: https://hub.docker.com/layers/airbyte/source-s3/4.15.3-preview.696d431

Registry JSON:

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

↪️ Triggering /ai-review per Hands-Free AI Triage Project triage next step.

Reason: CI all green, prerelease published — advancing to review stage.
Linked issue: https://github.com/airbytehq/oncall/issues/11839

Session: https://app.devin.ai/sessions/944109c31676472da2e61c4602224cad

@octavia-bot octavia-bot bot marked this pull request as ready for review April 3, 2026 11:42
@octavia-bot
Copy link
Copy Markdown
Contributor

octavia-bot bot commented Apr 3, 2026

AI PR Review starting...

Reviewing PR for connector safety and quality.
View playbook

Devin AI session created successfully!

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

devin-ai-integration bot commented Apr 3, 2026

AI PR Review Report

Review Action: NO ACTION (NOT ELIGIBLE)

Gate Status
PR Hygiene PASS
Code Hygiene PASS
Test Coverage PASS
Code Security PASS
Per-Record Performance PASS
Breaking Dependencies PASS
Backwards Compatibility PASS
Forwards Compatibility PASS
Behavioral Changes FAIL
Out-of-Scope Changes PASS
CI Checks PASS
Live / E2E Tests PASS

Note: Behavioral Changes gate is an Anti-Pattern gate that blocks auto-approval and requires human sign-off. It does not trigger automated REQUEST CHANGES. The memory_limit and memory_request keywords in the diff triggered this gate, which is expected for a resource allocation change. A human reviewer should confirm the 4096Mi value is appropriate.


📋 PR Details & Eligibility

Connector & PR Info

Connector(s): source-s3
PR: #75968
HEAD SHA: 696d431f9fd840491767a0f4a5f03fe4b976636a
Session: https://app.devin.ai/sessions/141d954937334bba840d17870e84dd86

Auto-Approve Eligibility

Eligible: No
Category: not-eligible
Reason: PR modifies metadata.yaml (adding resourceRequirements) and pyproject.toml (version bump), which are functional metadata changes. These do not fall into any auto-approve category (docs-only, additive spec, patch/minor deps, or comment/whitespace-only).

Review Action Details

NO ACTION (NOT ELIGIBLE) — All enforced gates pass, but the Behavioral Changes anti-pattern gate is flagged (memory_limit, memory_request keywords detected in diff). This requires human sign-off. Additionally, the PR is not eligible for auto-approval since it contains functional metadata changes.

Note: This bot can approve PRs when all gates pass AND the PR is eligible for auto-approval (docs-only, additive spec changes, patch/minor dependency bumps, or comment/whitespace-only changes). PRs with other types of changes require human review even if all gates pass.

🔍 Gate Evaluation Details

Gate-by-Gate Analysis

Gate Status Enforced? Details
PR Hygiene PASS Yes Description present (>50 chars), changelog updated in docs/integrations/sources/s3.md, no unresolved human comments
Code Hygiene PASS WARNING No source code files (.py, manifest.yaml) modified — metadata-only change is exempt from coverage check
Test Coverage PASS Yes Exempt: metadata-only change (no functional source code modified). Title contains "fix" but change is metadata.yaml + version bump only
Code Security PASS Yes No security-sensitive file paths matched. No security keywords (authenticator, OAuthAuthenticator, api_token, etc.) in diff hunks
Per-Record Performance PASS WARNING No changes to record processing code paths
Breaking Dependencies PASS WARNING No dependency version changes — only connector version bump (4.15.2 → 4.15.3)
Backwards Compatibility PASS Blocks Auto-Approve metadata.yaml change is additive (new resourceRequirements block). No spec changes, no stream removals, no schema changes. Patch version bump.
Forwards Compatibility PASS Blocks Auto-Approve No state/cursor/pagination/transformation keywords in diff hunks
Behavioral Changes FAIL Blocks Auto-Approve Keywords memory_limit and memory_request found in diff hunks (resource keywords). This is expected for a resource allocation change — human should verify 4096Mi is appropriate.
Out-of-Scope Changes PASS Skip All changed files are in-scope: connector metadata, connector pyproject.toml, connector docs
CI Checks PASS Yes All core checks green: Connector CI Checks Summary ✅, Test source-s3 Connector (208 tests, 185✅, 23💤, 0❌) ✅, Lint source-s3 Connector ✅, Format Check ✅, Check Changelog Updated
Live / E2E Tests PASS Yes Bug fix indicator matched (title contains "fix"). Passed via author justification: metadata-only change — resourceRequirements is consumed by the platform scheduler, not the connector runtime. No connector code was modified, so there is no connector-level code path to validate. Pre-release published: airbyte/source-s3:4.15.3-preview.696d431

Detailed Analysis

PR Hygiene:

  • PR Body Length (raw): ~1800+ chars
  • PR Body Length (after stripping): ~1800+ chars
  • PR Body Length (visible content): ~1500+ chars
  • PR Body Preview: "## What\n\nResolves https://github.com/airbytehq/oncall/issues/11839..."
  • Changelog: docs/integrations/sources/s3.md updated with version 4.15.3 entry
  • Peer feedback: All comments from bot accounts ([bot] suffix) — no human reviewer comments

Code Security — Diff Hunk Analysis:

  • metadata.yaml diff hunks contain: dockerImageTag, resourceRequirements, jobSpecific, jobType, check_connection, memory_limit, memory_request
  • None of the security keywords (authenticator, OAuthAuthenticator, BearerAuthenticator, api_token, client_id, client_secret, refresh_token, access_token, api_key, allowedHosts, connectorBuildOptions, dockerRepository) found

Backwards Compatibility — Spec Comparison:

  • No spec files (spec.json, spec.yaml) were modified in this PR
  • No manifest.yaml changes
  • metadata.yaml change is purely additive (new resourceRequirements block)

Behavioral Changes — Diff Hunk Keywords:

  • memory_limit: 4096Mi — MATCHED (resource keyword)
  • memory_request: 4096Mi — MATCHED (resource keyword)
  • No rate limiting, retry/backoff, timeout, or error handling keywords found
  • This change matches the established pattern used by sibling connectors (source-azure-blob-storage, source-sftp-bulk)

Live / E2E Tests — Validation Analysis:

  • Validation required: Yes (title contains "fix")
  • MCP verification: Unavailable (Cloud SQL Proxy not running)
  • Pre-release published: airbyte/source-s3:4.15.3-preview.696d431 (Comment by vai-airbyte confirms SUCCESS)
  • Author justification: "This is a metadata-only change — no connector code was modified. The resourceRequirements field is consumed by the platform scheduler, not by the connector runtime, so there is no connector-level code path to test."
  • Assessment: Justification is sufficient — the change modifies platform scheduling metadata only. The connector binary is identical to v4.15.2. There is no connector-level behavior change to validate with live connections.
📚 Evidence Consulted

Evidence

  • Changed files: 3 files
    • airbyte-integrations/connectors/source-s3/metadata.yaml — resourceRequirements + version bump
    • airbyte-integrations/connectors/source-s3/pyproject.toml — version bump
    • docs/integrations/sources/s3.md — changelog entry
  • CI checks: All core checks passed (Connector CI Checks Summary ✅, Test source-s3 ✅, Lint source-s3 ✅, Format Check ✅, Check Changelog Updated ✅, Enforce PR structure ✅)
  • PR labels: None observed
  • PR description: Present, detailed (What/How/Review guide/Test Coverage/User Impact sections)
  • Existing bot reviews: None for this HEAD SHA
  • Pre-release: airbyte/source-s3:4.15.3-preview.696d431 published successfully
  • MCP verification: Unavailable (Cloud SQL Proxy not running — unable to verify syncs on pre-release)

Devin session

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants