update FilesExt retry logic#1211

Merged
parthban-db merged 4 commits into databricks:main from yuanjieding-db:fix-indefinite-retry
Feb 11, 2026

Conversation

@yuanjieding-db
Collaborator

What changes are proposed in this pull request?

WHAT

  • Extend the retry function with a new max_attempts parameter so clients can stop retrying and fail after a fixed number of attempts
  • Remove 500 from the FilesExt retryable status codes
  • Add a new config option to set the number of retry attempts for FilesExt
  • Update the FilesExt retry logic to fail after the configured number of attempts

WHY

  • 500 errors shouldn't be retried
  • FilesExt should always prioritize falling back over retrying, to avoid regressions

How is this tested?

Unit tests were updated to reflect the change.
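The intended behavior can be sketched roughly as follows. This is a minimal illustration only: the parameter names, the fixed backoff, the exact boundary check, and the choice of error type are assumptions, not the SDK's actual implementation.

```python
import time


def retried(fn, *, max_attempts=None, timeout=None, is_retryable=lambda e: True):
    """Call fn, retrying retryable errors until max_attempts or timeout is hit.

    Illustrative sketch; not the databricks-sdk implementation.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    attempt = 0
    last_err = None
    while True:
        attempt += 1
        try:
            return fn()
        except Exception as err:
            if not is_retryable(err):
                raise
            last_err = err
        # Determine which limit was hit.
        if max_attempts is not None and attempt >= max_attempts:
            # Per the review discussion below, the merged code raises
            # RuntimeError here rather than TimeoutError.
            raise RuntimeError(f"Exceeded max retry attempts ({max_attempts})") from last_err
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"Timed out after {timeout}s") from last_err
        time.sleep(0.01)  # placeholder backoff; the real client backs off with jitter
```

With max_attempts=3, a call that always fails is attempted exactly three times before RuntimeError is raised with the last underlying error attached as the cause.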


```python
# Determine which limit was hit
if max_attempts is not None and attempt > max_attempts:
    raise TimeoutError(f"Exceeded max retry attempts ({max_attempts})") from last_err
```
Contributor

Do we have a better error to represent this case? TimeoutError feels a bit odd here, as the function has not actually timed out.

Collaborator (Author)

Ideally we would use a custom RetryError type, with TimeoutError and MaxRetryExceededError as its derived types, so that users who don't care why retries were exhausted can catch RetryError, while the specific cause is still preserved.
However, since we have been using the built-in TimeoutError and users may already depend on that behavior, it is risky to change it to a different error.
And if we introduced a new error type just for the max-retries-exceeded scenario, it would be harder for the upper layer to handle retry errors: it would need to catch both errors manually.

I don't see a better solution here, unless we rewrite the retry logic completely or move FilesExt to a different retry library.
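The hierarchy this comment proposes (and deliberately does not adopt) could look like the following. This is a hypothetical sketch of the idea, not code from the SDK; all class names are illustrative.

```python
# Hypothetical error hierarchy; NOT part of databricks-sdk.
class RetryError(Exception):
    """Base class: retries were exhausted, for whatever reason."""


class MaxRetryExceededError(RetryError):
    """Retries exhausted because the attempt limit was reached."""


class RetryTimeoutError(RetryError, TimeoutError):
    """Retries exhausted because the deadline passed.

    Also derives from the built-in TimeoutError so that existing
    `except TimeoutError` handlers would keep working.
    """
```

Callers could then write `except RetryError` to handle both cases uniformly, or catch the specific subclass when the cause matters.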

Contributor

So, you finally changed this to RuntimeError, right? I think it is fine to throw that, as I don't want any other function to depend on that error.

@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1211
  • Commit SHA: 04305fbf1709fd13f5652c8249683b15c5d945c8

Checks will be approved automatically on success.

@parthban-db parthban-db added this pull request to the merge queue Feb 11, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 11, 2026
@parthban-db parthban-db added this pull request to the merge queue Feb 11, 2026
Merged via the queue into databricks:main with commit 50a5b40 Feb 11, 2026
17 checks passed
