As a roundup user, I want builds to retry external service calls so that transient network failures do not cause unnecessary build failures #164

@jordanpadams

Checked for duplicates

Yes - I've already checked

🗒 User Story

As a roundup user, I want builds to automatically retry calls to external network services with exponential backoff so that transient network errors do not cause permanent build failures.

💪 Motivation

A stable release of doi-service failed during artifact publication due to a 400 Bad Request from PyPI during a twine upload. The error was a transient network issue, but the build had no retry logic, so it failed permanently and required a manual re-run.

The following external network calls in roundup have no retry logic today:

  • PyPI: twine upload (_python.py, _ArtifactPublicationStep)
  • GitHub releases: python-release, maven-release, nodejs-release (_GitHubReleaseStep in each context)
  • Maven Central: mvn deploy (_maven.py, _ArtifactPublicationStep)
  • npmjs.com: npm publish (_nodejs.py, _ArtifactPublicationStep)
  • GitHub API: github_changelog_generator, requirement-report, upload_asset (step.py)
  • git push/pull/fetch: across util.py, _maven.py, and all context-specific steps
  • gh-pages: deploy.sh (DocPublicationStep)

📋 Acceptance Criteria

  • A utility function invoke_with_retry(argv, retries=3, delay=30) is added to util.py that wraps invoke() with exponential backoff (e.g. 30s, 60s, 120s between attempts)
  • invoke_with_retry is used for all external network service calls listed above
  • Retry attempts are logged at WARNING level with attempt number and wait time
  • Final exhaustion is logged at CRITICAL level before re-raising the exception
  • Existing behavior (error propagation for stable builds, suppression for unstable builds) is preserved
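
To make the criteria above concrete, here is a minimal sketch of what invoke_with_retry might look like. It is not the final implementation. It assumes that roundup's existing invoke() raises an exception when the command fails (a stand-in is defined here so the snippet runs on its own), and that retries=3 means three retries after the initial attempt, which yields the 30s, 60s, 120s waits mentioned above. The _invoke and _sleep parameters are hypothetical additions that exist only so the backoff can be exercised without real commands or real waiting.

```python
import logging
import subprocess
import time

_logger = logging.getLogger(__name__)


def invoke(argv):
    """Stand-in for roundup's invoke(); assumed here to raise on a non-zero exit."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def invoke_with_retry(argv, retries=3, delay=30, _invoke=invoke, _sleep=time.sleep):
    """Run argv, retrying failed calls with a delay that doubles each time.

    _invoke and _sleep are hypothetical hooks for testing, not part of the
    signature proposed in this issue.
    """
    wait = delay
    # One initial attempt plus up to `retries` retries.
    for attempt in range(1, retries + 2):
        try:
            return _invoke(argv)
        except Exception:
            if attempt > retries:
                # Retries exhausted: log, then re-raise so callers keep their
                # existing stable/unstable error handling.
                _logger.critical('Giving up on %r after %d attempts', argv, attempt)
                raise
            _logger.warning(
                'Attempt %d for %r failed; retrying in %d seconds', attempt, argv, wait
            )
            _sleep(wait)
            wait *= 2
```

Because the final exception is re-raised unchanged, a call site would only need to swap invoke(argv) for invoke_with_retry(argv). Catching Exception is deliberately broad in this sketch; the real change would probably narrow it to whatever exception type invoke() raises.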

🩺 Additional context

The invoke_with_retry function should be placed in util.py and imported where needed. The retry count and initial delay should have sensible defaults but ideally be configurable. The github3 upload_asset() call in DocPublicationStep is a Python API call (not a subprocess) and will need its own inline retry loop.
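
For the Python API case, the inline loop could follow the same pattern. The sketch below is illustrative only: upload_asset_with_retry is a hypothetical helper name, the release argument is assumed to be a github3 Release object whose upload_asset(content_type, name, asset) method is called as shown, and _sleep is again a hypothetical hook for testing. The asset is passed as bytes so that a retry does not have to rewind a partially read file object.

```python
import logging
import time

_logger = logging.getLogger(__name__)


def upload_asset_with_retry(release, content_type, name, data,
                            retries=3, delay=30, _sleep=time.sleep):
    """Call release.upload_asset(), retrying with a doubling delay on failure.

    Hypothetical helper: `release` is assumed to expose
    upload_asset(content_type, name, asset), and `data` is the asset as bytes.
    """
    wait = delay
    # One initial attempt plus up to `retries` retries.
    for attempt in range(1, retries + 2):
        try:
            return release.upload_asset(content_type, name, data)
        except Exception:
            if attempt > retries:
                # Retries exhausted: log, then let the original error propagate.
                _logger.critical('Upload of %s failed after %d attempts', name, attempt)
                raise
            _logger.warning(
                'Upload attempt %d for %s failed; retrying in %d seconds',
                attempt, name, wait,
            )
            _sleep(wait)
            wait *= 2
```

As with the subprocess wrapper, the broad except clause is a placeholder; limiting it to the library's network and API error types would avoid retrying genuine programming errors.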
