Skip to content

feat(auth): OAuth login + Bearer transport + logout/refresh/status (PR 2 of OAuth stack)#192

Merged
jrusso1020 merged 12 commits into
mainfrom
feat/oauth-cli-surface
Jun 26, 2026
Merged

feat(auth): OAuth login + Bearer transport + logout/refresh/status (PR 2 of OAuth stack)#192
jrusso1020 merged 12 commits into
mainfrom
feat/oauth-cli-surface

Conversation

@jrusso1020

@jrusso1020 jrusso1020 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Description

PR 2 of the OAuth stack — stacked on top of PR 1 (#191). Targets
feat/oauth-driver as base; GitHub auto-rebases this PR onto main
when PR 1 merges. The diff shown here is PR 2 only.

PR 1 added the internal internal/auth/oauth/ driver. PR 2 makes it
user-visible.

Auth-precedence model (per James's call on #heygen-cli)

heygen-cli is agent-first, not user-first like Claude Code, so API
key is the dominant use case. The defaults reflect that:

  • TUI picker: API key in position 1, marked Recommended. OAuth
    is a one-arrow-down opt-in for users who want subscription
    pricing.
  • File-level resolver precedence: HEYGEN_API_KEY env > file
    api_key > file OAuth (was env > OAuth > api_key). Existing
    users with only one credential are unaffected; users who somehow
    have both will resolve to api_key (matching the picker
    recommendation).
  • Single-credential normalization: on successful auth login,
    the runner clears the OTHER credential block. The file now holds
    at most one of api_key / oauth, never both. Pre-this-change
    users with both blocks self-heal on their next login.

Interactive login picker

When stdin and stdout are both TTYs, plain heygen auth login
presents a TUI picker:

Welcome to HeyGen!

How would you like to log in?

> Use an API key
  Paste an existing key. Uses API credits. (Recommended)

  Login with HeyGen.com
  Opens a browser. Uses subscription credits.

↑/↓ to move · enter to select · esc to cancel

Built with Bubble Tea (already in go.mod from the poll spinner).
Arrow keys, j/k vim motions, 1/2 numeric shortcuts, Enter to
select, Esc / Ctrl-C / q to cancel.

Non-interactive shells default to the API-key flow (Somansh's
call: "for the non-interactive option, I think we should keep api
key (for agents)"). The dispatcher routes to the API-key runner
when any of these hold:

  • stdin or stdout is not a TTY (CI runner, piped output, agent
    shell)
  • CI=true (or any truthy value)
  • HEYGEN_NONINTERACTIVE=1 (or any truthy value)

Explicit flags always skip the picker either way:

  • --api-key → read the key from stdin (existing behavior
    preserved)
  • --oauth → start the browser OAuth flow directly

The two flags are mutually exclusive.

Login flows

  • API key (API credits, default-recommended). Reads the key
    from stdin (interactive prompt with term.ReadPassword if stdin
    is a TTY, otherwise the pipe). Stored at ~/.heygen/credentials
    with mode 0600. On success, any co-located OAuth block is
    cleared.
  • Browser OAuth (subscription credits). Generates state + PKCE
    pair, starts a one-shot HTTP server on
    127.0.0.1:<port>/oauth/callback, opens
    https://app.heygen.com/oauth/authorize, exchanges the code at
    https://api2.heygen.com/v1/oauth/token, persists tokens, and
    prints Logged in as <username> from a one-shot /v3/users/me
    probe. On success, any co-located api_key is cleared.

Rest of the auth surface

  • Typed credential through the resolver. New auth.Credential
    type discriminates api_key / oauth / oauth-expired. The
    file resolver selects api_key first; if absent, it picks a fresh
    OAuth credential, then falls back to refresh-on-first-request
    for an expired access token with a refresh token. The legacy
    Resolve() string path is kept and refuses OAuth so older
    string-only call sites can never feed an access_token into the
    x-api-key header.
  • Bearer transport in client.Do. OAuth credentials send
    Authorization: Bearer <access_token>; api-key credentials
    still send x-api-key. Proactive refresh fires when within 60s
    of expiry, and a reactive refresh + retry fires once on a 401.
    A second 401 surfaces client.ErrReLoginNeeded, mapped by the
    executor to exit 3 with a heygen auth login hint. Refreshed
    tokens persist back via an injectable OAuthPersister.
    Auto-refresh inside Client.Do handles every rotation case end
    users hit — there is no user-facing heygen auth refresh
    command.
  • heygen auth logout — best-effort revoke at the IdP, then
    clears whatever credential is on disk (api_key or oauth or
    both, for pre-this-change users). The now-degenerate --all
    flag was dropped — there's only one credential to clear
    post-login.
  • Extended heygen auth status — existing /v3/users/me
    check plus a new credential block with type / source / scope
    / expires_at / refreshable. Api-key callers see strictly
    additive output.

Backward compatibility

  • HEYGEN_API_KEY env still takes precedence over everything
    else.
  • API-key users with no oauth block see zero behavior change.
  • Pre-this-change users with both blocks resolve to api_key
    (matches the picker recommendation) AND self-heal on next
    login.
  • Existing tests that ran auth login non-interactively (no
    -- flag) still work because the dispatcher routes them to
    the API-key flow under go test (no TTY). Tests that explicitly
    want OAuth exercise it via --oauth (or call runOAuthLogin
    directly through the existing test helper).
  • --device-code is reserved and returns "not yet supported"
    so the follow-up patch can wire it cleanly.

Testing

  • go test ./... -race -count=1 green.
  • go build ./... clean with CGO_ENABLED=0 (no new keychain
    deps).
  • go vet ./... clean.

New test coverage in this revision

  • TestResolveCredential_FileAPIKeyBeatsFileOAuth — with
    both blocks present, the resolver returns api_key (was OAuth).
    Legacy string Resolve() also returns the api_key (no longer
    refuses with "OAuth-aware transport required").
  • TestAuthLogin_APIKeyFlow_ClearsPreviousOAuth — api-key
    login on a file that already holds an OAuth block drops the
    OAuth block; stdout reports cleared_oauth=true.
  • TestAuthLogin_APIKeyFlow_NoPriorOAuth_NoMention
    first-time api-key login (no prior OAuth) doesn't mention
    OAuth in the success message; cleared_oauth=false.
  • TestAuthLogin_OAuthFlow_ClearsPreviousAPIKey — OAuth
    login on a file that already holds an api_key drops the
    api_key; stdout reports cleared_api_key=true.
  • TestAuthLogin_CredentialConflict_SelfHeals — full
    walkthrough for pre-this-change users: seed a file with both
    blocks, assert resolver picks api_key, run
    auth login --api-key, assert OAuth block is gone.
  • TestAuthLogout_ClearsBothBlocks — pre-this-change file
    with both blocks gets fully cleared on logout; JSON envelope
    reports cleared_credential=both.
  • TestAuthLogout_ClearsAPIKeyOnly — api-key-only file
    gets cleared on logout; message reads "Cleared stored API
    key", cleared_credential=api_key.
  • Existing picker tests pass with the flipped order — they
    asserted cursor positions, not which choice the cursor
    pointed at.

Prior test coverage (unchanged)

  • Resolver — typed OAuth selection paths (fresh / expired+refresh
    / expired-no-refresh / no-expiry / both-present); chain
    ResolveTypedCredential propagation; OAuthTokens store
    round-trip and clear semantics.
  • Transport — Bearer header wiring, 401-with-refresh single retry,
    proactive refresh near expiry, refresh-only credential triggers
    refresh before the first API hit, API-key path does NOT retry
    on 401, body replay survives the refreshed retry, sentinel
    ErrReLoginNeeded after the second 401. Concurrent Do()
    calls coalesce on a single IdP round-trip (race-detector
    verified); transient refresh failures surface as transport
    errors (not misclassified as ErrReLoginNeeded);
    non-replayable bodies on a 401 return the response unchanged
    instead of nil-panicking.
  • Integration — full loopback ← callback ← exchange chain against
    an httptest fake IdP + /v3/users/me. Token-exchange failure
    leaves disk untouched. HEYGEN_API_BASE is honored on the
    login identity probe so dev sandboxes don't fan out to
    api.heygen.com.
  • CLI — logout best-effort revoke + clear, auth status merge
    step handles a null upstream body without panicking.

Manual smoke

  • ./bin/heygen auth --help, auth login --help,
    auth logout --help all render correctly.
  • auth login --oauth --api-key is rejected with the cobra
    "mutually exclusive" error.

— Jerrai (https://claude.com/claude-code)

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/code-review @ 9a5b5de (PR 2: OAuth login + Bearer transport + logout/status, stacked on #191)

Verdict: LGTM-with-followups. No blockers. Strong test coverage (~1400 LoC tests vs ~1100 LoC production), good error contract design (typed Credential, ErrReLoginNeeded sentinel), careful header-handling discipline (Bearer vs x-api-key never both, reserved headers always win over --headers user input). Backward compatibility for API-key users preserved across all three precedence paths (env, file-JSON, file-legacy-plaintext). The 401-retry-with-refresh round-trip correctly replays POST bodies via req.GetBody, persists rotated refresh tokens, and surfaces a clear re-login sentinel after the second 401.

That said — the most consequential finding is that transient IdP failures get misclassified as permanent credential rejection, pushing the user toward re-login when a retry would fix it. The driver in PR 1 already exposes the discriminator (ErrRejected); client.go just doesn't check it. 12 findings (4 warn, 8 nit) below.

⚠️ Warnings

  1. Transient refresh failures misclassified as re-login-neededclient.go:188-191, 267-269. Any forceRefresh error (network blip, IdP 5xx, ctx cancellation, refresh-endpoint timeout) becomes ErrReLoginNeeded → executor shows "OAuth session expired — Run: heygen auth login". oauth.Client already exposes ErrRejected (set on 400/401 TokenError). Discriminate with errors.Is(refreshErr, oauth.ErrRejected): only that branch should become ErrReLoginNeeded; everything else should surface as a transient error so the user can retry. A flaky IdP currently feels like the CLI logged the user out.

  2. OAuth 401 retry can nil-panic on non-replayable request bodyclient.go:198. The retryTransport guards cloneRequestForRetry with canReplayBody(currentReq) (retry.go:60-62). The OAuth 401 retry path doesn't — calls cloneRequestForRetry unconditionally. cloneRequestForRetry calls req.GetBody() when req.Body != nil; any request whose Body was set without populating GetBody (custom callers, io.Pipe-backed streams, custom-buffer multipart) panics with nil deref. Latent today because the CLI's own builders use *bytes.Reader/*bytes.Buffer where stdlib sets GetBody — but Client.Do is exported and the contract should match the retryTransport's. Add if !canReplayBody(req) { return resp, nil } before the clone, or surface a typed 'body not replayable' error.

  3. mergeStatusEnvelope panics on null /v3/users/me responseauth_status.go:102-115. If /v3/users/me returns the JSON literal null, json.Unmarshal into *map[string]any succeeds with a nil map; then envelope["credential"] = credMeta panics with 'assignment to entry in nil map'. Fix is one line: if envelope == nil { envelope = map[string]any{} }. Verified locally.

  4. lookupCurrentUser ignores HEYGEN_API_BASEauth_login.go:234-269. The /v3/users/me probe hardcodes https://api.heygen.com. The CLI honors HEYGEN_API_BASE everywhere else via config.LayeredProvider.BaseURL(). Result: a user with HEYGEN_API_BASE pointed at a dev/staging env logs in (against prod IdP — also hardcoded, fair) but the probe hits prod with the prod-issued access_token; they see "Logged in." with no name, or worse, prod identity. Plumb the config provider into oauthLoginConfig.

🟡 Nits

  1. Concurrent Do() calls can race two simultaneous refreshesclient.go:248-271. ensureFreshOAuthToken reads credential state under credMu then releases the lock before calling forceRefresh. Two concurrent Do() calls both observing needsRefresh=true will both call forceRefresh, each consuming the (potentially-rotated) refresh_token; the second's refresh_token is already revoked by the time it lands → spurious ErrReLoginNeeded. Not a live bug — CLI usage is serial — but *http.Client itself supports concurrent Do(). Coalesce with sync.Once-per-window, singleflight, or an in-flight-refresh future. Matters once batched commands fan out.

  2. Near-expiry + still-401 triggers two refreshesclient.go:168-211. Token within 60s of expiry → ensureFreshOAuthToken refreshes (1st) → request goes out → server still 401s (clock skew, IdP-side propagation lag) → 401 retry branch fires another forceRefresh (2nd). Track whether ensureFreshOAuthToken already refreshed in this Do() call; if so, skip the 2nd refresh and go straight to ErrReLoginNeeded. Currently latent.

  3. Duplicated 60s skew constantsclient.go:22-25 + file_resolver.go:27-31. oauthRefreshSkew = 60s (client) and oauthExpiryClockSkew = 60s (auth) are intentionally identical; agreement enforced by reviewer attention rather than the type system. Promote to a single exported constant in internal/auth (e.g. auth.OAuthRefreshSkew) and reference from both sites. Drift between them would produce subtle 'resolver said fresh but transport refreshed anyway' debugging.

  4. Logout cleared_api_key reports the flag, not what was clearedauth_logout.go:77-81. A user with only an OAuth token who passes --all gets cleared_api_key: true despite there being no api_key. Either rename to clear_api_key_requested or branch on whether an api_key was present pre-clear.

  5. No resolver-level test for the 60s skew boundaryfile_resolver_test.go. Existing tests cover well-past-expiry (1h ago) and 1h-in-future; neither exercises the 60s skew window. The transport-side near-expiry test (TestClient_Do_OAuth_ProactiveRefresh_NearExpiry) covers the transport's skew; the resolver's identical-but-separate skew constant has no test guarding it. Pin both boundaries with a couple of nowFn-driven cases.

  6. PR description still references heygen auth refresh — removed in 9a5b5de. Update the 3rd bullet and the Testing section to match the current surface (login / logout / status) and note that auto-refresh inside Client.Do handles all rotation cases.

  7. CredentialTypeOAuthExpired has no applyHeaders caseclient.go:224-242. Safe today because Do() always calls ensureFreshOAuthToken first. Invariant is real but not enforced — a future caller of applyHeaders that skips ensureFresh would send a request with NO auth header. Either add an explicit panic/default case ('programmer bug: refresh was skipped') or factor applyHeaders' credential-write into a helper that asserts the type is one of {APIKey, OAuth}.

  8. RevokeToken silently swallows IdP 4xxoauth.go:178-208. Per RFC 7009 this is correct (revoke is fire-and-forget) and acceptable per the PR description. Worth surfacing the response status via a returned info struct or debug log though — a user debugging "why does my token keep working after logout" would benefit from knowing the IdP returned 401 invalid_token (already revoked) vs network blip vs misconfigured client_id. Consider plumbing a (statusCode, error) or structured Result.

— Jerrai

jrusso1020 added a commit that referenced this pull request Jun 24, 2026
W1: discriminate transient refresh failures from oauth.ErrRejected
    (re-login). Network/5xx/timeout/ctx-cancel now surface with the
    underlying error wrapped as "oauth: refresh failed: ..." instead
    of misclassifying as ErrReLoginNeeded.
W2: guard the OAuth 401 retry with canReplayBody before
    cloneRequestForRetry — non-replayable bodies return the 401
    response unchanged instead of nil-panicking on req.GetBody.
W3: nil-safe map init in mergeStatusEnvelope so a `null` /v3/users/me
    response doesn't panic on the credential assignment.
W4: plumb HEYGEN_API_BASE through lookupCurrentUser via the existing
    config.LayeredProvider — the OAuth login identity probe was
    hardcoded to api.heygen.com.
N1: coalesce concurrent refresh via per-Client sync.Mutex + double-
    checked expiry. Two parallel Do() calls now share a single
    IdP round-trip; no singleflight dep needed.
N2: track in-Do-refresh state to skip a double refresh when a
    proactively-refreshed token still returns 401 — straight to
    ErrReLoginNeeded.
N3: hoist the 60s skew constant to internal/auth as the canonical
    OAuthRefreshSkew. Both the resolver and the transport read it
    from one place now.
N4: rename the logout JSON field cleared_api_key →
    clear_api_key_requested. The field reports the --all flag, not
    whether an api_key was actually present pre-clear.
N7: explicit panic on CredentialTypeOAuthExpired in applyHeaders so
    a future change that skips the refresh gate surfaces as a clear
    programmer-bug message instead of sending a stale Bearer header.

Per James's call on PR #192's review findings.

— Jerrai (https://claude.com/claude-code)
jrusso1020 added a commit that referenced this pull request Jun 24, 2026
W1: discriminate transient refresh failures from oauth.ErrRejected
    (re-login). Network/5xx/timeout/ctx-cancel now surface with the
    underlying error wrapped as "oauth: refresh failed: ..." instead
    of misclassifying as ErrReLoginNeeded.
W2: guard the OAuth 401 retry with canReplayBody before
    cloneRequestForRetry — non-replayable bodies return the 401
    response unchanged instead of nil-panicking on req.GetBody.
W3: nil-safe map init in mergeStatusEnvelope so a `null` /v3/users/me
    response doesn't panic on the credential assignment.
W4: plumb HEYGEN_API_BASE through lookupCurrentUser via the existing
    config.LayeredProvider — the OAuth login identity probe was
    hardcoded to api.heygen.com.
N1: coalesce concurrent refresh via per-Client sync.Mutex + double-
    checked expiry. Two parallel Do() calls now share a single
    IdP round-trip; no singleflight dep needed.
N2: track in-Do-refresh state to skip a double refresh when a
    proactively-refreshed token still returns 401 — straight to
    ErrReLoginNeeded.
N3: hoist the 60s skew constant to internal/auth as the canonical
    OAuthRefreshSkew. Both the resolver and the transport read it
    from one place now.
N4: rename the logout JSON field cleared_api_key →
    clear_api_key_requested. The field reports the --all flag, not
    whether an api_key was actually present pre-clear.
N7: explicit panic on CredentialTypeOAuthExpired in applyHeaders so
    a future change that skips the refresh gate surfaces as a clear
    programmer-bug message instead of sending a stale Bearer header.

Per James's call on PR #192's review findings.

— Jerrai (https://claude.com/claude-code)
@jrusso1020 jrusso1020 force-pushed the feat/oauth-cli-surface branch from d88f662 to d57a35d Compare June 24, 2026 18:38
@somanshreddy

Copy link
Copy Markdown
Collaborator

Took another pass now that the picker landed. The dispatch and non-interactive default read cleanly, and the piping regression is fixed and guarded by a test. Nice work. A few transport-layer follow-ups below, all in internal/client/client.go and none blocking the design.

1. Refresh path concurrency (the main one). Two related issues on the OAuth refresh path, same fix:

  • The reactive 401 refresh calls forceRefresh(ctx) directly, not under refreshMu. Only the proactive path (ensureFreshOAuthToken) takes the lock + re-checks. So two concurrent Do() calls that both 401 can both refresh. If the IdP issues single-use rotating refresh tokens, the second refresh uses an already-consumed token, fails with ErrRejected, and that goroutine returns a spurious ErrReLoginNeeded even though the shared credential is now healthy. A burst of requests on an expired token has the same retry-storm shape.
  • Separately, the c.credential.IsOAuth() / HasRefreshToken() reads in Do() are unlocked, while forceRefresh mutates c.credential under credMu. That's a data race under -race. The existing concurrency test (TestClient_Do_OAuth_ConcurrentRefresh_Coalesces) only drives the proactive path, so it doesn't cover the reactive reads racing a write.

Suggested fix: route the reactive refresh through the same refreshMu + post-lock re-check as the proactive path, and read credential state in Do() via a credMu-guarded snapshot helper rather than touching c.credential directly. A CLI is often single-threaded, but the client is explicitly used concurrently (the N1 test), so this is worth closing.

2. Refresh-token persistence failure is swallowed. forceRefresh does _ = c.oauthPersister.SaveOAuthTokens(...). If the IdP rotates the refresh token and that disk write fails, the on-disk refresh token is now dead and the new one was never saved, so the next CLI invocation is stranded into a re-login with no signal why. The in-memory token is fine for the current process, but the failure surfaces later and is hard to attribute. Suggest surfacing it (stderr warning at minimum, or a typed warning on the result) instead of dropping it.

3. Stale NewWithCredential docstring. The doc + inline comment say the constructor "panics with a clear error" / "fails fast" if an OAuth credential is missing its oauth.Client, but the code silently defaults to oauth.NewClient() (the live IdP). Either restore the fail-fast or fix the comment. As written, the comment's stated goal (don't tie the test transport to the live IdP) is defeated: a test that forgets WithOAuthClient silently hits prod.

Minor / nits

  • Explicit --oauth in a headless / non-TTY shell still blocks ~5 min on the loopback before timing out. The no-flag path is now protected (defaults to API key), but --oauth goes straight to runOAuthLogin. A quick "no TTY/browser, the callback can't land" fail-fast would beat the timeout.
  • needsRefresh uses time.Now() directly while the resolver uses an overridable nowFn and the oauth client has WithNow. Fine in prod, just less consistent for deterministic tests.

(PR 191's driver looks solid; left one tiny doc nit over there.)

jrusso1020 added a commit that referenced this pull request Jun 24, 2026
S1: route reactive 401 refresh through refreshMu + add credMu-guarded
    snapshot helper for credential reads in Do() — kills the data
    race + the rotating-token consumption window. Adds
    refreshIfTokenUnchanged that compares the sent access-token to
    the in-memory one under the lock so two concurrent 401s coalesce
    to a single IdP round-trip.
S2: surface forceRefresh persistence failures to stderr (via a
    configurable warn writer) instead of silently swallowing —
    stranded-token attribution.
S3: restore fail-fast on NewWithCredential when OAuth credential
    has no oauth.Client (was the original intent of the
    test-transport contract). Panics with a clear hint about
    WithOAuthClient.
N1: --oauth in headless shell (no stdin TTY + HEYGEN_NO_BROWSER=1
    or BROWSER=none) fails fast instead of blocking ~5min on the
    loopback timeout.
N2: thread nowFn through needsRefresh for deterministic tests (new
    WithNow option on Client; defaults to time.Now).

Per Somansh's review on #192.

— Jerrai (https://claude.com/claude-code)

@somanshreddy somanshreddy left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Re-reviewed after the latest round and every finding is addressed and code-verified, each with a test:

  • Refresh-path race + reactive coalescing: snapshotCredential() (credMu-guarded reads) + refreshIfTokenUnchanged (reactive 401 now under refreshMu with a sent-token re-check). Traced the two-concurrent-401 case; single IdP round-trip, no rotating-token double-consume.
  • NewWithCredential now fails fast (panic) when an OAuth credential is built without WithOAuthClient; docstring matches behavior.
  • Refresh-token persist failure now warns to stderr instead of being swallowed.
  • Bonus: WithNow injectable clock, and explicit --oauth in a headless shell now fails fast instead of hanging on the loopback.

The two design calls also landed well for an agent-first CLI: API key is the default (and unlabeled) picker choice, and single-credential normalization (a login clears the other block, surfaced in the output) removes the silent dual-credential billing ambiguity. CI green across linux/mac/win. Nice work.

Base automatically changed from feat/oauth-driver to main June 25, 2026 05:10
@jrusso1020 jrusso1020 enabled auto-merge (squash) June 25, 2026 05:11
jrusso1020 and others added 12 commits June 26, 2026 03:13
PR 2 of the OAuth stack: wires the PR 1 oauth driver into the credential
resolver, the HTTP transport, and a new CLI surface. Stacked on top of
PR 1 (#191); targets feat/oauth-driver so the diff shows only PR 2's
delta and auto-rebases when PR 1 merges.

Credential resolver
- New `Credential` type discriminates api_key / oauth / oauth-expired.
- `FileCredentialResolver.ResolveCredential()` returns the typed form;
  the legacy string `Resolve()` is kept for backwards compat and
  refuses OAuth so older string-only call sites can never feed an
  access_token into the x-api-key header.
- New on-disk helpers `SaveOAuthTokens` / `LoadOAuthTokens` /
  `ClearOAuthTokens` round-trip the existing `oauth` block.
- Selection order when both are present: fresh OAuth wins; expired
  OAuth + refresh_token falls back to refresh-on-first-request; expired
  OAuth + no refresh falls back to api_key.

Client transport
- `client.NewWithCredential(cred, opts...)` takes the typed credential.
  `client.New(apiKey)` is preserved as a thin wrapper.
- `Do` sets `Authorization: Bearer <access_token>` for OAuth, otherwise
  `x-api-key: <key>`. Reserved-header precedence is unchanged.
- Proactive refresh: when the OAuth token is within 60s of expiry the
  transport refreshes before the request. Refresh-only credentials
  (OAuthExpired) refresh before the first request.
- Reactive refresh: a 401 with OAuth + refresh_token triggers a single
  refresh + retry. Second 401 returns `client.ErrReLoginNeeded`, which
  the executor maps to exit-3 with a `heygen auth login` hint.
- Refreshed tokens persist back via an injectable `OAuthPersister`
  (defaults to disk; tests override).
- The retry path replays request bodies via `GetBody`.

CLI surface
- `heygen auth login` — default browser + PKCE + loopback flow. Opens
  https://app.heygen.com/oauth/authorize, captures the code on
  127.0.0.1:<port>/oauth/callback, exchanges with
  https://api2.heygen.com/v1/oauth/token, persists tokens, and prints
  "Logged in as <username>" pulled from /v3/users/me.
  - `--api-key` retains the previous stdin/prompt API-key path so
    existing automation keeps working.
  - `--device-code` is reserved and returns "not yet supported".
- `heygen auth logout` — best-effort revoke at the IdP then clears the
  `oauth` block. `--all` also clears `api_key`; when both are wiped the
  credentials file is removed.
- `heygen auth refresh` — uses the stored refresh_token to mint a new
  access_token + persist. Returns exit 3 when no OAuth session exists.
- `heygen auth status` — existing /v3/users/me check, now also emits a
  `credential` block with type / source / scope / expires_at /
  refreshable. Api-key callers see strictly additive output.

Backward compatibility
- HEYGEN_API_KEY env still takes precedence over everything else.
- API-key users with no oauth block see zero behavior change.
- A co-located api_key is preserved on OAuth login + logout (the
  resolver just stops selecting it while a fresh OAuth credential is
  present).

Tests
- Resolver: typed OAuth selection (fresh / expired / no-expiry / both
  present), chain typed-credential propagation, OAuth store round-trip
  + clear semantics.
- Transport: Bearer header wiring, 401-retry-with-refresh, proactive
  refresh near expiry, refresh-only credential triggers refresh before
  first request, API-key path does NOT retry on 401, body replay on
  the refreshed retry, sentinel ErrReLoginNeeded after the second 401.
- Integration: full loopback ← callback ← exchange chain against an
  httptest fake IdP + /v3/users/me. Token-exchange failure leaves disk
  untouched.
- CLI: logout best-effort revoke + clear, --all wipes the file,
  refresh against an httptest IdP, refresh-not-logged-in returns
  exit 3, refresh rejection surfaces an auth error.
- Race-detector clean.

Reported-by: James Russo
Co-Authored-By: Jerrai <noreply@anthropic.com>
staticcheck SA9003: the `if _, _ = stderr.WriteString(err.Error()); false`
form is an empty-branch dead statement. Simplify to a bare write — same
behavior, lint-clean.

Co-Authored-By: Jerrai <noreply@anthropic.com>
NTFS does not honor POSIX permission bits, so `os.Chmod(0o600)`
through the credentials-file writer produces 0o666 on Windows runners.
Match the existing TestFileCredentialStore_FilePermissions pattern and
skip the assertion when runtime.GOOS == "windows".

Co-Authored-By: Jerrai <noreply@anthropic.com>
The proactive 60s-window refresh inside client.Do plus the single 401-retry-with-refresh
covers every case where a stored OAuth access_token needs rotation. A user-facing
`heygen auth refresh` command is debug/cron-only surface that adds help-output noise
without serving a real workflow.

Users who truly need to force a token rotation can `heygen auth logout` then
`heygen auth login`. Logout's revoke-and-clear plus login's fresh PKCE dance
produces the same effect.

Per James's call on the OAuth stack PR thread.

— Jerrai (https://claude.com/claude-code)
W1: discriminate transient refresh failures from oauth.ErrRejected
    (re-login). Network/5xx/timeout/ctx-cancel now surface with the
    underlying error wrapped as "oauth: refresh failed: ..." instead
    of misclassifying as ErrReLoginNeeded.
W2: guard the OAuth 401 retry with canReplayBody before
    cloneRequestForRetry — non-replayable bodies return the 401
    response unchanged instead of nil-panicking on req.GetBody.
W3: nil-safe map init in mergeStatusEnvelope so a `null` /v3/users/me
    response doesn't panic on the credential assignment.
W4: plumb HEYGEN_API_BASE through lookupCurrentUser via the existing
    config.LayeredProvider — the OAuth login identity probe was
    hardcoded to api.heygen.com.
N1: coalesce concurrent refresh via per-Client sync.Mutex + double-
    checked expiry. Two parallel Do() calls now share a single
    IdP round-trip; no singleflight dep needed.
N2: track in-Do-refresh state to skip a double refresh when a
    proactively-refreshed token still returns 401 — straight to
    ErrReLoginNeeded.
N3: hoist the 60s skew constant to internal/auth as the canonical
    OAuthRefreshSkew. Both the resolver and the transport read it
    from one place now.
N4: rename the logout JSON field cleared_api_key →
    clear_api_key_requested. The field reports the --all flag, not
    whether an api_key was actually present pre-clear.
N7: explicit panic on CredentialTypeOAuthExpired in applyHeaders so
    a future change that skips the refresh gate surfaces as a clear
    programmer-bug message instead of sending a stale Bearer header.

Per James's call on PR #192's review findings.

— Jerrai (https://claude.com/claude-code)
PR 1 (#191) changed BuildAuthorizationURL to return (string, error)
with precondition checks on state/codeChallenge/redirectURI. Update
the login call site to thread the error through as a CLI error.

Surfaces the same shape as the other oauth.* errors in this file
(`oauth: build authorize URL: <reason>`).

— Jerrai (https://claude.com/claude-code)
Per the team discussion on #heygen-cli: present both auth options when
running interactively; default to API-key on non-interactive shells
(CI, agents, piped stdin, HEYGEN_NONINTERACTIVE=1).

`heygen auth login` (TTY): TUI picker — OAuth (subscription credits,
recommended) vs API key (API credits). Arrow keys + Enter; j/k vim
motions, 1/2 number shortcuts, Esc/Ctrl-C to cancel.

`heygen auth login` (non-TTY): API-key stdin/prompt, same shape as the
existing `--api-key` path. Triggered by either stdin/stdout not being
a TTY, CI=true, or HEYGEN_NONINTERACTIVE=1.

`--oauth` / `--api-key` flags still skip the picker either way; the
two flags are mutually exclusive.

Picker is a small Bubble Tea model (deps already in go.mod from the
poll spinner). Window resizes are harmless — the View is short enough
to fit any sensible terminal and we never read width to truncate.

Tests:
- Dispatch table: every flag + TTY + env combination routes to the
  right runner with the right number of picker invocations.
- Picker model: arrow keys (with wraparound), j/k, numeric shortcuts,
  Enter selection, Esc/Ctrl-C cancel, WindowSizeMsg doesn't panic,
  post-exit View() is blank, credit-billing notes render.
- Integration: `auth login` with no flag on non-TTY stdin actually
  writes the API key to ~/.heygen/credentials.

— Jerrai (https://claude.com/claude-code)
S1: route reactive 401 refresh through refreshMu + add credMu-guarded
    snapshot helper for credential reads in Do() — kills the data
    race + the rotating-token consumption window. Adds
    refreshIfTokenUnchanged that compares the sent access-token to
    the in-memory one under the lock so two concurrent 401s coalesce
    to a single IdP round-trip.
S2: surface forceRefresh persistence failures to stderr (via a
    configurable warn writer) instead of silently swallowing —
    stranded-token attribution.
S3: restore fail-fast on NewWithCredential when OAuth credential
    has no oauth.Client (was the original intent of the
    test-transport contract). Panics with a clear hint about
    WithOAuthClient.
N1: --oauth in headless shell (no stdin TTY + HEYGEN_NO_BROWSER=1
    or BROWSER=none) fails fast instead of blocking ~5min on the
    loopback timeout.
N2: thread nowFn through needsRefresh for deterministic tests (new
    WithNow option on Client; defaults to time.Now).

Per Somansh's review on #192.

— Jerrai (https://claude.com/claude-code)
Per James's call on #heygen-cli — heygen-cli is agent-first, not
user-first like Claude Code, so API key is the dominant use case:

- *TUI picker*: API key in position 1, marked Recommended. OAuth
  remains a one-arrow-down opt-in for users who want subscription
  pricing.
- *File-level resolver precedence*: HEYGEN_API_KEY env > file
  api_key > file OAuth. Was env > OAuth > api_key. Existing users
  with only one credential are unaffected; users who somehow have
  both will resolve to api_key (matches the picker recommendation).
- *Single-credential normalization*: on successful auth login,
  clear the other credential block. The file now holds at most
  one of api_key / oauth, never both. Pre-this-change users with
  both blocks self-heal on their next login.

Side cleanup: dropped the now-degenerate `--all` flag on
`auth logout` (there's only one credential to clear post-login).

— Jerrai (https://claude.com/claude-code)
Per James's call on #heygen-cli: position-1 + description already
communicates the default; an explicit "(Recommended)" label is more
prescriptive than the picker needs to be.

API key stays in position 0 (cursor default). OAuth stays
one-arrow-down. No behavioral change beyond the label text.

— Jerrai (https://claude.com/claude-code)
Matching the picker change — no prescriptive 'recommended' tag on either
auth path in the help output.

— Jerrai (https://claude.com/claude-code)
Per Somansh's call on #heygen-cli — the single-credential
normalization needs to be visible to clients before they re-login
and lose their other credential.

Three surfaces touched:
1. heygen auth --help: replaced the "holds at most one ... clears
   the other on success" sentence with an explicit IMPORTANT
   callout naming both directions.
2. heygen auth login --help: same shape — IMPORTANT bullet list
   replaces the prior "single-credential normalization" paragraph
   that read more like a design note than a user warning.
3. README.md Authenticate + Configuration: added the 5-option
   table (env / pipe / --api-key / --oauth / picker), a blockquote
   "Single-credential file" note explaining the replace behavior,
   and updated the credentials-file row to mention "API key or
   OAuth tokens (one at a time)."

No code behavior changes — just doc prominence.

— Jerrai (https://claude.com/claude-code)
@jrusso1020 jrusso1020 force-pushed the feat/oauth-cli-surface branch from 9f7ff33 to 0ed96ce Compare June 26, 2026 03:15
@jrusso1020 jrusso1020 merged commit 4d458ae into main Jun 26, 2026
9 checks passed
@jrusso1020 jrusso1020 deleted the feat/oauth-cli-surface branch June 26, 2026 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants