[RayService][Bug] applyServeTargetCapacity: accept int64/float64 for cached target_capacity (#4777) #4778

Open
SAY-5 wants to merge 3 commits into ray-project:master from SAY-5:fix/serve-target-capacity-int64-4777

@SAY-5 SAY-5 commented Apr 28, 2026

Closes #4777.

Why

applyServeTargetCapacity reads the cached ServeConfigV2 string,
unmarshals it with k8s.io/apimachinery/pkg/util/yaml, and asserts
serveConfig["target_capacity"].(float64) so it can skip
UpdateDeployments when the cached value already matches the goal.

util/yaml decodes plain integer scalars as int64: when unmarshalling
into an untyped map it preserves the integer/float distinction and
converts each number to int64 or float64. The float64 assertion
therefore always failed for any target_capacity written as a plain
integer, the idempotency branch was always skipped, and the controller
called UpdateDeployments on every reconcile even when the value was
already current.
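
For illustration, a minimal standalone sketch of the failure mode. The map is built by hand as a stand-in for the decoded config (it is not produced by util/yaml), and assertFloat64 is a hypothetical wrapper around the original assertion:

```go
package main

import "fmt"

// assertFloat64 mirrors the original check: it succeeds only when the
// dynamic type of v is exactly float64. A type assertion never converts
// between numeric types.
func assertFloat64(v any) (float64, bool) {
	f, ok := v.(float64)
	return f, ok
}

func main() {
	// Hand-built stand-in for the decoded ServeConfigV2 map; the int64
	// value is an assumption taken from the description above.
	serveConfig := map[string]any{"target_capacity": int64(60)}

	_, ok := assertFloat64(serveConfig["target_capacity"])
	fmt.Println(ok) // false: the value is an int64, not a float64

	_, ok = assertFloat64(float64(60))
	fmt.Println(ok) // true
}
```

Because the assertion reports false for an int64, a check built on it would fall through to the update path regardless of the cached value.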

The existing TestReconcileServeTargetCapacity didn't catch this
because every case used a cached/goal mismatch (0 → 30 / 60 → 30
etc.), so the controller proceeded to the update branch in both the
intended-skip and intended-update flows — masking the bug entirely.

Fix

  • New targetCapacityAsInt32(any) (int32, bool) helper that accepts
    int64, float64, int, and int32, returning int32. Rejects
    nil / string / map / other types.
  • applyServeTargetCapacity now uses the helper for the idempotency
    check.
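
A rough sketch of what such a helper and its call site could look like. Only the targetCapacityAsInt32 signature comes from this PR; the body and the cachedMatchesGoal wrapper are assumptions for illustration, and out-of-range inputs are not handled here:

```go
package main

import "fmt"

// targetCapacityAsInt32 follows the signature named above. The body is an
// assumption: a type switch over the four accepted numeric types.
// Conversions are unchecked in this sketch.
func targetCapacityAsInt32(v any) (int32, bool) {
	switch n := v.(type) {
	case int64:
		return int32(n), true
	case float64:
		return int32(n), true
	case int:
		return int32(n), true
	case int32:
		return n, true
	default:
		// nil, string, map, and any other type are rejected.
		return 0, false
	}
}

// cachedMatchesGoal is a hypothetical stand-in for the idempotency check:
// it reports whether the cached config already carries the goal value.
func cachedMatchesGoal(serveConfig map[string]any, goal int32) bool {
	cached, ok := targetCapacityAsInt32(serveConfig["target_capacity"])
	return ok && cached == goal
}

func main() {
	fmt.Println(cachedMatchesGoal(map[string]any{"target_capacity": int64(60)}, 60))   // true
	fmt.Println(cachedMatchesGoal(map[string]any{"target_capacity": float64(60)}, 60)) // true
	fmt.Println(cachedMatchesGoal(map[string]any{"target_capacity": "60"}, 60))        // false
	fmt.Println(cachedMatchesGoal(map[string]any{}, 60))                               // false
}
```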

Tests

ray-operator/controllers/ray/rayservice_controller_unit_test.go:

  • TestApplyServeTargetCapacity_SkipsUpdateWhenCachedMatchesGoal:
    the cached-equals-goal scenario the previous suite never covered.
    With cached ServeConfigV2 = '{"target_capacity": 60}' and goal 60,
    the controller must NOT call UpdateDeployments. Asserts
    fakeDashboard.LastUpdatedConfig stays empty.
  • TestTargetCapacityAsInt32 — 7 sub-cases covering int64,
    float64, int, int32, nil, string, map.
  • The existing TestReconcileServeTargetCapacity still passes
    unchanged.
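
The shape of the skip test can be sketched in isolation. Everything below is a simplified, hypothetical stand-in: the fakeDashboard type only mimics the LastUpdatedConfig field mentioned above, and applyTargetCapacity handles just the int64 case rather than reproducing the controller logic:

```go
package main

import "fmt"

// fakeDashboard is a hypothetical test double; it records the last payload
// passed to UpdateDeployments so a test can check whether a call happened.
type fakeDashboard struct {
	LastUpdatedConfig []byte
}

func (f *fakeDashboard) UpdateDeployments(config []byte) error {
	f.LastUpdatedConfig = config
	return nil
}

// applyTargetCapacity is a simplified, hypothetical version of the
// controller step: skip the update when the cached value equals the goal.
// Only int64 is handled here to keep the sketch short.
func applyTargetCapacity(d *fakeDashboard, cached any, goal int32) error {
	if n, ok := cached.(int64); ok && n == int64(goal) {
		return nil // cached value already matches; no update needed
	}
	return d.UpdateDeployments([]byte(fmt.Sprintf(`{"target_capacity":%d}`, goal)))
}

func main() {
	// Cached 60, goal 60: the fake must not record any payload.
	match := &fakeDashboard{}
	_ = applyTargetCapacity(match, int64(60), 60)
	fmt.Println(len(match.LastUpdatedConfig)) // 0

	// Cached 0, goal 30: the update path runs and the payload is recorded.
	mismatch := &fakeDashboard{}
	_ = applyTargetCapacity(mismatch, int64(0), 30)
	fmt.Println(string(mismatch.LastUpdatedConfig)) // {"target_capacity":30}
}
```

Asserting on the recorded payload, rather than on a return value, is what lets a test distinguish "skipped" from "updated with the same value".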

Verification

  • go test ./controllers/ray/ -run "TestApplyServeTargetCapacity_SkipsUpdateWhenCachedMatchesGoal|TestTargetCapacityAsInt32|TestReconcileServeTargetCapacity" → all green.
  • I confirmed locally that reverting just the controller change
    while keeping the test makes the new
    TestApplyServeTargetCapacity_SkipsUpdateWhenCachedMatchesGoal
    fail with Should be empty, but was [123 34 116 97 ... 54 48 125]
    — the byte representation of {"target_capacity":60} that
    UpdateDeployments should never have been called with — i.e. the
    test catches the regression.


@chenshi5012 chenshi5012 left a comment

LGTM, fixed so quickly!

@JiangJiaWei1103 JiangJiaWei1103 self-assigned this Apr 29, 2026
@Future-Outlier
Member

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!


@JiangJiaWei1103
Member

@cursor review


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 9341b20.

…cached target_capacity (ray-project#4777)

Closes ray-project#4777.

`applyServeTargetCapacity` reads the cached `ServeConfigV2` string,
unmarshals it via `k8s.io/apimachinery/pkg/util/yaml`, and asserts
`serveConfig["target_capacity"].(float64)` to skip
`UpdateDeployments` when the cached value already matches the goal.

`util/yaml` decodes plain integer scalars as `int64`: when
unmarshalling into an untyped map it preserves the integer/float
distinction and converts each number to int64 or float64. The
`float64` assertion therefore always failed for integer values, the
idempotency branch was always skipped, and `UpdateDeployments` ran on
every reconcile even when the value was already current.

Existing `TestReconcileServeTargetCapacity` did not catch this
because every case used a cached/goal mismatch (0 → 30 / 60 → 30),
so the controller proceeded to the update branch in both the
intended-skip and intended-update flows, masking the bug.

Fix
- New `targetCapacityAsInt32(any) (int32, bool)` helper that
  accepts int64, float64, int, and int32 and returns int32; rejects
  nil/string/map/etc.
- `applyServeTargetCapacity` now uses the helper for the
  idempotency check.

Tests
- New `TestApplyServeTargetCapacity_SkipsUpdateWhenCachedMatchesGoal` —
  the cached/goal scenario the previous suite never covered. Cached
  ServeConfigV2 with `target_capacity: 60` and goal 60 must NOT
  call `UpdateDeployments`. Asserts `fakeDashboard.LastUpdatedConfig`
  stays empty.
- New `TestTargetCapacityAsInt32` — 7 sub-cases covering int64,
  float64, int, int32, nil, string, map.
- Existing `TestReconcileServeTargetCapacity` still passes
  (unchanged behaviour for cached/goal mismatch paths).

Verified locally: all subtests pass with the fix, and reverting just
the controller change makes the new
`TestApplyServeTargetCapacity_SkipsUpdateWhenCachedMatchesGoal`
fail with `Should be empty, but was [123 34 116 97 114 103 101 116 95 99 97 112 97 99 105 116 121 34 58 54 48 125]` — the encoded
`{"target_capacity":60}` body that `UpdateDeployments` should
never have been called with.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
@SAY-5 SAY-5 force-pushed the fix/serve-target-capacity-int64-4777 branch from 9341b20 to e4d3f64 Compare April 29, 2026 18:16
@SAY-5
Author

SAY-5 commented Apr 29, 2026

Pushed an update that addresses the gosec G115 lint failures from the previous CI run. The two flagged conversions in targetCapacityAsInt32 (rayservice_controller.go:1414, :1418) used unconditional int32(n) casts; gosec's concern was that an int64 / int payload from a YAML / JSON document could overflow.

Now targetCapacityAsInt32 range-checks against math.MinInt32 / math.MaxInt32 (and rejects NaN / Inf for float64) before the conversion, returning ok=false on out-of-range values rather than silently truncating. The target_capacity semantic is 0–100 in practice, so any out-of-range value is malformed input that should fall through to the existing "unsupported value" log path instead of a garbage int32 hitting the dashboard.

Test TestTargetCapacityAsInt32 gains seven new cases pinning the rejection of int64/int/float64 above MaxInt32, below MinInt32, and the float64 special values NaN / +Inf / -Inf. The original 7 cases remain. Local go test -run TestTargetCapacityAsInt32 is 14/14 green; go vet ./... and go build ./... are both clean.

Also rebased onto current master to clear the staleness.

SAY-5 added 2 commits May 2, 2026 11:17
…versions

Signed-off-by: SAY-5 <say.apm35@gmail.com>