Conversation

@saintstack (Contributor)

  • k8s/agent-scaler/agent-scaler.sh
    Add --ignore-not-found=true to the delete (see the sketch below). Silences
    the complaint raised when two copies of this script run side by side and
    one deletes a job first (happens when testing changes to this script).
    Minor item.
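
For illustration, a minimal sketch of the guarded delete (the exact line in agent-scaler.sh may differ; `${job}` and `${namespace}` are the variables used in the cleanup loop below):

```bash
# returns success even when another scaler instance already deleted the job
kubectl delete job "${job}" -n "${namespace}" --ignore-not-found=true
```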

Also, clean up 'Failed' jobs; otherwise they just hang around.

Here are example Failed jobs:

```
NAME                                     STATUS     COMPLETIONS   DURATION   AGE
joshua-agent-250604164239-10             Failed     0/1           3h3m       3h3m
joshua-agent-250604164415-38             Failed     0/1           3h2m       3h2m
joshua-agent-250604164522-70             Failed     0/1           3h1m       3h1m
```


```bash
# cleanup explicitly Failed jobs, filtered by AGENT_NAME and the job status
# condition "Failed"="True" (kubectl JSONPath lacks '&&', hence the awk step)
for job in $(kubectl get jobs -n "${namespace}" -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Failed")].status}{"\n"}{end}' 2>/dev/null | awk '$2 == "True" {print $1}' | { grep -E "^${AGENT_NAME}-[0-9]+(-[0-9]+)?$" || true; }); do
  kubectl delete job "${job}" -n "${namespace}" --ignore-not-found=true
done
```
@johscheuer (Member)
This will get a bit tricky in bash, but I think we should delay the deletion of those failed jobs by 1 day to give some time for debugging (I'm not sure if we store the logs of the failed jobs somewhere).
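
For illustration only, a rough sketch of such a grace period inside the cleanup loop above; it assumes GNU date and keys off the Failed condition's lastTransitionTime (a real Job API field), but it is not this PR's actual diff:

```bash
# when did the job flip to Failed?
failed_at=$(kubectl get job "${job}" -n "${namespace}" \
  -o jsonpath='{.status.conditions[?(@.type=="Failed")].lastTransitionTime}')
# keep jobs that failed less than 24h ago around for debugging
if [ -n "${failed_at}" ] && [ "$(date -d "${failed_at}" +%s)" -gt $(( $(date +%s) - 86400 )) ]; then
  continue   # too young to delete; revisit on the next scaler pass
fi
```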

@saintstack (Contributor, Author)

Dang. I forgot about this issue. Cleaning up failed jobs would have prevented the runaway use of pods that I have been trying to 'fix' these last few days in #124. If this PR had been in place, there would have been fewer old jobs claiming they were in need of servicing, because the FAILED tasks would have been cleaned up. I pushed a change that keeps FAILED jobs around for a day (good idea), but I need to deploy it to test it (I tested pieces of the change but...). Will do it after #124 goes in. Thanks @johscheuer
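
As a hedged aside, not what this PR does: Kubernetes can also age out finished jobs natively via spec.ttlSecondsAfterFinished, which covers both Complete and Failed jobs, e.g.:

```bash
# hypothetical alternative: let the Job TTL controller garbage-collect
# finished (Complete or Failed) jobs one day after they finish
kubectl patch job "${job}" -n "${namespace}" --type=merge \
  -p '{"spec":{"ttlSecondsAfterFinished":86400}}'
```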

@johscheuer (Member) previously approved these changes Jan 8, 2026, with a comment:

LGTM 👍 Though the agent scaler gets a bit messy with all this bash :)

@saintstack (Contributor, Author)

> LGTM 👍 Though the agent scaler gets a bit messy with all this bash :)

Yeah. Should probably replace it.
