Sched core implementation #6242

Draft
pditommaso wants to merge 45 commits into master from sched

@pditommaso
Member

Draft implementation for the sched PoC.

@pditommaso pditommaso marked this pull request as draft July 2, 2025 08:31
@netlify

netlify bot commented Jul 2, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit f52470a
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/699781f2377ed00008debc86

@bentsherman
Member

I wonder if we can fold nf-tower into nf-seqera with subpackages io.seqera.platform and io.seqera.scheduler, SeqeraPlatformClient and SeqeraSchedulerClient, etc. Of course that can be done when we're about to merge

@pditommaso
Member Author

Yeah, this is temporary. We need to find a better packaging.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 4 commits January 17, 2026 11:39
…ropagation

Wrap cancelTask call in try-catch to prevent exceptions from escaping
killTask(). This ensures handleException() is called with the original
error, allowing proper workflow termination instead of crashing the
task monitor poll loop.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
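
The guard described in this commit can be sketched roughly as follows. `KillTaskSketch` and its `cancelTask` callback are hypothetical stand-ins, not the plugin's actual classes; the only point is that a failing cancel call is logged and swallowed rather than rethrown.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the task handler; not the plugin's actual class.
class KillTaskSketch {
    private final String taskId;
    private final Consumer<String> cancelTask; // stand-in for the remote cancel call

    KillTaskSketch(String taskId, Consumer<String> cancelTask) {
        this.taskId = taskId;
        this.cancelTask = cancelTask;
    }

    /** Attempts to cancel the task; returns false instead of throwing on failure. */
    boolean killTask() {
        try {
            cancelTask.accept(taskId);
            return true;
        } catch (Exception e) {
            // Swallow and log, so the caller can still report the original task error
            System.err.println("Unable to cancel task " + taskId + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        KillTaskSketch handler = new KillTaskSketch("task-1", id -> {
            throw new IllegalStateException("scheduler unreachable");
        });
        System.out.println("cancelled: " + handler.killTask());
    }
}
```

Because `killTask()` never throws here, whatever invoked it (the poll loop, in the commit's description) keeps running and can pass the original error on.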
- Add keep-alive mechanism to send empty submissions after 60s idle
- Pass sessionId to batch submitter for API calls
- Add null check in killTask when taskId not yet assigned
- Update sched-client to 0.9.0-SNAPSHOT
- Bump plugin version to 0.4.0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
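
A minimal sketch of the idle check behind the keep-alive item above, assuming the submitter tracks the time of its last submission. `KeepAliveSketch` and its methods are hypothetical; only the 60-second limit comes from the commit message.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decides when an idle submitter should send an empty batch.
class KeepAliveSketch {
    static final Duration IDLE_LIMIT = Duration.ofSeconds(60); // value from the commit message

    private Instant lastSent;

    KeepAliveSketch(Instant start) {
        this.lastSent = start;
    }

    /** Records that a batch (real or empty) was sent at the given time. */
    void markSent(Instant when) {
        lastSent = when;
    }

    /** True when nothing has been sent for at least IDLE_LIMIT. */
    boolean keepAliveDue(Instant now) {
        return Duration.between(lastSent, now).compareTo(IDLE_LIMIT) >= 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        KeepAliveSketch keepAlive = new KeepAliveSketch(start);
        System.out.println(keepAlive.keepAliveDue(start.plusSeconds(61)));
    }
}
```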
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 12 commits January 26, 2026 11:35
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Define SEQERA constant in SeqeraExecutor for reuse
- Override executor name in trace record to 'seqera/aws' for cost tracking
- Add unit test for getTraceRecord() method
- Remove obsolete convertPriceModel test (moved to MapperUtil)

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
… [ci fast]

- Add nf-seqera plugin to packing.gradle and plugins-info.txt
- Fix workflow hang when batch submission fails by detecting early
  failure in checkIfCompleted()

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Add proper error handling to prevent thread crashes and enable graceful
workflow abort on fatal errors:

- Add onError callback to propagate fatal errors to session
- Wrap sendTasks0 loop in try-catch to handle unexpected Throwables
- Protect keep-alive calls to prevent thread crash on transient failures
- Add drainAndFailPendingTasks helper to notify pending tasks on failure
- Update SeqeraExecutor to pass session.abort callback

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
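
The error-handling pattern listed in this commit could look roughly like the sketch below. All names except `drainAndFailPendingTasks` and `onError` are hypothetical, and each pending task is modelled only by a failure callback, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical batch submitter; each pending task is modelled by its failure callback.
class BatchSubmitterSketch {
    private final BlockingQueue<Consumer<Throwable>> pending = new LinkedBlockingQueue<>();
    private final Consumer<Throwable> onError; // e.g. a callback that aborts the session

    BatchSubmitterSketch(Consumer<Throwable> onError) {
        this.onError = onError;
    }

    void enqueue(Consumer<Throwable> taskFailureCallback) {
        pending.add(taskFailureCallback);
    }

    int pendingCount() {
        return pending.size();
    }

    /** Runs one iteration of the send loop without letting any Throwable kill the thread. */
    void runOnce(Runnable sendBatch) {
        try {
            sendBatch.run();
        } catch (Throwable t) {
            drainAndFailPendingTasks(t); // tell waiting tasks they will never be submitted
            onError.accept(t);           // propagate the fatal error to the session
        }
    }

    /** Empties the queue and notifies every pending task of the failure. */
    int drainAndFailPendingTasks(Throwable cause) {
        List<Consumer<Throwable>> drained = new ArrayList<>();
        pending.drainTo(drained);
        for (Consumer<Throwable> task : drained) {
            task.accept(cause);
        }
        return drained.size();
    }

    public static void main(String[] args) {
        BatchSubmitterSketch submitter = new BatchSubmitterSketch(
                t -> System.out.println("abort: " + t.getMessage()));
        submitter.enqueue(t -> System.out.println("task failed: " + t.getMessage()));
        submitter.runOnce(() -> { throw new IllegalStateException("submit failed"); });
    }
}
```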
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
…xit code [ci fast]

When a Seqera task fails (status=FAILED) but no exit code is available
(exitStatus=MAX_VALUE), set task.error to a ProcessException with the
error message from the task state, or a fallback message if not available.

This aligns with the error handling pattern used in AWS Batch and Google
Batch executors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
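
As an illustration of the rule described in this commit, the sketch below derives an error for a failed task that has no exit code. `FailedTaskErrorSketch` is hypothetical, `RuntimeException` stands in for Nextflow's `ProcessException`, and the fallback wording is an assumption.

```java
// Hypothetical helper; RuntimeException stands in for Nextflow's ProcessException.
class FailedTaskErrorSketch {
    /** Returns the error to attach to the task, or null when no synthetic error is needed. */
    static RuntimeException errorFor(String status, int exitStatus, String stateMessage) {
        boolean failedWithoutExitCode =
                "FAILED".equals(status) && exitStatus == Integer.MAX_VALUE;
        if (!failedWithoutExitCode) {
            return null;
        }
        boolean hasMessage = stateMessage != null && !stateMessage.trim().isEmpty();
        // Fallback text is illustrative only, not the executor's actual wording
        String message = hasMessage ? stateMessage : "Task failed for an unknown reason";
        return new RuntimeException(message);
    }

    public static void main(String[] args) {
        RuntimeException error = errorFor("FAILED", Integer.MAX_VALUE, null);
        System.out.println(error.getMessage());
    }
}
```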
…onfig warnings

The config validator checks @ConfigOption FIRST - if present, it treats
the field as a leaf Option and doesn't recurse into nested ConfigScope
types. This caused spurious warnings for machineRequirement nested options:

    WARN: Unrecognized config option 'seqera.machineRequirement.provisioning'
    WARN: Unrecognized config option 'seqera.machineRequirement.maxSpotAttempts'

Removing @ConfigOption allows the validator to recognize that
MachineRequirementOpts implements ConfigScope and properly discover
its nested @ConfigOption fields.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
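
The validator behaviour described in this commit can be reproduced with a small reflection-based sketch. This is not Nextflow's validator: `ConfigOption` and `ConfigScope` below are minimal local stand-ins, and `Annotated` and `Plain` are hypothetical classes that differ only in whether the nested field carries the annotation.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.Set;
import java.util.TreeSet;

class ConfigScanSketch {
    @Retention(RetentionPolicy.RUNTIME) @interface ConfigOption {}
    interface ConfigScope {}

    static class MachineRequirementOpts implements ConfigScope {
        @ConfigOption String provisioning;
        @ConfigOption Integer maxSpotAttempts;
    }

    static class Annotated implements ConfigScope {
        @ConfigOption MachineRequirementOpts machineRequirement; // treated as a leaf
    }

    static class Plain implements ConfigScope {
        MachineRequirementOpts machineRequirement; // recursed into
    }

    /** Collects the option names a validator of this shape would recognize. */
    static Set<String> knownOptions(Class<?> scope, String prefix) {
        Set<String> names = new TreeSet<>();
        for (Field field : scope.getDeclaredFields()) {
            if (field.isSynthetic()) continue;
            String name = prefix + field.getName();
            if (field.isAnnotationPresent(ConfigOption.class)) {
                names.add(name); // annotation is checked first: leaf, no recursion
            } else if (ConfigScope.class.isAssignableFrom(field.getType())) {
                names.addAll(knownOptions(field.getType(), name + "."));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(knownOptions(Annotated.class, "seqera."));
        System.out.println(knownOptions(Plain.class, "seqera."));
    }
}
```

With the annotation present only `seqera.machineRequirement` is known, so the nested names would be reported as unrecognized; without it the nested options are discovered.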
- Add disk configuration options to MachineRequirementOpts (diskType, diskThroughputMiBps, diskIops, diskEncrypted)
- Extend DiskResource with cloud-specific properties (iops, throughput, encrypted, filesystem, mountPath)
- Add MapperUtil.toDiskRequirement() to map disk settings to sched API
- Update sched-client to 0.14.0-SNAPSHOT for DiskRequirement support
- Document EBS disk configuration in executor.md and config.md
- Default to gp3 volumes with 325 MiB/s throughput (Fusion recommended)

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Add diskAllocation config option (task/node) to MachineRequirementOpts
- Update MapperUtil to map allocation and use correct volumeType API
- Add validation: node allocation only supports disk size
- Bump sched-client to 0.16.0-SNAPSHOT

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
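
One possible reading of the "node allocation only supports disk size" validation is sketched below. The parameter names mirror the option names mentioned in these commits, but `DiskAllocationCheckSketch`, the exception type, and the exact rule (every other disk setting must be unset) are assumptions.

```java
// Hypothetical validation; the precise rule in the plugin may differ.
class DiskAllocationCheckSketch {
    static void validate(String diskAllocation, String diskType,
                         Integer diskThroughputMiBps, Integer diskIops,
                         Boolean diskEncrypted) {
        if (!"node".equals(diskAllocation)) {
            return; // task allocation: all disk settings are allowed
        }
        boolean hasExtraSettings = diskType != null || diskThroughputMiBps != null
                || diskIops != null || diskEncrypted != null;
        if (hasExtraSettings) {
            throw new IllegalArgumentException(
                    "Disk allocation 'node' only supports the disk size setting");
        }
    }

    public static void main(String[] args) {
        validate("task", "gp3", 325, null, null); // accepted
        validate("node", null, null, null, null); // accepted: size only
        System.out.println("both configurations accepted");
    }
}
```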
pditommaso and others added 4 commits January 30, 2026 12:56
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Validate that the container image is not empty before submitting tasks
to the Seqera scheduler. The executor requires all processes to specify
a container image, so this throws ProcessUnrecoverableException with a
clear error message when the container is missing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
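
A sketch of the container check described in this commit. `ContainerCheckSketch` is hypothetical, and `IllegalStateException` stands in for Nextflow's `ProcessUnrecoverableException`; the message text is illustrative.

```java
// Hypothetical pre-submission check; IllegalStateException stands in for
// Nextflow's ProcessUnrecoverableException.
class ContainerCheckSketch {
    /** Returns the container image, or throws when it is missing or blank. */
    static String requireContainer(String processName, String container) {
        if (container == null || container.trim().isEmpty()) {
            throw new IllegalStateException("Process '" + processName
                    + "' does not specify a container image, which this executor requires");
        }
        return container;
    }

    public static void main(String[] args) {
        System.out.println(requireContainer("align", "ubuntu:22.04"));
    }
}
```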
Add configurable labels that are propagated to AWS resources (ECS tasks,
capacity providers, EC2 instances) for cost tracking and resource organization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Make ExecutorOpts implement ConfigScope and remove @ConfigOption from
the executor field in SeqeraConfig so nested config options are properly
discovered by the config validator.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 24 commits February 6, 2026 21:58
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
…fast]

Write the Platform-assigned workflowId into WorkflowMetadata via a new
PlatformMetadata class so the Seqera scheduler can correlate jobs back
to Platform runs.

- Add PlatformMetadata with lazy-init getter on WorkflowMetadata
- TowerClient.onFlowCreate() writes workflowId to platform metadata
- Labels emits seqera.io/platform/workflowId label

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
…nputFilesProfiler

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Adds `seqera.executor.predictionModel` option to enable per-run
resource estimation. Validates against supported models (currently
"qr/v1") at config time. Passed to CreateRunRequest on run creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
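
The config-time validation mentioned here might be sketched as follows. `PredictionModelSketch` is hypothetical; treating an unset option as "estimation disabled" is an assumption, while the single supported value `qr/v1` comes from the commit message.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical validator for the predictionModel option.
class PredictionModelSketch {
    static final Set<String> SUPPORTED = Collections.singleton("qr/v1");

    /** Returns the model unchanged, null when unset, or throws for unsupported values. */
    static String validate(String model) {
        if (model == null) {
            return null; // assumed: option not set means no resource estimation
        }
        if (!SUPPORTED.contains(model)) {
            throw new IllegalArgumentException("Unsupported prediction model '" + model
                    + "', supported values: " + SUPPORTED);
        }
        return model;
    }

    public static void main(String[] args) {
        System.out.println(validate("qr/v1"));
    }
}
```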
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
…entual consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Propagate the Platform watch URL alongside workflowId so the scheduler
can link back to the Platform monitoring page for each workflow run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Allow custom environment variables for tasks via `seqera.executor.taskEnvironment`.
Variables are merged with Fusion env, with Fusion taking precedence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
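
The precedence rule in this commit amounts to an ordered map merge. In the sketch below `TaskEnvMergeSketch` and the variable names are made up for illustration; they are not real Fusion variables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical merge helper: user-defined variables first, Fusion variables last.
class TaskEnvMergeSketch {
    static Map<String, String> merge(Map<String, String> taskEnv, Map<String, String> fusionEnv) {
        Map<String, String> result = new LinkedHashMap<>(taskEnv);
        result.putAll(fusionEnv); // later entries overwrite earlier ones, so Fusion wins
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> taskEnv = new LinkedHashMap<>();
        taskEnv.put("MY_VAR", "1");
        taskEnv.put("SHARED_VAR", "from-task");
        Map<String, String> fusionEnv = new LinkedHashMap<>();
        fusionEnv.put("SHARED_VAR", "from-fusion");
        System.out.println(merge(taskEnv, fusionEnv));
    }
}
```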
…arge prefixes

The lookup method paginated through all objects under an S3 prefix
(maxKeys=250) to check path existence. On prefixes with millions of
objects this caused the main thread to hang for minutes parsing massive
XML responses.

Observed in production: nf-schema parameter validation calls
Files.exists() on an S3 outdir path, which triggers
S3ObjectSummaryLookup.lookup. With a large prefix like
s3://bucket/results containing many objects from previous runs,
the pagination loop iterated indefinitely.

Fix: use maxKeys=2 and remove pagination. The matchName check only
needs to find the exact key or its first child (key + "/"), which
are guaranteed to appear in the first results due to S3 lexicographic
ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
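
The matching step described in this fix can be modelled without the AWS SDK. In the sketch below a `SortedSet` plays the role of the bucket, `list` imitates a prefix listing capped at `maxKeys`, and `exists` applies the "exact key or key + `/`" check to at most two results. All names are hypothetical, and this is not the `S3ObjectSummaryLookup` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a capped, lexicographically ordered prefix listing.
class S3LookupSketch {
    /** Returns up to maxKeys keys that start with prefix, in sorted order. */
    static List<String> list(SortedSet<String> bucket, String prefix, int maxKeys) {
        List<String> result = new ArrayList<>();
        for (String key : bucket.tailSet(prefix)) {
            if (!key.startsWith(prefix) || result.size() == maxKeys) break;
            result.add(key);
        }
        return result;
    }

    /** True if the first two listed keys contain the exact key or one of its children. */
    static boolean exists(SortedSet<String> bucket, String key) {
        for (String candidate : list(bucket, key, 2)) {
            if (candidate.equals(key) || candidate.startsWith(key + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SortedSet<String> bucket = new TreeSet<>();
        bucket.add("results/run1/a.txt");
        bucket.add("results/run2/b.txt");
        System.out.println(exists(bucket, "results"));
    }
}
```

The cost of a lookup no longer depends on how many objects sit under the prefix. In this simplified model, sibling keys whose next character sorts before `/` (for example `results-old/...`) are listed ahead of `results/...`, so the sketch illustrates the matching logic only and does not demonstrate the ordering guarantee.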
3 participants