Commit 8d49cff

fix(sdk): fix race condition when checkpoint completion happens before waitForStatusChange is called (#395)
*Issue #, if available:*

*Description of changes:*

There is a race condition in waitForStatusChange: if the operation completes before waitForStatusChange is called, the call never observes the status and terminates without updating the data correctly. This results in the error `Cannot return PENDING status with no pending operations.`

To fix this, if the operation status is already terminal in waitForStatusChange or waitForRetryTimer, we resolve immediately instead of creating a promise and waiting.

The race is possible because waitForStatusChange is called asynchronously, so time can pass between when the phase 2 promise starts and when the last checkpoint data was updated.

I disabled time skipping in wait-for-callback-serdes.test.ts, which reproduces the problem consistently when running locally. In the cloud tests it is harder to reproduce, since we poll for the callback ID every second and do not get the data instantly, but it could still happen in rare cases.

I also added the same safeguard to waitForRetryTimer. The equivalent failure could occur there if the main thread is blocked for a long time and fails to call waitForRetryTimer in time.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
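The guard described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `OperationTracker`, `updateStatus`, `pendingWaiters`, and the status names are hypothetical stand-ins, and only the idea of resolving immediately on a terminal status comes from the description.

```typescript
// Hypothetical status names; the SDK's real status values may differ.
type OperationStatus = "STARTED" | "PENDING" | "SUCCEEDED" | "FAILED";

const TERMINAL_STATUSES = new Set<OperationStatus>(["SUCCEEDED", "FAILED"]);

// Illustrative stand-in for the SDK's internal operation bookkeeping.
class OperationTracker {
  private status: OperationStatus = "STARTED";
  private waiters: Array<(status: OperationStatus) => void> = [];

  // Number of callers currently parked waiting for a status change.
  get pendingWaiters(): number {
    return this.waiters.length;
  }

  // Called when new checkpoint data arrives; wakes everyone who is waiting.
  updateStatus(next: OperationStatus): void {
    this.status = next;
    const toNotify = this.waiters;
    this.waiters = [];
    for (const resolve of toNotify) {
      resolve(next);
    }
  }

  waitForStatusChange(): Promise<OperationStatus> {
    // The guard: if the operation already finished before we were called,
    // no further update will arrive, so resolve now instead of parking.
    if (TERMINAL_STATUSES.has(this.status)) {
      return Promise.resolve(this.status);
    }
    return new Promise((resolve) => {
      this.waiters.push(resolve);
    });
  }
}

// The racy ordering: completion is recorded first, the wait starts second.
const tracker = new OperationTracker();
tracker.updateStatus("SUCCEEDED");
tracker
  .waitForStatusChange()
  .then((status) => console.log("resolved with", status));
```

Without the terminal-status check, the last call would register a waiter that nothing ever resolves, which matches the hang-then-terminate behavior described above.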
1 parent f161141 commit 8d49cff

4 files changed: +538 -0 lines changed
@@ -0,0 +1,37 @@
+import { InvocationType } from "@aws-sdk/client-lambda";
+import { handler } from "./wait-for-callback-quick-completion";
+import { createTests } from "../../../utils/test-helper";
+
+createTests({
+  handler,
+  invocationType: InvocationType.Event,
+  localRunnerConfig: {
+    skipTime: false,
+  },
+  tests: (runner, { isCloud }) => {
+    it("should handle waitForCallback when callback completes before ", async () => {
+      const callbackOp = runner.getOperationByIndex(0);
+
+      const executionPromise = runner.run({
+        payload: isCloud
+          ? {
+              submitterDelay: 5,
+            }
+          : {},
+      });
+
+      // only wait for started status
+      await callbackOp.waitForData();
+
+      await callbackOp.sendCallbackSuccess("{}");
+
+      const result = await executionPromise;
+
+      expect(result.getResult()).toEqual({
+        callbackResult: "{}",
+        success: true,
+      });
+      expect(result.getInvocations().length).toBe(1);
+    });
+  },
+});
@@ -0,0 +1,29 @@
+import {
+  DurableContext,
+  withDurableExecution,
+} from "@aws/durable-execution-sdk-js";
+import { ExampleConfig } from "../../../types";
+
+export const config: ExampleConfig = {
+  name: "Wait for Callback - Quick Completion",
+  description:
+    "Demonstrates waitForCallback invocation-level completion scenario",
+};
+
+export const handler = withDurableExecution(
+  async (event: { submitterDelay: number }, context: DurableContext) => {
+    const result = await context.waitForCallback(async () => {
+      if (event.submitterDelay) {
+        await new Promise((resolve) =>
+          setTimeout(resolve, event.submitterDelay * 1000),
+        );
+      }
+      return Promise.resolve();
+    });
+
+    return {
+      callbackResult: result,
+      success: true,
+    };
+  },
+);
