feat(func/sleep): new sys function sleep for tsdb#34989
feat(func/sleep): new sys function sleep for tsdb#34989stephenkgu wants to merge 8 commits into3.0from
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a sleep scalar function, adding the necessary function types, translation logic, and implementation. A review comment identifies a potential Denial of Service vulnerability and integer overflow risk when large sleep durations are requested, recommending that the sleep time be capped at a reasonable maximum.
source/libs/scalar/src/sclfunc.c
Outdated
| if (sleepSec > 0) { | ||
| taosMsleep((int32_t)(sleepSec * 1000)); | ||
| } |
There was a problem hiding this comment.
The current implementation of sleepFunction has a potential denial-of-service (DoS) vulnerability. A user could provide a very large number, causing the query execution thread to sleep for an extended period.
Additionally, sleepSec * 1000 can overflow int32_t if sleepSec is large enough (greater than ~24 days). The cast of an out-of-range floating-point value to an integer is undefined behavior in some C standards and could lead to unexpected sleep durations (e.g., negative if it wraps around).
It is recommended to cap the sleep duration to a reasonable maximum value, for example, 300 seconds (5 minutes). This would mitigate both the DoS risk and the integer overflow.
if (sleepSec > 0) {
// Cap sleep duration to prevent long waits and potential integer overflow.
// 300 seconds (5 minutes) is a reasonable upper limit.
const double MAX_SLEEP_SECONDS = 300.0;
if (sleepSec > MAX_SLEEP_SECONDS) {
sleepSec = MAX_SLEEP_SECONDS;
}
taosMsleep((int32_t)(sleepSec * 1000));
}There was a problem hiding this comment.
Pull request overview
Adds a new built-in scalar SQL function sleep() intended to pause execution for a specified duration and return an integer result.
Changes:
- Implemented
sleepFunctionin the scalar function library. - Registered
sleepas a builtin with translation/type inference and a newFUNCTION_TYPE_SLEEP. - Exposed
sleepFunctionin the scalar public header.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| source/libs/scalar/src/sclfunc.c | Adds sleepFunction implementation calling taosMsleep() and returning INT. |
| source/libs/function/src/builtins.c | Adds translateSleep() and registers the sleep builtin definition. |
| include/libs/scalar/scalar.h | Declares sleepFunction in the scalar API header. |
| include/libs/function/functionMgt.h | Adds FUNCTION_TYPE_SLEEP enum entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (colDataIsNull_s(pInput[0].columnData, 0)) { | ||
| colDataSetNULL(pOutput->columnData, 0); | ||
| pOutput->numOfRows = 1; | ||
| return TSDB_CODE_SUCCESS; | ||
| } | ||
|
|
||
| double sleepSec; | ||
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, 0), | ||
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | ||
|
|
||
| if (sleepSec > 0) { | ||
| taosMsleep((int32_t)(sleepSec * 1000)); | ||
| } | ||
|
|
||
| int32_t result = 0; | ||
| colDataSetInt32(pOutput->columnData, 0, &result); | ||
| pOutput->numOfRows = 1; |
There was a problem hiding this comment.
sleepFunction always reads/writes row 0 and forces pOutput->numOfRows = 1. If the engine evaluates scalar functions in batches (like randFunction just above), this will return the wrong row count and leave output rows uninitialized for pInput[0].numOfRows > 1. Please either vectorize over pInput[0].numOfRows (set output per-row) or explicitly restrict sleep() to constant/value-node parameters and validate pInput[0].numOfRows == 1.
| if (colDataIsNull_s(pInput[0].columnData, 0)) { | |
| colDataSetNULL(pOutput->columnData, 0); | |
| pOutput->numOfRows = 1; | |
| return TSDB_CODE_SUCCESS; | |
| } | |
| double sleepSec; | |
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, 0), | |
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | |
| if (sleepSec > 0) { | |
| taosMsleep((int32_t)(sleepSec * 1000)); | |
| } | |
| int32_t result = 0; | |
| colDataSetInt32(pOutput->columnData, 0, &result); | |
| pOutput->numOfRows = 1; | |
| int32_t numOfRows = pInput[0].numOfRows; | |
| for (int32_t i = 0; i < numOfRows; ++i) { | |
| if (colDataIsNull_s(pInput[0].columnData, i)) { | |
| colDataSetNULL(pOutput->columnData, i); | |
| continue; | |
| } | |
| double sleepSec; | |
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), | |
| colDataGetData(pInput[0].columnData, i), | |
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | |
| if (sleepSec > 0) { | |
| taosMsleep((int32_t)(sleepSec * 1000)); | |
| } | |
| int32_t result = 0; | |
| colDataSetInt32(pOutput->columnData, i, &result); | |
| } | |
| pOutput->numOfRows = numOfRows; |
| double sleepSec; | ||
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, 0), | ||
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | ||
|
|
||
| if (sleepSec > 0) { | ||
| taosMsleep((int32_t)(sleepSec * 1000)); | ||
| } |
There was a problem hiding this comment.
taosMsleep((int32_t)(sleepSec * 1000)) can overflow/produce undefined results for large values (or NaN/Inf) when converting from double to int32_t. This can lead to sleeping an unintended duration (including negative/very short) or other undefined behavior. Please validate sleepSec is finite and within the supported millisecond range before calling taosMsleep, and decide how to handle negative values (error vs clamp to 0).
| .inputParaInfo[0][0] = {.isLastParam = true, | ||
| .startParam = 1, | ||
| .endParam = 1, | ||
| .validDataType = FUNC_PARAM_SUPPORT_NUMERIC_TYPE, | ||
| .validNodeType = FUNC_PARAM_SUPPORT_EXPR_NODE, | ||
| .paramAttribute = FUNC_PARAM_NO_SPECIFIC_ATTRIBUTE, | ||
| .valueRangeFlag = FUNC_PARAM_NO_SPECIFIC_VALUE,}, | ||
| .outputParaInfo = {.validDataType = FUNC_PARAM_SUPPORT_INTEGER_TYPE}}, |
There was a problem hiding this comment.
The builtin definition for sleep only allows FUNC_PARAM_SUPPORT_NUMERIC_TYPE, but sleepFunction has explicit NULL handling. As written, sleep(NULL) (a NULL literal) will be rejected during validateParam, making the NULL branch in sleepFunction unreachable for that common call form. If NULL is intended to be supported, include FUNC_PARAM_SUPPORT_NULL_TYPE in validDataType (similar to rand).
| { | ||
| .name = "sleep", | ||
| .type = FUNCTION_TYPE_SLEEP, | ||
| .classification = FUNC_MGT_SCALAR_FUNC, | ||
| .parameters = {.minParamNum = 1, | ||
| .maxParamNum = 1, | ||
| .paramInfoPattern = 1, | ||
| .inputParaInfo[0][0] = {.isLastParam = true, | ||
| .startParam = 1, | ||
| .endParam = 1, | ||
| .validDataType = FUNC_PARAM_SUPPORT_NUMERIC_TYPE, | ||
| .validNodeType = FUNC_PARAM_SUPPORT_EXPR_NODE, | ||
| .paramAttribute = FUNC_PARAM_NO_SPECIFIC_ATTRIBUTE, | ||
| .valueRangeFlag = FUNC_PARAM_NO_SPECIFIC_VALUE,}, | ||
| .outputParaInfo = {.validDataType = FUNC_PARAM_SUPPORT_INTEGER_TYPE}}, | ||
| .translateFunc = translateSleep, | ||
| .getEnvFunc = NULL, | ||
| .initFunc = NULL, | ||
| .sprocessFunc = sleepFunction, | ||
| .finalizeFunc = NULL | ||
| }, |
There was a problem hiding this comment.
No automated test coverage is added for the new sleep() builtin, while this repo has extensive scalar function tests (e.g. test/cases/11-Functions/01-Scalar/test_scalar_all.py covers rand()). Please add tests that at least validate: (1) return type/value (0), (2) NULL behavior, and (3) basic timing behavior with a small delay to avoid slowing CI.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| static int32_t translateSleep(SFunctionNode* pFunc, char* pErrBuf, int32_t len) { | ||
| FUNC_ERR_RET(validateParam(pFunc, pErrBuf, len)); |
There was a problem hiding this comment.
SLEEP() is a scalar function with side effects, but the scalar constant-folding path evaluates scalar functions with constant arguments during parsing/translation and replaces them with a literal. As a result, SLEEP(1) can pause during planning (client-side in libtaosnative) and be removed from the execution plan, so it won't reliably sleep during query execution on the server. Please mark sleep as non-foldable in the constant-rewrite logic (or otherwise ensure it is executed at runtime) so semantics match the docs.
| FUNC_ERR_RET(validateParam(pFunc, pErrBuf, len)); | |
| FUNC_ERR_RET(validateParam(pFunc, pErrBuf, len)); | |
| if (!pFunc->dual) { | |
| int32_t code = addPseudoParam(&pFunc->pParameterList); | |
| if (code != TSDB_CODE_SUCCESS) { | |
| return code; | |
| } | |
| } |
| .startParam = 1, | ||
| .endParam = 1, | ||
| .validDataType = FUNC_PARAM_SUPPORT_NUMERIC_TYPE, | ||
| .validNodeType = FUNC_PARAM_SUPPORT_EXPR_NODE, |
There was a problem hiding this comment.
The builtin signature allows FUNC_PARAM_SUPPORT_EXPR_NODE, which means SLEEP(col) / SLEEP(expr) is permitted, but sleepFunction() only reads row 0 and always outputs numOfRows = 1. Either restrict the parameter to constant/value nodes (and document it) or implement proper vectorized semantics + define whether the sleep happens per row/block/query.
| .validNodeType = FUNC_PARAM_SUPPORT_EXPR_NODE, | |
| .validNodeType = FUNC_PARAM_SUPPORT_VALUE_NODE, |
| if (colDataIsNull_s(pInput[0].columnData, 0)) { | ||
| colDataSetNULL(pOutput->columnData, 0); | ||
| pOutput->numOfRows = 1; | ||
| return TSDB_CODE_SUCCESS; | ||
| } | ||
|
|
||
| double sleepSec; | ||
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, 0), | ||
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | ||
|
|
||
| int32_t result = 0; | ||
| if (sleepSec < 0) { | ||
| result = 1; | ||
| } else if (sleepSec > 0) { | ||
| taosMsleep((int32_t)(sleepSec * 1000)); | ||
| } | ||
| colDataSetInt32(pOutput->columnData, 0, &result); | ||
| pOutput->numOfRows = 1; |
There was a problem hiding this comment.
sleepFunction() always reads pInput[0] at index 0 and sets pOutput->numOfRows = 1. If the argument is evaluated as a column/expression with multiple rows (currently allowed by parameter validation), results will be incorrect/broadcasted and the sleep duration will not correspond to per-row values. Consider enforcing a constant argument at translation time or iterating over pInput[0].numOfRows explicitly (depending on intended semantics).
| if (colDataIsNull_s(pInput[0].columnData, 0)) { | |
| colDataSetNULL(pOutput->columnData, 0); | |
| pOutput->numOfRows = 1; | |
| return TSDB_CODE_SUCCESS; | |
| } | |
| double sleepSec; | |
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, 0), | |
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | |
| int32_t result = 0; | |
| if (sleepSec < 0) { | |
| result = 1; | |
| } else if (sleepSec > 0) { | |
| taosMsleep((int32_t)(sleepSec * 1000)); | |
| } | |
| colDataSetInt32(pOutput->columnData, 0, &result); | |
| pOutput->numOfRows = 1; | |
| int32_t numOfRows = pInput[0].numOfRows; | |
| for (int32_t i = 0; i < numOfRows; ++i) { | |
| if (colDataIsNull_s(pInput[0].columnData, i)) { | |
| colDataSetNULL(pOutput->columnData, i); | |
| continue; | |
| } | |
| double sleepSec; | |
| GET_TYPED_DATA(sleepSec, double, GET_PARAM_TYPE(&pInput[0]), colDataGetData(pInput[0].columnData, i), | |
| typeGetTypeModFromColInfo(&pInput[0].columnData->info)); | |
| int32_t result = 0; | |
| if (sleepSec < 0) { | |
| result = 1; | |
| } else if (sleepSec > 0) { | |
| taosMsleep((int32_t)(sleepSec * 1000)); | |
| } | |
| colDataSetInt32(pOutput->columnData, i, &result); | |
| } | |
| pOutput->numOfRows = numOfRows; |
| int32_t result = 0; | ||
| if (sleepSec < 0) { | ||
| result = 1; | ||
| } else if (sleepSec > 0) { | ||
| taosMsleep((int32_t)(sleepSec * 1000)); | ||
| } | ||
| colDataSetInt32(pOutput->columnData, 0, &result); |
There was a problem hiding this comment.
taosMsleep((int32_t)(sleepSec * 1000)) can overflow for large sleepSec (and truncates fractional milliseconds). Please add bounds checking (e.g., cap to max supported sleep, reject infinities/NaN) and use a wider integer type for the millisecond conversion to avoid undefined/negative sleeps.
| start = time.time() | ||
| tdSql.query("SELECT SLEEP(1)") | ||
| elapsed = time.time() - start |
There was a problem hiding this comment.
These timing assertions use time.time(), which can jump due to NTP/clock adjustments and cause flaky failures. Please use time.monotonic() (already used in other tests) for measuring elapsed durations in sleep-related assertions.
| **Description**: Pauses execution for the specified number of seconds. Returns 0 on success, 1 if the argument is negative. The function sleeps once per query, not per row. | ||
|
|
||
| **Parameters**: | ||
|
|
||
| - `seconds`: DOUBLE - Number of seconds to sleep (supports fractional values like 0.5); negative values skip the sleep and return 1 | ||
|
|
There was a problem hiding this comment.
This doc guarantees the function "sleeps once per query". With the current scalar implementation, this can be violated (e.g., if the function is evaluated per data block) or bypassed entirely if constant-folded away. Please ensure the implementation enforces the documented execution semantics, or adjust the documentation to match the actual behavior (including whether the sleep occurs client-side during planning vs server-side during execution).
| **说明**:暂停执行指定的秒数。成功时返回 0,参数为负数时返回 1。该函数每次查询只休眠一次,不会对每一行都休眠。 | ||
|
|
||
| **参数**: | ||
|
|
||
| - `seconds`:DOUBLE - 休眠的秒数(支持小数,如 0.5) | ||
|
|
||
| **返回值**:INT - 成功返回 0,参数为负数时返回 1 |
There was a problem hiding this comment.
文档说明里承诺“每次查询只休眠一次”。当前实现可能在执行阶段按数据块多次触发,或在常量折叠阶段被提前执行并从执行计划中消失,导致语义不稳定。请确保实现严格符合文档语义(并明确是在客户端规划阶段还是服务端执行阶段休眠),或调整文档表述以匹配实际行为。
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int32_t result = 0; | ||
| if (sleepSec < 0) { | ||
| result = 1; | ||
| } else if (sleepSec > 0) { | ||
| int64_t totalMs = (int64_t)(sleepSec * 1000); | ||
| int64_t elapsed = 0; | ||
| while (elapsed < totalMs) { | ||
| if (gTaskScalarExtra.isTaskKilled && gTaskScalarExtra.isTaskKilled(gTaskScalarExtra.pTaskInfo)) { | ||
| result = 1; | ||
| break; | ||
| } | ||
| int32_t chunk = (int32_t)TMIN(100LL, totalMs - elapsed); | ||
| taosMsleep(chunk); | ||
| elapsed += chunk; | ||
| } | ||
| } | ||
| colDataSetInt32(pOutput->columnData, 0, &result); | ||
| pOutput->numOfRows = 1; | ||
| return TSDB_CODE_SUCCESS; |
There was a problem hiding this comment.
SLEEP() is documented/tested as “sleep once per query”, but the implementation runs on each scalarCalculate invocation (typically once per result data block). For large result sets that are produced in multiple blocks, this will sleep multiple times in a single query. Either implement per-query caching/state so SLEEP executes only once per query, or update the docs to reflect the actual behavior.
| gTaskScalarExtra.pTaskInfo = pOperator->pTaskInfo; | ||
| gTaskScalarExtra.isTaskKilled = isTaskKilled; | ||
| code = projectApplyFunctions(pSup->pExprInfo, pInfo->pRes, pBlock, pSup->pCtx, pSup->numOfExprs, | ||
| pProjectInfo->pPseudoColInfo, GET_STM_RTINFO(pOperator->pTaskInfo)); | ||
| gTaskScalarExtra.pTaskInfo = NULL; | ||
| gTaskScalarExtra.isTaskKilled = NULL; |
There was a problem hiding this comment.
gTaskScalarExtra.pTaskInfo/isTaskKilled are set to enable SLEEP() interruption, but they are cleared immediately after projectApplyFunctions(). This makes kill-check availability depend on specific call sites and can break interruption for scalar calculations invoked elsewhere in the same operator/task. Prefer saving the previous values and restoring them (instead of forcing NULL), and consider setting these fields consistently for all scalarCalculate/projectApplyFunctions call paths within a task.
| gTaskScalarExtra.pTaskInfo = pOperator->pTaskInfo; | |
| gTaskScalarExtra.isTaskKilled = isTaskKilled; | |
| code = projectApplyFunctions(pSup->pExprInfo, pInfo->pRes, pBlock, pSup->pCtx, pSup->numOfExprs, | |
| pProjectInfo->pPseudoColInfo, GET_STM_RTINFO(pOperator->pTaskInfo)); | |
| gTaskScalarExtra.pTaskInfo = NULL; | |
| gTaskScalarExtra.isTaskKilled = NULL; | |
| __typeof__(gTaskScalarExtra.pTaskInfo) prevTaskInfo = gTaskScalarExtra.pTaskInfo; | |
| __typeof__(gTaskScalarExtra.isTaskKilled) prevIsTaskKilled = gTaskScalarExtra.isTaskKilled; | |
| gTaskScalarExtra.pTaskInfo = pOperator->pTaskInfo; | |
| gTaskScalarExtra.isTaskKilled = isTaskKilled; | |
| code = projectApplyFunctions(pSup->pExprInfo, pInfo->pRes, pBlock, pSup->pCtx, pSup->numOfExprs, | |
| pProjectInfo->pPseudoColInfo, GET_STM_RTINFO(pOperator->pTaskInfo)); | |
| gTaskScalarExtra.pTaskInfo = prevTaskInfo; | |
| gTaskScalarExtra.isTaskKilled = prevIsTaskKilled; |
| class TestSleep: | ||
| def setup_class(cls): | ||
| tdLog.debug(f"start to execute {__file__}") | ||
| cls.dbname = "test_sleep" |
There was a problem hiding this comment.
test_sleep_killed uses taos.connect(config=self.cfg_path), but TestSleep.setup_class() never defines cfg_path. This will raise AttributeError and make the test fail. Initialize cfg_path in setup_class (e.g., via tdCom.getTaosdPath()/tdDnodes.* helpers used by other tests) or avoid needing a separate connection by using an existing framework helper to open a second connection.
| cls.dbname = "test_sleep" | |
| cls.dbname = "test_sleep" | |
| cls.cfg_path = None |
| # wait for the query to appear in SHOW QUERIES (up to 5s) | ||
| query_id = None | ||
| for _ in range(50): | ||
| time.sleep(0.1) | ||
| tdSql.query("SHOW QUERIES") | ||
| for i in range(tdSql.queryRows): | ||
| if marker in str(tdSql.getData(i, 13)).lower(): |
There was a problem hiding this comment.
test_sleep_killed assumes the SQL text is at column index 13 in SHOW QUERIES results. According to PERF_QUERIES/SHOW QUERIES schema, sql is column #13 (0-based index 12), and the SHOW QUERIES output columns may change across versions (release notes mention additional fields). To avoid out-of-range errors and brittleness, query performance_schema.perf_queries selecting explicit columns (kill_id, sql) or search all returned columns for the marker instead of using a fixed index.
| # wait for the query to appear in SHOW QUERIES (up to 5s) | |
| query_id = None | |
| for _ in range(50): | |
| time.sleep(0.1) | |
| tdSql.query("SHOW QUERIES") | |
| for i in range(tdSql.queryRows): | |
| if marker in str(tdSql.getData(i, 13)).lower(): | |
| # wait for the query to appear in performance_schema.perf_queries (up to 5s) | |
| query_id = None | |
| for _ in range(50): | |
| time.sleep(0.1) | |
| tdSql.query("SELECT kill_id, sql FROM performance_schema.perf_queries") | |
| for i in range(tdSql.queryRows): | |
| if marker in str(tdSql.getData(i, 1)).lower(): |
| **Description**: Pauses execution for the specified number of seconds. Returns 0 on success, 1 if the argument is negative or if the query is killed while sleeping. The function sleeps once per query, not per row. A KILL QUERY command will interrupt the sleep within ~100ms and cause it to return 1. | ||
|
|
||
| **Parameters**: | ||
|
|
||
| - `seconds`: DOUBLE - Number of seconds to sleep (supports fractional values like 0.5); negative values skip the sleep and return 1 | ||
|
|
||
| **Return value**: INT - Returns 0 on success, 1 for negative arguments or when interrupted by KILL QUERY |
There was a problem hiding this comment.
Docs claim SLEEP “sleeps once per query, not per row”. With the current executor flow (projection applied per data block), SLEEP will run once per processed block, which can mean multiple sleeps in a single query for large result sets. Either adjust the implementation to truly sleep once per query or update this description to match actual behavior.
| **说明**:暂停执行指定的秒数。成功时返回 0,参数为负数或被 KILL QUERY 中断时返回 1。该函数每次查询只休眠一次,不会对每一行都休眠。执行 KILL QUERY 可在约 100ms 内中断休眠。 | ||
|
|
||
| **参数**: | ||
|
|
||
| - `seconds`:DOUBLE - 休眠的秒数(支持小数,如 0.5) | ||
|
|
||
| **返回值**:INT - 成功返回 0,参数为负数或被 KILL QUERY 中断时返回 1 |
There was a problem hiding this comment.
文档描述“每次查询只休眠一次”,但当前执行流程是按数据块多次调用标量表达式计算;当结果集较大被拆成多个数据块时,SLEEP 可能在一次查询中被执行多次。建议要么在实现上保证整条查询只执行一次休眠,要么调整文档说明以匹配实际行为。
Description
Issue(s)
https://project.feishu.cn/taosdata_td/feature/detail/6507136288
Checklist
Please check the items in the checklist if applicable.