[taosdump] performance degradation in TDengine Client 3.4.0.9 and taosdump 3.4.0.9 #35002
Description
Performance Issue
taosdump export performance regression introduced in PR #33706 — a backup that completed in about 1 hour with taosdump 3.3.6.0 now runs for 8+ hours and fails before completion.
Problem Description
After upgrading TDengine to 3.4.0.9, taosdump export (dump out) became extremely slow. A backup that previously completed in approximately 1 hour now runs for 8+ hours and ultimately fails due to connection drop before completing.
Root cause: a change introduced in PR #33706 (feat(taosdump): supports retry mechanism when dumping out data, by @YamingPei, 2025-11-26) appears to have modified the LIMIT/OFFSET pagination logic in the `writeResultToAvro` function (`tools/taos-tools/src/taosdump.c`). As a result, taosdump issues ~397 queries per child table instead of 1, with progressively degrading performance as OFFSET grows. The visible slowdown appears to start around subtable 4601/7136.
What changed in PR #33706
Change 1 — limit condition (line 3638 in ver-3.4.0.9):

```c
// Before PR #33706:
if (limit < (queryCount - offset)) {
    limit = queryCount - offset;
}

// After PR #33706:
if (limit > (queryCount - offset)) {
    limit = queryCount - offset;
}
```

Change 2 — offset increment (line 3717 in ver-3.4.0.9):

```c
// Before PR #33706:
offset += limit;

// After PR #33706:
offset += countInBatch;
```

With the previous behavior, the `<` operator set `limit` to `queryCount - offset` (all remaining rows) on the first iteration. Combined with `offset += limit`, each child table was exported in a single query.

After PR #33706, the `>` operator keeps `limit` at `data_batch` (16383), and `offset += countInBatch` advances only by the rows actually returned. Each child table therefore now requires hundreds of queries with ever-increasing OFFSET values.
To Reproduce
- Database parameters:
  - 1 super table (`super_table`) with 6 columns + 5 tags (including VARCHAR(1000), VARCHAR(450), VARCHAR(300))
  - 7,136 child tables in the super table (9,651 subtables total in the database)
  - Child tables contain ~5-7 million rows each
  - Total rows in `database_example`: 3,388,473,021
- Verb used: Select (taosdump export / dump out)
- Observed vs expected performance:
| Metric | Before PR #33706 | After PR #33706 (ver-3.4.0.9) |
|---|---|---|
| Queries per child table | 1 | ~397 |
| Total queries estimated | ~7,136 | ~2,831,000 |
| Dump time | ~1 hour | 8+ hours (fails before completion) |
| Dump outcome | Completes successfully | Fails due to connection drop |
- Command used:

```shell
taosdump -p*** -D database_example -h <host>:6030 -o <outpath> -B 163840 -T 10 -d snappy -k 3 -z 1000
```

Additionally, the maximum `-B` (data_batch) value is hardcoded to 16383 (`dump.h` line 56: `MAX_RECORDS_PER_REQ = 32766`; `taosdump.c` lines 826-827: `data_batch = MAX_RECORDS_PER_REQ / 2`), so users cannot work around the regression by increasing the batch size. Note: `-B 163840` is silently truncated to 16383 due to this hardcoded limit.
**Screenshots**
Log excerpts are shown instead of screenshots to protect sensitive information.
Log output at the start of the taosdump process:

```
2026-03-27 05:06:34.132 | build: Linux-x64 2026-02-27 23:32:40 +0800
2026-03-27 05:06:34.132 | host: host.example.svc.cluster.local:6030
2026-03-27 05:06:34.132 | user: root
2026-03-27 05:06:34.132 | port: 0
2026-03-27 05:06:34.132 | outpath: 2026-03-27-T08-06-34-UTC/
2026-03-27 05:06:34.132 | inpath:
2026-03-27 05:06:34.132 | resultFile: ./dump_result.txt
2026-03-27 05:06:34.132 | all_databases: false
2026-03-27 05:06:34.132 | databases: true
2026-03-27 05:06:34.132 | databasesSeq: database_example
2026-03-27 05:06:34.132 | schemaonly: false
2026-03-27 05:06:34.132 | with_property: true
2026-03-27 05:06:34.132 | avro codec: snappy
2026-03-27 05:06:34.132 | data_batch: 16383
2026-03-27 05:06:34.132 | thread_num: 10
2026-03-27 05:06:34.132 | allow_sys: false
2026-03-27 05:06:34.132 | escape_char: true
2026-03-27 05:06:34.132 | loose_mode: false
2026-03-27 05:06:34.132 | isDumpIn: false
2026-03-27 05:06:34.132 | arg_list_len: 0
```
Log output from ver-3.4.0.9 showing high OFFSET values during dump:
```
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_123` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6667881;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_456` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 5930646;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_789` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 7290435;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_101` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6979158;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_102` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6323838;
```
Progress log showing the dump still running after hours, at 59-76% on individual tables:

```
INFO: database_example.super_table [5313/7136] write avro 76% of tag_123
INFO: database_example.super_table [5313/7136] write avro 68% of tag_456
INFO: database_example.super_table [5313/7136] write avro 59% of tag_789
```
Eventual failure due to connection drop:

```
ERROR: Failed to connect to server host:6030, code: 0x8000000b, reason: Unable to establish connection!
ERROR: 0 database(s) valid to dump
```
Environment:
- OS: Linux (Kubernetes pods, Ubuntu-based)
- Memory: ~31.3 GB
- Disk: ~496 GB
- CPU: 16
- TDengine Server Version: 3.4.0.9
- taosdump Version: 3.4.0.9
- Architecture: Linux-x64
- Deployment: Kubernetes triggered by Airflow DAG
Additional Context
- Using a taosdump 3.3.6.0 or 3.3.6.9 client against server 3.4.0.9 is not possible due to protocol incompatibility (`0x8000000b`: Unable to establish connection).
- The `-B` (data_batch) parameter is hardcoded to a maximum of 16383 (`MAX_RECORDS_PER_REQ / 2`), so users have no way to increase the batch size to mitigate the issue.
- The previous behavior (fetching all rows in one query) was technically a bug in the limit condition, but the "fix" introduced a severe performance regression without providing an efficient pagination alternative.
Related
- PR that introduced the regression: #33706 — `feat(taosdump): supports retry mechanism when dumping out data` (2025-11-26)
- File: `tools/taos-tools/src/taosdump.c`, function `writeResultToAvro` (lines 3638, 3717)
- Hardcoded limit: `tools/taos-tools/inc/dump.h` line 56