[taosdump] performance degradation in TDengine Client 3.4.0.9 and taosdump 3.4.0.9 #35002

@lucimurata

Description

Performance Issue
taosdump export performance regression introduced in PR #33706 — a backup that completed in about 1 hour with taosdump 3.3.6.0 now runs for 8+ hours and fails before completion

Problem Description
After upgrading TDengine to 3.4.0.9, taosdump export (dump out) became extremely slow. A backup that previously completed in approximately 1 hour now runs for 8+ hours and ultimately fails due to connection drop before completing.

Root cause: a change introduced in PR #33706 (feat(taosdump): supports retry mechanism when dumping out data, by @YamingPei, 2025-11-26) appears to have modified the LIMIT/OFFSET pagination logic in the writeResultToAvro function (tools/taos-tools/src/taosdump.c). As a result, taosdump issues ~397 queries per child table instead of 1 query per child table, with performance degrading progressively as OFFSET grows. The degradation appears to start around subtable 4601 of 7136.

What changed in PR #33706

Change 1 — Limit condition (line 3638 in ver-3.4.0.9):

// Before PR #33706:
if (limit < (queryCount - offset)) {
    limit = queryCount - offset;
}

// After PR #33706:
if (limit > (queryCount - offset)) {
    limit = queryCount - offset;
}

Change 2 — Offset increment (line 3717 in ver-3.4.0.9):

// Before PR #33706:
offset += limit;

// After PR #33706:
offset += countInBatch;

In the previous behavior, the < operator caused limit to be set to queryCount - offset (all remaining rows) on the first iteration. Combined with offset += limit, each child table was exported in a single query.

After PR #33706, the > operator keeps limit at data_batch (16383), and offset += countInBatch advances by actual rows returned. This means each child table now requires hundreds of queries with increasing OFFSET values.

To Reproduce

  1. Database parameters:

    • 1 super table (super_table) with 6 columns + 5 tags (including VARCHAR(1000), VARCHAR(450), VARCHAR(300))
    • 7,136 child tables in the super table (9,651 subtables total in the database)
    • Child tables contain ~5-7 million rows each
    • Total rows in database_example: 3,388,473,021
  2. Verb used: Select (taosdump export / dump out)

  3. Observed vs expected performance:

Metric                     Before PR #33706         After PR #33706 (ver-3.4.0.9)
Queries per child table    1                        ~397
Total queries (estimated)  ~7,136                   ~2,831,000
Dump time                  ~1 hour                  8+ hours (fails before completion)
Dump outcome               Completes successfully   Fails due to connection drop
  4. Command used:
taosdump -p*** -D database_example -h <host>:6030 -o <outpath> -B 163840 -T 10 -d snappy -k 3 -z 1000

Additionally, the maximum -B (data_batch) is hardcoded to 16383 (dump.h line 56: MAX_RECORDS_PER_REQ = 32766, then taosdump.c lines 826-827: data_batch = MAX_RECORDS_PER_REQ/2), so users cannot work around this by increasing batch size. Note: -B 163840 is silently truncated to 16383 due to the hardcoded limit.

**Screenshots**
Log screenshots are omitted to protect sensitive information; sanitized log excerpts are included below.

Log output start of taosdump process:

2026-03-27 05:06:34.132 | build: Linux-x64 2026-02-27 23:32:40 +0800
2026-03-27 05:06:34.132 | host: host.example.svc.cluster.local:6030
2026-03-27 05:06:34.132 | user: root
2026-03-27 05:06:34.132 | port: 0
2026-03-27 05:06:34.132 | outpath: 2026-03-27-T08-06-34-UTC/
2026-03-27 05:06:34.132 | inpath:
2026-03-27 05:06:34.132 | resultFile: ./dump_result.txt
2026-03-27 05:06:34.132 | all_databases: false
2026-03-27 05:06:34.132 | databases: true
2026-03-27 05:06:34.132 | databasesSeq: database_example
2026-03-27 05:06:34.132 | schemaonly: false
2026-03-27 05:06:34.132 | with_property: true
2026-03-27 05:06:34.132 | avro codec: snappy
2026-03-27 05:06:34.132 | data_batch: 16383
2026-03-27 05:06:34.132 | thread_num: 10
2026-03-27 05:06:34.132 | allow_sys: false
2026-03-27 05:06:34.132 | escape_char: true
2026-03-27 05:06:34.132 | loose_mode: false
2026-03-27 05:06:34.132 | isDumpIn: false
2026-03-27 05:06:34.132 | arg_list_len: 0

Log output from ver-3.4.0.9 showing high OFFSET values during dump:

INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_123` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6667881;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_456` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 5930646;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_789` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 7290435;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_101` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6979158;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_102` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6323838;

Progress log showing the dump is still running after hours at 59-76% on individual tables:

INFO: database_example.super_table [5313/7136] write avro 76% of tag_123
INFO: database_example.super_table [5313/7136] write avro 68% of tag_456
INFO: database_example.super_table [5313/7136] write avro 59% of tag_789

Eventual failure due to connection drop:

ERROR: Failed to connect to server host:6030, code: 0x8000000b, reason: Unable to establish connection!
ERROR: 0 database(s) valid to dump

Environment:

  • OS: Linux (Kubernetes pods, Ubuntu-based)
  • Memory: ~31.3 GB
  • Disk: ~496 GB
  • CPU: 16
  • TDengine Server Version: 3.4.0.9
  • taosdump Version: 3.4.0.9
  • Architecture: Linux-x64
  • Deployment: Kubernetes triggered by Airflow DAG

Additional Context

  • Using taosdump 3.3.6.0 or 3.3.6.9 client against server 3.4.0.9 is not possible due to protocol incompatibility (0x8000000b: Unable to establish connection).
  • The -B (data_batch) parameter is hardcoded to max 16383 (MAX_RECORDS_PER_REQ/2), so users have no way to increase batch size to mitigate the issue.
  • The previous behavior (fetching all remaining rows in one query) was technically a bug in the limit condition, but the fix introduced a severe performance regression without providing an efficient pagination alternative (e.g. keyset pagination on the timestamp instead of OFFSET).

Related

  • PR that introduced the regression: #33706, "feat(taosdump): supports retry mechanism when dumping out data" (2025-11-26)
  • File: tools/taos-tools/src/taosdump.c, function writeResultToAvro (lines 3638, 3717)
  • Hardcoded limit: tools/taos-tools/inc/dump.h line 56
