[taosdump] performance degradation in TDengine Client 3.4.0.9 and taosdump 3.4.0.9 #35002

@lucimurata

Description

Performance Issue
taosdump export performance regression introduced in PR #33706 — a backup that completed in about 1 hour with taosdump 3.3.6.0 now runs for 8+ hours and fails before completion

Problem Description
After upgrading TDengine to 3.4.0.9, taosdump export (dump out) became extremely slow. A backup that previously completed in approximately 1 hour now runs for 8+ hours and ultimately fails due to connection drop before completing.

Root cause: a change introduced in PR #33706 (feat(taosdump): supports retry mechanism when dumping out data, by @YamingPei, 2025-11-26) appears to have modified the LIMIT/OFFSET pagination logic in the writeResultToAvro function (tools/taos-tools/src/taosdump.c). As a result, taosdump issues ~397 queries per child table instead of 1 query per child table, with performance degrading progressively as OFFSET grows. The degradation appears to start around subtable 4601 of 7136.

What changed in PR #33706

Change 1 — Limit condition (line 3638 in ver-3.4.0.9):

// Before PR #33706:
if (limit < (queryCount - offset)) {
    limit = queryCount - offset;
}

// After PR #33706:
if (limit > (queryCount - offset)) {
    limit = queryCount - offset;
}

Change 2 — Offset increment (line 3717 in ver-3.4.0.9):

// Before PR #33706:
offset += limit;

// After PR #33706:
offset += countInBatch;

In the previous behavior, the < operator caused limit to be set to queryCount - offset (all remaining rows) on the first iteration. Combined with offset += limit, each child table was exported in a single query.

After PR #33706, the > operator keeps limit at data_batch (16383), and offset += countInBatch advances by actual rows returned. This means each child table now requires hundreds of queries with increasing OFFSET values.

To Reproduce

  1. Database parameters:

    • 1 super table (super_table) with 6 columns + 5 tags (including VARCHAR(1000), VARCHAR(450), VARCHAR(300))
    • 7,136 child tables in the super table (9,651 subtables total in the database)
    • Child tables contain ~5-7 million rows each
    • Total rows in database_example: 3,388,473,021
  2. Verb used: Select (taosdump export / dump out)

  3. Observed vs expected performance:

Metric                     Before PR #33706         After PR #33706 (ver-3.4.0.9)
Queries per child table    1                        ~397
Total queries (estimated)  ~7,136                   ~2,831,000
Dump time                  ~1 hour                  8+ hours (fails before completion)
Dump outcome               Completes successfully   Fails due to connection drop
  4. Command used:
taosdump -p*** -D database_example -h <host>:6030 -o <outpath> -B 163840 -T 10 -d snappy -k 3 -z 1000

Additionally, the maximum -B (data_batch) is hardcoded to 16383 (dump.h line 56: MAX_RECORDS_PER_REQ = 32766, then taosdump.c lines 826-827: data_batch = MAX_RECORDS_PER_REQ/2), so users cannot work around this by increasing batch size. Note: -B 163840 is silently truncated to 16383 due to the hardcoded limit.

**Screenshots**
Log screenshots are omitted to protect sensitive information; sanitized log excerpts are included below.

Log output start of taosdump process:

2026-03-27 05:06:34.132 | build: Linux-x64 2026-02-27 23:32:40 +0800
2026-03-27 05:06:34.132 | host: host.example.svc.cluster.local:6030
2026-03-27 05:06:34.132 | user: root
2026-03-27 05:06:34.132 | port: 0
2026-03-27 05:06:34.132 | outpath: 2026-03-27-T08-06-34-UTC/
2026-03-27 05:06:34.132 | inpath:
2026-03-27 05:06:34.132 | resultFile: ./dump_result.txt
2026-03-27 05:06:34.132 | all_databases: false
2026-03-27 05:06:34.132 | databases: true
2026-03-27 05:06:34.132 | databasesSeq: database_example
2026-03-27 05:06:34.132 | schemaonly: false
2026-03-27 05:06:34.132 | with_property: true
2026-03-27 05:06:34.132 | avro codec: snappy
2026-03-27 05:06:34.132 | data_batch: 16383
2026-03-27 05:06:34.132 | thread_num: 10
2026-03-27 05:06:34.132 | allow_sys: false
2026-03-27 05:06:34.132 | escape_char: true
2026-03-27 05:06:34.132 | loose_mode: false
2026-03-27 05:06:34.132 | isDumpIn: false
2026-03-27 05:06:34.132 | arg_list_len: 0

Log output from ver-3.4.0.9 showing high OFFSET values during dump:

INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_123` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6667881;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_456` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 5930646;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_789` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 7290435;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_101` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6979158;
INFO: debug: taosQuery succeeded. sql=SELECT * FROM database_example.`tag_102` WHERE _c0 >= -9223372036854775806 AND _c0 <= 9223372036854775807 ORDER BY _c0 ASC LIMIT 16383 OFFSET 6323838;

Progress log showing the dump is still running after hours at 59-76% on individual tables:

INFO: database_example.super_table [5313/7136] write avro 76% of tag_123
INFO: database_example.super_table [5313/7136] write avro 68% of tag_456
INFO: database_example.super_table [5313/7136] write avro 59% of tag_789

Eventual failure due to connection drop:

ERROR: Failed to connect to server host:6030, code: 0x8000000b, reason: Unable to establish connection!
ERROR: 0 database(s) valid to dump

Environment:

  • OS: Linux (Kubernetes pods, Ubuntu-based)
  • Memory: ~31.3 GB
  • Disk: ~496 GB
  • CPU: 16
  • TDengine Server Version: 3.4.0.9
  • taosdump Version: 3.4.0.9
  • Architecture: Linux-x64
  • Deployment: Kubernetes triggered by Airflow DAG

Additional Context

  • Using taosdump 3.3.6.0 or 3.3.6.9 client against server 3.4.0.9 is not possible due to protocol incompatibility (0x8000000b: Unable to establish connection).
  • The -B (data_batch) parameter is hardcoded to max 16383 (MAX_RECORDS_PER_REQ/2), so users have no way to increase batch size to mitigate the issue.
  • The previous behavior (fetching all remaining rows in one query) was technically a bug in the limit condition, but the fix introduced a severe performance regression without providing an efficient pagination alternative (e.g. keyset pagination on the timestamp instead of OFFSET).

Related

  • PR that introduced the regression: #33706, "feat(taosdump): supports retry mechanism when dumping out data" (2025-11-26)
  • File: tools/taos-tools/src/taosdump.c, function writeResultToAvro (lines 3638, 3717)
  • Hardcoded limit: tools/taos-tools/inc/dump.h line 56
