[Repo Assist] Fix Frame.writeParquet writing empty files (no row data)#716

Merged
dsyme merged 2 commits into master from
repo-assist/fix-issue-712-parquet-write-empty-b8abe83d472ad7ad
Apr 20, 2026

Conversation

@github-actions
Contributor

🤖 This is an automated PR from Repo Assist.

Closes #712

Root Cause

Frame.writeParquet and Frame.writeParquetStream were silently writing empty files — only the column schema was persisted, but all row data was discarded.

The fix is a one-keyword change in `src/Deedle.Parquet/Parquet.fs`:

```fsharp
// Before (broken) — row group never committed:
let rg = writer.CreateRowGroup()

// After (fixed) — Dispose() commits the row group to the stream:
use rg = writer.CreateRowGroup()
```

The `IParquetRowGroupWriter` returned by `ParquetWriter.CreateRowGroup()` implements `IDisposable`, and disposing it is what causes Parquet.Net to flush and commit the row group's data to the underlying stream. Binding it with `let` instead of `use` meant `Dispose()` was never called, so the buffered row data was never flushed and was simply lost. The schema header is written by the `ParquetWriter` itself, which is why column names appeared in the output file but no rows did.

The same bug affected `writeParquetStream`.
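The `let`-vs-`use` difference is easy to demonstrate outside Parquet.Net. A minimal, self-contained sketch using a hypothetical `FakeRowGroup` type (not part of Parquet.Net) that records when `Dispose()` runs:

```fsharp
open System

// Hypothetical stand-in for IParquetRowGroupWriter: "commits" its data on Dispose.
type FakeRowGroup(committed: ResizeArray<string>) =
    interface IDisposable with
        member _.Dispose() = committed.Add "row group committed"

let committed = ResizeArray<string>()

let writeWithLet () =
    let rg = new FakeRowGroup(committed)   // Dispose() never runs: data is lost
    ignore rg

let writeWithUse () =
    use rg = new FakeRowGroup(committed)   // Dispose() runs when rg leaves scope
    ignore rg

writeWithLet ()
printfn "commits after let: %d" committed.Count   // 0
writeWithUse ()
printfn "commits after use: %d" committed.Count   // 1
```

`use` inserts a `try/finally` that calls `Dispose()` at the end of the binding's scope, which is exactly the commit point Parquet.Net relies on.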

## Changes

| File | Change |
|---|---|
| `src/Deedle.Parquet/Parquet.fs` | `let rg` → `use rg` in `writeParquet` and `writeParquetStream` |
| `tests/Deedle.Parquet.Tests/data/weather.parquet` | New: meteorological sample (8 rows × 7 cols, PyArrow-generated) |
| `tests/Deedle.Parquet.Tests/data/trades.parquet` | New: financial trades sample (20 rows × 7 cols, PyArrow-generated) |
| `tests/Deedle.Parquet.Tests/data/sensors.parquet` | New: IoT sensor log (10 rows × 7 cols, PyArrow-generated) |
| `tests/Deedle.Parquet.Tests/Tests.fs` | 20 new tests for the three sample files |
| `RELEASE_NOTES.md` | Bug-fix entry under new 6.0.2 section |

### New sample files

The three new `.parquet` files were generated with PyArrow 23, providing cross-tool coverage (Deedle reads files written by a different Parquet library):

- **`weather.parquet`**: `string`, `DateTime`, `float64`, `int32`, `float32`, `float64` (with nulls), `bool`
- **`trades.parquet`**: `int32`, `string`, `float64`, `int64`, `int16`, `DateTime`, `bool`
- **`sensors.parquet`**: `int32`, `float32` (nulls), `float64` (nulls), `int32` (nulls), `int64` (nulls), `string` (nulls), `bool` (nulls); exercises every nullable path
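A round-trip check over one of these files can be sketched roughly as follows. This is a sketch only: `Frame.readParquet` and the argument order of `Frame.writeParquet` are assumptions for illustration; the PR itself names only the write functions.

```fsharp
// Sketch of a round-trip check over a sample file.
// Frame.readParquet is an assumed read counterpart to Frame.writeParquet.
open System.IO
open Deedle

let roundTrip (sample: string) =
    let original = Frame.readParquet sample
    let tmp = Path.GetTempFileName()
    original |> Frame.writeParquet tmp     // before the fix: schema only, zero rows
    let reloaded = Frame.readParquet tmp
    File.Delete tmp
    reloaded.RowCount = original.RowCount  // false on the broken build

roundTrip "tests/Deedle.Parquet.Tests/data/weather.parquet"
```

On the broken build this returns `false` for every sample, since the reloaded frame has the right columns but zero rows.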

## Test Status

```
Passed! - Failed: 0, Passed: 55, Skipped: 0, Total: 55
```

All 55 tests pass (35 pre-existing + 20 new).

Generated by 🌈 Repo Assist.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@96b9d4c39aa22359c0b38265927eadb31dcf4e2a

Root cause: IParquetRowGroupWriter was bound with 'let rg' instead of
'use rg', so Dispose() was never called and the row group was never
committed to the Parquet stream. The schema (column names/types) was
written, but all row data was silently discarded.

Fix: change 'let rg = writer.CreateRowGroup()' to
     'use rg = writer.CreateRowGroup()' in both writeParquet and
     writeParquetStream.

Also:
- Add 3 real-world PyArrow-generated sample parquet files:
    weather.parquet  (meteorological readings, 8 rows x 7 cols,
                      float32/64/int32/bool/string/DateTime, 2 null cols)
    trades.parquet   (financial trade records, 20 rows x 7 cols,
                      int16/int32/int64/float64/bool/string/DateTime)
    sensors.parquet  (IoT sensor log, 10 rows x 7 cols,
                      float32/64/int32/int64/bool/string, all with nulls)
- Add 20 new tests for the three new sample files (column shape, value
  correctness, missing-value counts, and round-trip for each file)

Closes #712

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dsyme dsyme marked this pull request as ready for review April 20, 2026 03:28
@dsyme dsyme merged commit 313c34e into master Apr 20, 2026
2 checks passed
@dsyme dsyme deleted the repo-assist/fix-issue-712-parquet-write-empty-b8abe83d472ad7ad branch April 20, 2026 03:29
Development

Successfully merging this pull request may close these issues.

Data is not written to parquet file. Only column names are written
