Replies: 1 comment 3 replies
-
I'm not sure I understand this point. Are you saying that you do one commit per fragment? What operation is Ray using to do the commit? Is it using |
Beta Was this translation helpful? Give feedback.
-
I'm not sure I understand this point. Are you saying that you do one commit per fragment? What operation is Ray using to do the commit? Is it using |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Background & Problem
In large‑scale data production pipelines for LLM training, we use Ray to write many columns into a Lance dataset in parallel. Each Ray worker is exclusively responsible for appending data to a distinct column, so there is no data‑level write conflict – only metadata (manifest) conflict.
With the current single‑manifest design we face a dilemma:
Both paths limit the throughput and scalability that the underlying columnar storage format could otherwise deliver.
Proposed Solution: Two‑Tier Manifest
Split the global manifest into two layers:
Expected Benefits
Beta Was this translation helpful? Give feedback.
All reactions