Paging-capable CRDT / Partial-loading data type for large sparse/dense maps #1684
Description
What would you like to be added:
Background / Motivation
In spreadsheet-like apps (e.g., Wafflebase), a single Yorkie document stores many small entries (cells) in a JSON-like object map. Even with batching, we still create one CRDT entry per populated cell, so the AttachDocument payload and the snapshot size both grow linearly with the number of cells.
Example:
- A1:Z100 = 2,600 cells
- Raw JSON equivalent: ~46 KB
- Observed Yorkie attach payload: ~468 KB (≈166 bytes of overhead per cell)
The overhead comes mainly not from “too many update calls” but from per-entry CRDT structural and metadata costs, plus full-document materialization on attach.
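The per-cell overhead figure can be reproduced from the two measured sizes. This is only an arithmetic check, assuming "KB" means 1024 bytes (the issue does not state the unit); `overhead_per_cell` is a hypothetical helper name.

```python
# Reproduce the per-cell overhead estimate from the measurements in this issue.
# Assumption: "KB" means 1024 bytes; the issue does not say which unit was used.
CELLS = 26 * 100  # A1:Z100 -> 26 columns x 100 rows
RAW_JSON_KB = 46
ATTACH_PAYLOAD_KB = 468


def overhead_per_cell(payload_kb: float, raw_kb: float, cells: int, kb: int = 1024) -> float:
    """Bytes of attach payload beyond the raw JSON, spread evenly across all cells."""
    return (payload_kb - raw_kb) * kb / cells


per_cell = overhead_per_cell(ATTACH_PAYLOAD_KB, RAW_JSON_KB, CELLS)
print(f"{CELLS} cells, ~{per_cell:.0f} bytes overhead per cell")
# prints: 2600 cells, ~166 bytes overhead per cell
```

With 1000-byte KB the same calculation gives about 162 bytes per cell, so the ~166 figure appears to assume 1024-byte KB. Either way the attach payload is roughly 10× the raw JSON (468 / 46 ≈ 10.2).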
Yorkie currently synchronizes and watches at document granularity:
- Attach loads the whole document state (snapshot + changes via PushPull)
- WatchDocument delivers events for the whole doc
For very large “table/grid” workloads, clients often need only a viewport subset. Without partial loading:
- Initial attach becomes slow/heavy
- Memory usage increases (materializing the full doc)
- Network usage increases (transferring unused regions)
- Server fanout costs increase (watch events for irrelevant regions)
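To make "only a viewport subset" concrete, here is a minimal sketch that maps an A1-style viewport range to the fixed-size pages ("tiles") it touches. The tiling scheme and the names `TILE_ROWS`, `TILE_COLS`, `parse_ref`, and `tiles_for_range` are hypothetical; the issue does not define how pages would be cut.

```python
import re

# Hypothetical tiling: a page ("tile") is a fixed block of TILE_ROWS x TILE_COLS cells.
# The issue leaves page boundaries open; rows, tiles, or ranges are all candidates.
TILE_ROWS = 50
TILE_COLS = 10

_REF = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def parse_ref(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference (e.g. 'Z100') to zero-based (row, col)."""
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"bad cell reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:  # bijective base-26: A=1 ... Z=26, AA=27 ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def tiles_for_range(rng: str) -> set[tuple[int, int]]:
    """Return the (tile_row, tile_col) keys that a rectangular range like 'A1:Z100' touches."""
    start, end = rng.split(":")
    r0, c0 = parse_ref(start)
    r1, c1 = parse_ref(end)
    r0, r1 = min(r0, r1), max(r0, r1)
    c0, c1 = min(c0, c1), max(c0, c1)
    return {
        (tr, tc)
        for tr in range(r0 // TILE_ROWS, r1 // TILE_ROWS + 1)
        for tc in range(c0 // TILE_COLS, c1 // TILE_COLS + 1)
    }
```

Under these assumptions a client showing `A1:J20` would need to load and watch a single tile, `(0, 0)`, while the whole `A1:Z100` region spans six tiles. The tile size trades per-page metadata cost against how much off-screen data each load pulls in.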
Goal
Introduce a paging-capable CRDT/data type (or a partial-loading mechanism) that allows clients to:
- Load only required segments/pages (rows/tiles/ranges)
- Subscribe only to those segments
- Apply updates that affect only specific segments while preserving CRDT correctness and collaboration semantics.
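One way to read the three goals together is sketched below: a map split into pages, where each entry is a last-writer-wins (LWW) register, replicas hold only the pages they have loaded, and a server-side router fans an update out only to clients watching that page. This is a hypothetical illustration, not Yorkie's API or data model; `PagedLWWMap`, `PageRouter`, `Op`, and the `(lamport, actor_id)` timestamp are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not Yorkie's API. Each entry is an LWW register; LWW merge is
# commutative, associative and idempotent, so replicas holding the same page converge
# regardless of delivery order.
Timestamp = tuple[int, str]  # (lamport clock, actor id); tuple order breaks ties by actor id


@dataclass(frozen=True)
class Op:
    page: str
    key: str
    value: object
    ts: Timestamp


@dataclass
class PagedLWWMap:
    pages: dict[str, dict[str, tuple[object, Timestamp]]] = field(default_factory=dict)

    def load_page(self, page: str, snapshot: dict[str, tuple[object, Timestamp]]) -> None:
        """Materialize one page from a server-provided snapshot; other pages stay absent."""
        self.pages[page] = dict(snapshot)

    def apply(self, op: Op) -> bool:
        """Merge an op into a loaded page; ops for unloaded pages are skipped."""
        entries = self.pages.get(op.page)
        if entries is None:
            return False
        current = entries.get(op.key)
        if current is None or op.ts > current[1]:
            entries[op.key] = (op.value, op.ts)
        return True

    def get(self, page: str, key: str):
        """Return the current value, or None if the page is unloaded or the key is unset."""
        entry = self.pages.get(page, {}).get(key)
        return None if entry is None else entry[0]


class PageRouter:
    """Server-side fanout: deliver an op only to clients watching its page."""

    def __init__(self) -> None:
        self.watchers: dict[str, set[str]] = {}

    def watch(self, client: str, page: str) -> None:
        self.watchers.setdefault(page, set()).add(client)

    def recipients(self, op: Op) -> set[str]:
        return set(self.watchers.get(op.page, set()))
```

Skipping ops for unloaded pages is only safe if the server keeps each page's state authoritative, so that a later `load_page` snapshot already includes every op the client ignored. Cross-page operations (e.g. inserting a row that shifts many tiles) are not covered by this sketch.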
Non-goals (initially)
- Perfect minimal diffing across arbitrary nested JSON paths
- Arbitrary user-defined query languages
- Replacing existing Object/Array semantics for general documents
Why is this needed: