
Paging-capable CRDT / Partial-loading data type for large sparse/dense maps #1684

@hackerwins

What would you like to be added:

Background / Motivation

In spreadsheet-like apps (e.g., Wafflebase), a single Yorkie document stores many small entries (cells) in a JSON-like object map. Even with batching, we still create one CRDT entry per populated cell, so the AttachDocument payload and the snapshot size grow linearly with the number of cells.

Example:

  • A1:Z100 = 2,600 cells
  • Raw JSON equivalent: ~46 KB
  • Observed Yorkie attach payload: ~468 KB (≈ 166 bytes of overhead per cell)
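
The per-cell figure can be reproduced from the numbers above. This is only a back-of-the-envelope check, and it assumes that 1 KB means 1,024 bytes; the variable names are illustrative.

```typescript
// Figures taken from the example above; 1 KB = 1,024 bytes is an assumption.
const cells = 26 * 100;          // A1:Z100 -> 26 columns x 100 rows
const rawBytes = 46 * 1024;      // raw JSON equivalent
const attachBytes = 468 * 1024;  // observed attach payload

// Overhead is what the attach payload adds on top of the raw JSON.
const overheadPerCell = (attachBytes - rawBytes) / cells;
const amplification = attachBytes / rawBytes;

console.log(cells, Math.round(overheadPerCell), amplification.toFixed(1));
// prints: 2600 166 10.2
```

In other words, the attach payload is roughly ten times the raw data, and almost all of the difference scales with the cell count.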

The overhead comes mainly from per-entry CRDT structural/metadata costs and from full-document materialization on attach, not from “too many update calls”.

Yorkie currently synchronizes and watches at document granularity:

  • Attach loads the whole document state (snapshot + changes via PushPull)
  • WatchDocument delivers events for the whole doc

For very large “table/grid” workloads, clients often need only a viewport subset. Without partial loading:

  • Initial attach becomes slow/heavy
  • Memory usage increases (materializing the full doc)
  • Network usage increases (transferring unused regions)
  • Server fanout costs increase (watch events for irrelevant regions)

Goal

Introduce a paging-capable CRDT/data type (or a partial-loading mechanism) that allows clients to:

  • Load only required segments/pages (rows/tiles/ranges)
  • Subscribe only to those segments
  • Apply updates that affect only specific segments while preserving CRDT correctness and collaboration semantics.
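
To make the goal concrete, here is a minimal sketch of tile-based paging, assuming the grid is split into fixed-size tiles. Nothing here is Yorkie API: `TILE_ROWS`, `TILE_COLS`, `pageKey`, `pagesForViewport`, and `PageRouter` are hypothetical names, and the tile size is arbitrary. It only illustrates how a viewport could map to pages and how updates could be routed to the clients subscribed to the affected page.

```typescript
// Hypothetical sketch; none of these names exist in Yorkie.
interface Viewport {
  top: number;    // 0-based, inclusive
  left: number;
  bottom: number;
  right: number;
}

const TILE_ROWS = 50; // arbitrary tile height
const TILE_COLS = 10; // arbitrary tile width

// Key of the page (tile) that contains a given cell.
function pageKey(row: number, col: number): string {
  return `${Math.floor(row / TILE_ROWS)}:${Math.floor(col / TILE_COLS)}`;
}

// All pages a client must load to render the viewport.
function pagesForViewport(v: Viewport): string[] {
  const keys: string[] = [];
  const lastRow = Math.floor(v.bottom / TILE_ROWS);
  const lastCol = Math.floor(v.right / TILE_COLS);
  for (let r = Math.floor(v.top / TILE_ROWS); r <= lastRow; r++) {
    for (let c = Math.floor(v.left / TILE_COLS); c <= lastCol; c++) {
      keys.push(`${r}:${c}`);
    }
  }
  return keys;
}

// In-memory stand-in for per-page subscriptions.
class PageRouter {
  private subs = new Map<string, Set<string>>(); // page key -> client ids

  subscribe(clientId: string, keys: string[]): void {
    for (const k of keys) {
      let set = this.subs.get(k);
      if (!set) {
        set = new Set<string>();
        this.subs.set(k, set);
      }
      set.add(clientId);
    }
  }

  // Clients that should be notified when the cell at (row, col) changes.
  recipients(row: number, col: number): string[] {
    const set = this.subs.get(pageKey(row, col));
    return set ? Array.from(set) : [];
  }
}

// Usage: a client viewing the top-left corner only hears about tile "0:0".
const router = new PageRouter();
router.subscribe("alice", pagesForViewport({ top: 0, left: 0, bottom: 19, right: 5 }));
console.log(router.recipients(10, 3)); // alice is subscribed to this tile
console.log(router.recipients(75, 25)); // no subscribers for this tile
```

The sketch covers only addressing and fanout. How each page keeps its own CRDT metadata, and how edits that span several pages stay consistent, is the open design question this issue raises.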

Non-goals (initially)

  • Perfect minimal diffing across arbitrary nested JSON paths
  • Arbitrary user-defined query languages
  • Replacing existing Object/Array semantics for general documents

Why is this needed:
