IEP-0002: Add meta payload size limit #24

@titusz

Problem

IEP-0002 requires the meta element to be wrapped in a Data-URL but defines no upper bound on decoded payload size. Neither iscc-core nor iscc-lib enforces any limit.

A multi-megabyte Data-URL payload is fully base64-decoded and processed through soft_hash_meta_v0 (byte n-gram generation + SimHash), causing excessive memory consumption. For a payload of size N, the function generates ~(N - 3) overlapping 4-byte n-grams, each hashed with BLAKE3, then processed through SimHash — O(N) memory and compute.
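
To make the scaling concrete, here is a minimal Python sketch of the n-gram count. It is not the iscc-core implementation: `byte_ngrams` and `estimated_digest_bytes` are hypothetical helpers, and the 32-byte figure assumes default-length BLAKE3 digests that are all held in memory at once.

```python
from typing import Iterator

NGRAM_WIDTH = 4   # 4-byte n-grams, as described above
DIGEST_SIZE = 32  # assumption: default BLAKE3 digest length in bytes


def byte_ngrams(data: bytes, width: int = NGRAM_WIDTH) -> Iterator[bytes]:
    """Yield every overlapping window of `width` bytes (hypothetical helper)."""
    for i in range(len(data) - width + 1):
        yield data[i:i + width]


def estimated_digest_bytes(payload_size: int, width: int = NGRAM_WIDTH) -> int:
    """Rough memory needed if one digest per n-gram is kept in memory."""
    ngram_count = max(payload_size - width + 1, 0)
    return ngram_count * DIGEST_SIZE


small = list(byte_ngrams(b"abcde"))          # 5 bytes give 2 n-grams
ten_mb = estimated_digest_bytes(10_000_000)  # (10_000_000 - 3) * 32 bytes
```

Under these assumptions a 10 MB payload would need roughly 320 MB for digests alone, which illustrates why an upper bound is useful.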

Realistic metadata payloads (JSON-LD, ONIX, Dublin Core, MARC21, IPTC) are well under 64 KB, and similarity hash quality plateaus well before that.

Proposal

Add to IEP-0002 a META_TRIM_META constant (128 000 bytes, i.e. 128 kB or 125 KiB) as the maximum decoded payload size for the meta element.

Add a preprocessing step to the meta element specification:

The decoded payload size shall not exceed 128 000 bytes. Implementations shall reject meta inputs whose decoded payload exceeds this limit.
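
The rule could look roughly like the following Python sketch. `decode_meta_payload` is a hypothetical helper, not an existing iscc-core or iscc-lib function, and it assumes a base64-encoded Data-URL; the choice of `ValueError` is also an assumption.

```python
import base64

META_TRIM_META = 128_000  # proposed maximum decoded payload size in bytes


def decode_meta_payload(data_url: str) -> bytes:
    """Decode a base64 Data-URL and reject payloads above the proposed limit."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("meta is not a Data-URL")
    if not header.endswith(";base64"):
        # Assumption: this sketch only handles base64-encoded payloads.
        raise ValueError("only base64 Data-URLs are handled in this sketch")
    decoded = base64.b64decode(payload)
    if len(decoded) > META_TRIM_META:
        raise ValueError(
            f"meta payload is {len(decoded)} bytes, limit is {META_TRIM_META}"
        )
    return decoded
```

A payload of exactly 128 000 bytes is accepted; one byte more raises an error instead of being truncated.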

Rationale for rejection over truncation

Unlike name and description, which are text that can be meaningfully truncated at character boundaries, the meta payload is structured or binary data (JSON, XML, images, audio samples). Truncating it at an arbitrary byte boundary produces a corrupted payload and creates an inconsistency between the returned meta Data-URL (original) and the metahash (computed from the truncated data). Rejection is therefore the appropriate behavior.

Rationale for 128 KB limit

  • Typical JSON-LD metadata: 1–10 KB
  • ONIX records: 5–50 KB
  • Large Dublin Core with embedded thumbnails: up to ~80 KB
  • 128 KB is 2x the 64 KB bound on typical payloads and 1.6x the largest case listed above
  • SimHash quality plateaus well below this threshold

Checklist

  • IEP-0002 updated with META_TRIM_META constant and size limit
  • Implementation guidance for fail-fast pre-decode check (check Data-URL string length before base64 decoding)
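
One possible shape for the fail-fast pre-decode check, as a Python sketch. It relies on the fact that base64 encodes every 3 bytes as 4 characters, so 128 000 bytes never need more than 170 668 characters. `MAX_B64_LEN` and `exceeds_limit_before_decode` are hypothetical names, and the sketch assumes a base64 payload without embedded whitespace.

```python
import base64

META_TRIM_META = 128_000  # proposed maximum decoded payload size in bytes

# Padded base64 length of a payload of exactly META_TRIM_META bytes:
# 4 * ceil(128_000 / 3) = 170_668 characters.
MAX_B64_LEN = 4 * ((META_TRIM_META + 2) // 3)


def exceeds_limit_before_decode(data_url: str) -> bool:
    """Return True if the base64 part is too long to fit within the limit.

    Only the string length is inspected, so no decoding work or
    allocation happens for oversized inputs.
    """
    _header, _sep, payload = data_url.partition(",")
    return len(payload) > MAX_B64_LEN


prefix = "data:application/json;base64,"
at_limit = prefix + base64.b64encode(bytes(META_TRIM_META)).decode()
over_limit = prefix + base64.b64encode(bytes(META_TRIM_META + 3)).decode()
```

The length check is a cheap first filter only. An unpadded string of 170 668 characters can still decode to 128 001 bytes, so the decoded size should be checked as well after decoding.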
