Problem
IEP-0002 requires the meta element to be wrapped in a Data-URL but defines no upper bound on the decoded payload size. Neither iscc-core nor iscc-lib enforces any limit.
A multi-megabyte Data-URL payload is fully base64-decoded and processed through soft_hash_meta_v0 (byte n-gram generation + SimHash), causing excessive memory consumption. For a payload of size N, the function generates ~(N - 3) overlapping 4-byte n-grams, each hashed with BLAKE3, then processed through SimHash — O(N) memory and compute.
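To make the linear growth concrete, here is a minimal sketch of the sliding-window step. It is not the iscc-core implementation: the helper names `sliding_ngrams` and `count_ngram_hashes` are hypothetical, and BLAKE2b from the standard library stands in for BLAKE3 so the snippet runs without extra dependencies.

```python
import hashlib

NGRAM_WIDTH = 4  # width of the byte n-grams described above


def sliding_ngrams(data: bytes, width: int = NGRAM_WIDTH):
    """Yield overlapping byte n-grams (hypothetical helper).

    A payload shorter than `width` yields itself once.
    """
    for i in range(max(len(data) - width + 1, 1)):
        yield data[i : i + width]


def count_ngram_hashes(payload: bytes) -> int:
    """Count the per-n-gram digests a SimHash pass would have to consume."""
    count = 0
    for gram in sliding_ngrams(payload):
        # Stand-in for BLAKE3 (assumption): one 32-byte digest per n-gram.
        hashlib.blake2b(gram, digest_size=32).digest()
        count += 1
    return count


if __name__ == "__main__":
    for size in (1_000, 10_000, 100_000):
        print(size, count_ngram_hashes(bytes(size)))
```

Each payload of N >= 4 bytes produces N - 3 digests, so both the number of hash calls and any buffer that collects the digests scale linearly with the decoded size.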
Realistic metadata payloads (JSON-LD, ONIX, Dublin Core, MARC21, IPTC) are typically well under 64 KB, and similarity hash quality plateaus well before that.
Proposal
Add a META_TRIM_META constant (128 000 bytes, i.e. 128 kB) to IEP-0002 as the maximum decoded payload size for the meta element.
Add a preprocessing step to the meta element specification:
The decoded payload size shall not exceed 128 000 bytes. Implementations shall reject meta inputs whose decoded payload exceeds this limit.
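One way an implementation might enforce this rule is sketched below. The function name `decode_meta_payload` is hypothetical, the strict base64 validation is an assumption of this sketch rather than something IEP-0002 prescribes, and META_TRIM_META carries the value proposed here.

```python
import base64
from urllib.parse import unquote_to_bytes

META_TRIM_META = 128_000  # proposed maximum decoded payload size in bytes


def decode_meta_payload(data_url: str, limit: int = META_TRIM_META) -> bytes:
    """Decode a Data-URL payload and reject it if it exceeds `limit` bytes.

    Hypothetical helper; raises ValueError for malformed or oversized input.
    """
    if not data_url.startswith("data:"):
        raise ValueError("meta must be a Data-URL")
    header, sep, body = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("Data-URL is missing the ',' separator")
    if header.endswith(";base64"):
        # Cheap pre-check: strictly valid, padded base64 decodes to at least
        # len(body) // 4 * 3 - 2 bytes, so clearly oversized input can be
        # rejected before anything is decoded.
        if len(body) // 4 * 3 - 2 > limit:
            raise ValueError(f"meta payload exceeds {limit} bytes")
        # validate=True (assumption) rejects non-alphabet characters.
        payload = base64.b64decode(body, validate=True)
    else:
        payload = unquote_to_bytes(body)
    if len(payload) > limit:
        raise ValueError(f"meta payload exceeds {limit} bytes")
    return payload
```

The check runs on the decoded size, as the proposed wording requires, and the input is rejected outright rather than trimmed.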
Rationale for rejection over truncation
Unlike name and description (text that can be meaningfully truncated at character boundaries), the meta payload is structured or binary data (JSON, XML, images, audio samples). Truncation at arbitrary byte boundaries produces corrupted payloads and would create an inconsistency between the returned meta Data-URL (original) and the metahash (computed from truncated data). Rejection is therefore the appropriate behavior.
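A small illustration of the corruption argument, using a made-up JSON record (the field names and the `is_valid_json` helper are purely illustrative): cutting the serialized bytes at an arbitrary offset leaves something that no longer parses.

```python
import json

# Made-up metadata record, for illustration only.
record = {"title": "Example", "keywords": ["iscc", "metadata", "similarity"]}
payload = json.dumps(record).encode("utf-8")


def is_valid_json(data: bytes) -> bool:
    """Return True if `data` parses as JSON, False otherwise."""
    try:
        json.loads(data)
        return True
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both derive from it
        return False


truncated = payload[: len(payload) // 2]  # arbitrary byte-boundary cut

if __name__ == "__main__":
    print(is_valid_json(payload))    # True
    print(is_valid_json(truncated))  # False
```

A hash computed over `truncated` would describe a document that no consumer can parse, while the returned Data-URL would still carry the full original.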
Rationale for 128 KB limit
- Typical JSON-LD metadata: 1–10 KB
- ONIX records: 5–50 KB
- Large Dublin Core with embedded thumbnails: up to ~80 KB
- 128 KB provides about 2x headroom over the 64 KB that typical payloads stay under, and about 1.6x over the largest case listed above (~80 KB)
- SimHash quality plateaus well below this threshold
Checklist
- META_TRIM_META constant and size limit