Skip to content

Add comprehensive video metadata storage format #191

@talmo

Description

@talmo

Overview

Currently, video lineage metadata (source_video, original_video) is only stored for embedded videos in HDF5 groups. Non-embedded videos have no way to store this metadata, limiting our ability to track video provenance through multiple save/load cycles.

Additionally, the current videos_json format is brittle due to backwards compatibility constraints with SLEAP's cattrs deserialization (see #173).

Proposed Solution

Introduce a new video_metadata dataset that stores comprehensive video metadata in a flexible, versioned format while keeping videos_json unchanged for backwards compatibility.

Design

1. New Dataset Structure

# video_metadata dataset (single JSON string)
{
    "format_version": "1.0",  # Versioning for future extensibility
    "videos": [
        {
            "index": 0,  # Index matching videos_json order
            "filename": "path/to/video.mp4",
            "backend": {...},  # Full backend metadata
            "lineage": {
                "source_video": {
                    "filename": "source.pkg.slp",
                    "backend": {...},
                    "lineage": {...}  # Recursive, with depth limit
                },
                "original_video": {
                    "filename": "original.mp4", 
                    "backend": {...}
                }
            },
            "metadata": {
                # Extensible fields for future use
                "created": "2024-01-01T00:00:00",
                "last_modified": "2024-01-01T00:00:00",
                "user_metadata": {}
            }
        }
    ]
}

2. Key Benefits

  • Full Backwards Compatibility: Old SLEAP versions ignore unknown datasets
  • Forward Compatibility: Format versioning allows future extensions
  • Robustness: Falls back to legacy format if new metadata fails
  • Extensibility: Can add timestamps, user metadata, codec info, etc.
  • Performance: Single dataset read for all video metadata

3. Implementation Strategy

  • Keep writing to videos_json exactly as before
  • Write new video_metadata dataset in parallel
  • When reading, use video_metadata to enhance videos if available
  • Implement depth limits for lineage to prevent infinite recursion

4. Migration Path

  1. Read videos_json normally (required)
  2. Check for video_metadata and enhance if present (optional)
  3. Fall back to checking HDF5 groups for embedded videos (legacy)

Related Issues

Future Extensions

This format enables:

  • Storing video FPS, codec information
  • User-defined metadata
  • Compression statistics for embedded videos
  • Edit history/provenance tracking
  • Multi-camera synchronization metadata

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions