Utilities for IIIF time-based media: extract chapters from Range structures, speakers from WebVTT, and fragments from annotation targets.
A/V players for IIIF content need temporal data—start and end times for chapters, speakers, and annotations. This package extracts timing into simple objects ready for player integration.
parseRanges(manifest);
// → [{ id, label: 'Act I', startTime: 0, endTime: 302.05 }, ...]
parseSpeakers(vttContent);
// → [{ speaker: 'Narrator', startTime: 0, endTime: 45 }, ...]
Related ecosystem packages:
- @iiif/presentation-3 — TypeScript types for IIIF resources (types only, no runtime)
- @iiif/parser — Traverse, normalize, upgrade IIIF manifests
- maniiifest — Zero-dependency general IIIF parsing
- cozy-iiif — Lightweight IIIF parsing utilities
A/V players (full-featured React player components):
- Ramp — IIIF Presentation 3.0 player by Samvera/Avalon
- aviary-iiif-player — Aviary platform's React player component
This package complements both layers: the general parsers handle manifest structure and normalization; the players handle rendering. This library extracts the temporal data between them—media fragment timing, speaker segments, annotation targets—as simple objects usable with any player or server-side pipeline.
- Chapters — Parse IIIF Range structures into {startTime, endTime} data
- Speakers — Extract WebVTT voice tags, merging consecutive cues
- Annotation targets — Parse SpecificResource/FragmentSelector with temporal and spatial support
- Zero dependencies
- Strict TypeScript types
- ESM-only, tree-shakeable
- Tested against IIIF Cookbook examples
- Node.js 20+ (check with node --version)
- ESM project - your package.json must have "type": "module"
npm install @umd-mith/iiif-media-parsers
Create a test file:
// test-install.ts
import { parseMediaFragment } from '@umd-mith/iiif-media-parsers';
const result = parseMediaFragment('https://example.org/canvas#t=10,20');
console.log(result);
// Should print: { source: 'https://example.org/canvas', temporal: { start: 10, end: 20 } }
Run it:
npx tsx test-install.ts
import { parseRanges, parseSpeakers, parseAnnotationTarget } from '@umd-mith/iiif-media-parsers';
// Parse chapters from IIIF manifest
const chapters = parseRanges(manifest);
// => [{ id: 'range-1', label: 'Introduction', startTime: 0, endTime: 30 }]
// Extract speakers from WebVTT
const speakers = parseSpeakers(vttContent);
// => [{ speaker: 'Narrator', startTime: 0, endTime: 120 }]
// Parse media fragment URI
const target = parseAnnotationTarget('https://example.org/canvas#t=10,20');
// => { source: 'https://example.org/canvas', temporal: { start: 10, end: 20 } }
Parses IIIF Presentation API v3 Range structures into chapter objects.
import { parseRanges } from '@umd-mith/iiif-media-parsers';
const manifest = {
  id: 'https://example.org/manifest',
  type: 'Manifest',
  structures: [
    {
      id: 'range-1',
      type: 'Range',
      label: { en: ['Introduction'] },
      items: [{ id: 'canvas#t=0,30', type: 'Canvas' }]
    },
    {
      id: 'range-2',
      type: 'Range',
      label: { en: ['Main Content'] },
      items: [{ id: 'canvas#t=30,120', type: 'Canvas' }]
    }
  ]
};
const chapters = parseRanges(manifest);
// => [
// { id: 'range-1', label: 'Introduction', startTime: 0, endTime: 30 },
// { id: 'range-2', label: 'Main Content', startTime: 30, endTime: 120 }
// ]
Parameters:
manifest - IIIF Presentation API v3 Manifest object
Returns: Chapter[] - Array of chapters sorted by startTime
Note: Open-ended fragments (e.g., #t=3971.24) use the canvas's duration for the end time. Without a duration, the parser skips the range.
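The fallback described in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal code: resolveTimeSpan is a hypothetical helper name, and the 4200-second canvas duration is an invented value for the example.

```typescript
// Hypothetical helper (not part of the package API): shows how an
// open-ended temporal fragment might be resolved against a canvas duration.
interface TimeSpan {
  startTime: number;
  endTime: number;
}

function resolveTimeSpan(
  start: number,
  end: number | undefined,
  canvasDuration: number | undefined
): TimeSpan | null {
  // Prefer the explicit end; otherwise fall back to the canvas duration.
  const resolvedEnd = end ?? canvasDuration;
  // Neither an end nor a duration is available: the range is skipped.
  if (resolvedEnd === undefined) return null;
  // Empty or reversed spans are treated as invalid.
  if (resolvedEnd <= start) return null;
  return { startTime: start, endTime: resolvedEnd };
}

// Open-ended fragment "#t=3971.24" on a canvas with an assumed duration of 4200s.
console.log(resolveTimeSpan(3971.24, undefined, 4200));
// → { startTime: 3971.24, endTime: 4200 }

// Same fragment with no duration available: nothing to resolve against.
console.log(resolveTimeSpan(3971.24, undefined, undefined));
// → null
```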
Extracts speaker segments from WebVTT voice tags (<v>).
import { parseSpeakers } from '@umd-mith/iiif-media-parsers';
const vtt = `WEBVTT

00:00:00.000 --> 00:00:10.000
<v Mary Johnson>I remember when the community center first opened.

00:00:10.000 --> 00:00:25.000
<v Mary Johnson>It was such an important place for all of us.

00:00:25.000 --> 00:00:40.000
<v Interviewer>Can you tell me more about those early days?`;
const segments = parseSpeakers(vtt);
// => [
// { speaker: 'Mary Johnson', startTime: 0, endTime: 25 },
// { speaker: 'Interviewer', startTime: 25, endTime: 40 }
// ]
Parameters:
vttContent - Raw WebVTT file content as string
Returns: SpeakerSegment[] - Array of speaker segments sorted by startTime
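The three cues in the example collapse into two segments because consecutive cues from the same speaker are merged. A minimal sketch of that merging step follows. It assumes the cues are already sorted by start time and ignores any gap between same-speaker cues; how the library itself treats gaps is not documented here, and mergeConsecutiveCues is a hypothetical name.

```typescript
interface SpeakerSegment {
  speaker: string;
  startTime: number;
  endTime: number;
}

// Hypothetical helper: merges adjacent cues that share a speaker.
// Assumes `cues` is sorted by startTime.
function mergeConsecutiveCues(cues: SpeakerSegment[]): SpeakerSegment[] {
  const merged: SpeakerSegment[] = [];
  for (const cue of cues) {
    const last = merged[merged.length - 1];
    if (last && last.speaker === cue.speaker) {
      // Same speaker as the previous segment: extend it.
      last.endTime = Math.max(last.endTime, cue.endTime);
    } else {
      // New speaker: start a new segment (copy so the input is not mutated).
      merged.push({ ...cue });
    }
  }
  return merged;
}

const merged = mergeConsecutiveCues([
  { speaker: 'Mary Johnson', startTime: 0, endTime: 10 },
  { speaker: 'Mary Johnson', startTime: 10, endTime: 25 },
  { speaker: 'Interviewer', startTime: 25, endTime: 40 }
]);
console.log(merged);
// → [
//   { speaker: 'Mary Johnson', startTime: 0, endTime: 25 },
//   { speaker: 'Interviewer', startTime: 25, endTime: 40 }
// ]
```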
Parses IIIF annotation targets, extracting temporal and spatial fragments.
import { parseAnnotationTarget } from '@umd-mith/iiif-media-parsers';
// Simple string URI with temporal fragment
const result1 = parseAnnotationTarget('https://example.org/canvas#t=10,20');
// => { source: 'https://example.org/canvas', temporal: { start: 10, end: 20 } }
// Spatial fragment (for images/video regions)
const result2 = parseAnnotationTarget('https://example.org/canvas#xywh=100,200,50,75');
// => { source: '...', spatial: { x: 100, y: 200, width: 50, height: 75, unit: 'pixel' } }
// SpecificResource with FragmentSelector
const result3 = parseAnnotationTarget({
  type: 'SpecificResource',
  source: 'https://example.org/canvas',
  selector: { type: 'FragmentSelector', value: 't=10,20' }
});
// => { source: 'https://example.org/canvas', temporal: { start: 10, end: 20 } }
Parameters:
target - String URI or SpecificResource object
Returns: ParsedAnnotationTarget | null
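Because the result may be null and both fragment properties are optional, callers need to check each case before use. The sketch below shows one way to do that. The interfaces mirror the types documented in this README; describeTarget is a hypothetical helper written for illustration and is not exported by the package.

```typescript
// Local copies of the documented result types, so the sketch is self-contained.
interface TemporalFragment {
  start: number;
  end?: number;
}
interface SpatialFragment {
  x: number;
  y: number;
  width: number;
  height: number;
  unit: 'pixel' | 'percent';
}
interface ParsedAnnotationTarget {
  source: string;
  temporal?: TemporalFragment;
  spatial?: SpatialFragment;
}

// Hypothetical helper: builds a readable summary while handling every
// "missing" case (null result, no temporal part, no spatial part, open end).
function describeTarget(parsed: ParsedAnnotationTarget | null): string {
  if (parsed === null) return 'invalid target';
  const parts: string[] = [parsed.source];
  if (parsed.temporal) {
    const { start, end } = parsed.temporal;
    parts.push(end === undefined ? `from ${start}s` : `${start}s to ${end}s`);
  }
  if (parsed.spatial) {
    const { x, y, width, height, unit } = parsed.spatial;
    parts.push(`region ${x},${y} ${width}x${height} (${unit})`);
  }
  return parts.join(' | ');
}

console.log(
  describeTarget({ source: 'https://example.org/canvas', temporal: { start: 10, end: 20 } })
);
// → https://example.org/canvas | 10s to 20s
console.log(describeTarget(null));
// → invalid target
```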
Low-level parser for W3C Media Fragment URIs.
import { parseMediaFragment } from '@umd-mith/iiif-media-parsers';
// Temporal fragments
parseMediaFragment('https://example.org/video#t=10,20');
// => { source: '...', temporal: { start: 10, end: 20 } }
parseMediaFragment('https://example.org/video#t=10');
// => { source: '...', temporal: { start: 10 } } // end optional
parseMediaFragment('https://example.org/video#t=,20');
// => { source: '...', temporal: { start: 0, end: 20 } } // from beginning
// Spatial fragments
parseMediaFragment('https://example.org/image#xywh=100,200,50,75');
// => { source: '...', spatial: { x: 100, y: 200, width: 50, height: 75, unit: 'pixel' } }
parseMediaFragment('https://example.org/image#xywh=percent:10,20,30,40');
// => { source: '...', spatial: { ..., unit: 'percent' } }
All functions validate input per W3C and IIIF specifications, returning null, undefined, or an empty array for invalid data rather than throwing exceptions.
parseRanges returns an empty array when:
- Manifest has no structures property
- No ranges contain valid temporal fragments
Skips ranges when:
- No Canvas items with #t= fragments
- Temporal fragment malformed (non-numeric, negative values)
- Time range invalid (end <= start)
- Open-ended fragment without canvas duration to resolve end time
parseSpeakers returns an empty array when:
- Input is null, undefined, or empty/whitespace-only string
- VTT contains no cues with voice tags (<v Speaker>)
Skips cues when:
- Timing line malformed
- No voice tag present in cue text
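To make "timing line malformed" concrete, here is a minimal sketch of parsing a WebVTT cue timing line into seconds. It is an illustration under stated assumptions, not the library's parser: parseTimestamp and parseTimingLine are hypothetical names, and rejecting cues whose end is not after the start is an assumption of this sketch.

```typescript
// Hypothetical helper: converts "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds.
function parseTimestamp(value: string): number | null {
  const match = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(value);
  if (!match) return null;
  const hours = match[1] ? Number(match[1]) : 0;
  return hours * 3600 + Number(match[2]) * 60 + Number(match[3]) + Number(match[4]) / 1000;
}

// Hypothetical helper: parses "start --> end [cue settings]".
function parseTimingLine(line: string): { startTime: number; endTime: number } | null {
  const parts = line.trim().split(/\s+-->\s+/);
  if (parts.length !== 2) return null;
  const startText = parts[0] ?? '';
  // Anything after the end timestamp (e.g. "align:start") is a cue setting.
  const endText = (parts[1] ?? '').split(/\s+/)[0] ?? '';
  const startTime = parseTimestamp(startText);
  const endTime = parseTimestamp(endText);
  if (startTime === null || endTime === null) return null;
  // Assumption of this sketch: a cue must end after it starts.
  if (endTime <= startTime) return null;
  return { startTime, endTime };
}

console.log(parseTimingLine('00:00:10.000 --> 00:00:25.000'));
// → { startTime: 10, endTime: 25 }
console.log(parseTimingLine('not a timing line'));
// → null
```

A caller would skip any cue for which this returns null and continue with the next one.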
parseAnnotationTarget returns null when:
- Input is null, undefined, or empty string
- Object lacks type: 'SpecificResource'
Returns undefined for fragment properties when:
- No fragment present in URI or selector
- Fragment is malformed (#t=invalid, #t=)
- Values are negative (#t=-5,20)
- Time range reversed (#t=20,10 where end <= start)
- Percentage values exceed bounds (>100 or region outside canvas)
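The temporal rules in this list can be sketched as a small validator. This is an illustration only, not the package's implementation: parseTemporal is a hypothetical name, it takes the fragment without the leading "#", and it handles plain decimal seconds only (no npt: prefix or hh:mm:ss forms).

```typescript
interface TemporalFragment {
  start: number;
  end?: number;
}

// Hypothetical helper: returns undefined for any fragment that breaks
// the rules listed above.
function parseTemporal(fragment: string): TemporalFragment | undefined {
  // Accepts "t=start", "t=start,end", or "t=,end". A minus sign or any
  // non-numeric text fails the match, covering malformed and negative values.
  const match = /^t=(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(fragment);
  if (!match) return undefined;
  const startText = match[1];
  const endText = match[2];
  // "t=" with no values at all is malformed.
  if (startText === undefined && endText === undefined) return undefined;
  // A missing start means "from the beginning".
  const start = startText === undefined ? 0 : Number(startText);
  // A missing end leaves the fragment open-ended.
  if (endText === undefined) return { start };
  const end = Number(endText);
  // Reversed or empty ranges are rejected.
  if (end <= start) return undefined;
  return { start, end };
}

console.log(parseTemporal('t=10,20')); // → { start: 10, end: 20 }
console.log(parseTemporal('t=10')); // → { start: 10 }
console.log(parseTemporal('t=,20')); // → { start: 0, end: 20 }
console.log(parseTemporal('t=20,10')); // → undefined
console.log(parseTemporal('t=-5,20')); // → undefined
```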
interface Chapter {
  id: string; // Unique identifier from IIIF Range
  label: string; // Human-readable chapter label
  startTime: number; // Start time in seconds
  endTime: number; // End time in seconds
  thumbnail?: string; // Optional thumbnail URL
  metadata?: Record<string, string>; // Optional key-value metadata
}

interface SpeakerSegment {
  speaker: string; // Speaker name from <v> tag
  startTime: number; // Start time in seconds
  endTime: number; // End time in seconds
}

interface TemporalFragment {
  start: number; // Start time in seconds
  end?: number; // End time in seconds (optional per W3C spec)
}

interface SpatialFragment {
  x: number; // X coordinate
  y: number; // Y coordinate
  width: number; // Width
  height: number; // Height
  unit: 'pixel' | 'percent'; // Coordinate unit
}

interface ParsedAnnotationTarget {
  source: string; // Canvas/source URI without fragment
  temporal?: TemporalFragment; // Temporal fragment if present
  spatial?: SpatialFragment; // Spatial fragment if present
}

type IIIFResourceType = 'Canvas' | 'Image' | 'Sound' | 'Video';

type AnnotationTargetInput =
  | string // Simple URI with fragment (e.g., "canvas#t=10,20")
  | {
      type: 'SpecificResource';
      source: string | { id: string; type?: IIIFResourceType };
      selector?: {
        type: 'FragmentSelector' | string;
        value?: string;
        conformsTo?: string;
      };
    };

Build a chapter-based timeline for oral history recordings:
import { parseRanges, parseSpeakers } from '@umd-mith/iiif-media-parsers';
// Load IIIF manifest and VTT transcript
const manifest = await fetch(manifestUrl).then((r) => r.json());
const vtt = await fetch(transcriptUrl).then((r) => r.text());
// Extract navigation data
const chapters = parseRanges(manifest);
const speakers = parseSpeakers(vtt);
// Build timeline UI
chapters.forEach((chapter) => {
  const chapterSpeakers = speakers.filter(
    (s) => s.startTime >= chapter.startTime && s.startTime < chapter.endTime
  );
  console.log(`${chapter.label}: ${chapterSpeakers.map((s) => s.speaker).join(', ')}`);
});
Jump to specific moments from IIIF annotations:
import { parseAnnotationTarget } from '@umd-mith/iiif-media-parsers';
// From IIIF annotation
const annotation = {
  type: 'Annotation',
  target: 'https://example.org/canvas#t=45.5,52.3'
};
const parsed = parseAnnotationTarget(annotation.target);
if (parsed?.temporal) {
  videoPlayer.currentTime = parsed.temporal.start;
  videoPlayer.play();
}
This library implements:
- W3C Media Fragments URI 1.0 - temporal and spatial targeting
- IIIF Presentation API 3.0 - Range structures
- WebVTT - voice tags for speaker metadata
IIIF manifest labels and metadata may contain user-controlled content. Escape output before DOM insertion to prevent XSS:
// Safe - uses textContent
element.textContent = chapter.label;
// Safe - uses DOM API
const textNode = document.createTextNode(chapter.label);
element.appendChild(textNode);
- Node.js: 20.x, 22.x
- Browsers: ES2020+ (Chrome 80+, Firefox 78+, Safari 14+)
- Module format: ESM only (no CommonJS)
- TypeScript: 5.0+
We developed this package using Anthropic's Claude as a generative coding tool, with human direction and review. Without AI assistance, we would have hoped someone else would build this but probably would not have diverted resources to implement it ourselves. We remain aware of the many critiques and concerns regarding generative AI; this experiment does not invalidate them.
Process: AI generated initial implementations, tests, and documentation based on W3C and IIIF specifications. Human maintainers directed requirements, reviewed all outputs, and take full responsibility for the final code.
Acknowledgment: AI capabilities derive partly from programmers whose public work became training data. Our open-source output depends on proprietary AI infrastructure.
Following Apache and OpenInfra guidance, we use Assisted-by: commit trailers for ongoing contributions.
# Clone and install
git clone https://github.com/umd-mith/iiif-media-parsers.git
cd iiif-media-parsers
pnpm install
# Run tests (watch mode)
pnpm test
# Run all checks
pnpm lint && pnpm format:check && pnpm type-check && pnpm test:ci
# Build
pnpm build
Contributions welcome:
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure checks pass (pnpm lint && pnpm test)
- Submit a pull request
Pre-commit hooks lint and format staged files automatically.
For AI-assisted contributions, include commit trailers:
Assisted-by: Claude <noreply@anthropic.com>
The Clear BSD License (SPDX: BSD-3-Clause-Clear) - see LICENSE for details.