
anibridge-mappings


anibridge-mappings is a dataset and pipeline for mapping episode-level relationships between anime entries across various databases, including AniDB, AniList, MAL, TMDB, TVDB, and IMDB. The schema was designed for use in the AniBridge project, but the dataset is open for anyone to use and contribute to.

The mapping payload is generated by a Python pipeline that merges, validates, and serializes data from multiple trusted sources.

Releases are updated daily and you can explore the dataset interactively at https://mappings.anibridge.eliasbenb.dev.

A huge thank you to the primary mappings maintainer, @LuceoEtzio, for contributing over 4,000 mapping edits! ❤️

Download

The latest mappings can be downloaded as release assets from the releases page.

Note: releases are updated daily and tagged with a v{major} version; breaking changes to the schema or pipeline increment the major version. Patch releases may be made within the same major version to fix mapping errors or make minor, non-breaking schema adjustments.

How It Works

  1. Fetch sources: Download upstream datasets and metadata feeds.
  2. Build ID graph: Collect cross-database ID links from sources.
  3. Collect metadata: Fetch relevant metadata (episode counts, durations, season info, etc.) from sources.
  4. Build episode graph: Normalize and merge episode mappings from all sources.
  5. Infer mappings: Use techniques like transitive closure and metadata alignment to infer missing episode mappings.
  6. Apply edits: Overlay mapping overrides from mappings.edits.yaml onto the aggregated data.
  7. Validate & prune: Validate episode ranges against metadata and remove invalid, overlapping, or inconsistent mappings.
  8. Emit schema: Serialize to the mappings.schema.json format.
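As an illustration of step 5, transitive closure over ID links can be sketched as below. This is a hypothetical simplification for illustration only (the real pipeline also validates inferred links against metadata):

```python
def transitive_closure(links):
    """If a links to b and b links to c, infer a -> c.

    `links` maps a descriptor to a set of linked descriptors.
    Hypothetical sketch; not the pipeline's actual code.
    """
    closure = {src: set(dsts) for src, dsts in links.items()}
    changed = True
    while changed:
        changed = False
        for src, dsts in closure.items():
            inferred = set()
            for mid in dsts:
                inferred |= closure.get(mid, set())
            inferred -= dsts | {src}  # keep only genuinely new links
            if inferred:
                dsts |= inferred
                changed = True
    return closure

links = {"anidb:1": {"anilist:2"}, "anilist:2": {"mal:3"}}
print(sorted(transitive_closure(links)["anidb:1"]))
# ['anilist:2', 'mal:3']
```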

Data Sources

| Source                               | Metadata | ID Mappings | Episode Mappings | Providers                              |
| ------------------------------------ | -------- | ----------- | ---------------- | -------------------------------------- |
| Anime-Lists/anime-lists              | No       | Yes         | Yes              | AniDB, IMDB, TMDB, TVDB                |
| manami-project/anime-offline-database | Not Yet | Yes         | No               | AniDB, AniList, MAL                    |
| notseteve/AnimeAggregations          | Yes      | Yes         | No               | AniDB, IMDB, MAL, TMDB                 |
| varoOP/shinkro-mapping               | No       | Yes         | Yes              | MAL, TMDB, TVDB                        |
| QLever                               | Yes      | Yes         | No               | AniDB, AniList, IMDB, MAL, TMDB, TVDB  |
| AniList GraphQL                      | Yes      | Not Yet     | No               | AniList                                |
| MyAnimeList API                      | Yes      | No          | No               | MAL                                    |
| TMDB API                             | Yes      | No          | No               | TMDB                                   |
| TVDB API                             | Yes      | No          | No               | TVDB                                   |

Note: "Not Yet" indicates potential future work.

Mappings Schema

mappings.schema.json

The output is a JSON object where each key is a source descriptor and each value is a map of target descriptors. Mappings are unidirectional: a mapping from A -> B does not imply B -> A, so reverse lookups require their own explicit entries. Descriptors use the format:

provider:id[:scope]
  • provider: one of anidb, anilist, imdb_movie, imdb_show, mal, tmdb_show, tmdb_movie, tvdb_show.
  • id: the provider-specific identifier (e.g. AniDB ID 1234 or IMDB ID tt1234567).
  • scope: optional, used to denote a subset of an entry. The notation is provider-specific:
    • imdb_show|tmdb_show|tvdb_show: require a scope denoting the season, in the format s{season_number} (e.g. s1, s0).
    • anidb: requires a scope denoting the episode type: R (regular), S (specials), O (other), C (credits), T (trailers), P (parodies).
    • anilist|mal: omit scopes, as these providers have no comparable concept of seasons or episode types (e.g. anilist:12345).
    • imdb_movie|tmdb_movie: omit scopes, since movies have no seasons or episode types (e.g. imdb_movie:tt1234567).
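Since provider names and IDs never contain colons, descriptors can be split directly on :. A hypothetical helper illustrating the format (not part of the pipeline):

```python
def parse_descriptor(descriptor):
    """Split a 'provider:id[:scope]' descriptor into its parts.
    Hypothetical parser for illustration only."""
    parts = descriptor.split(":")
    provider, entry_id = parts[0], parts[1]
    scope = parts[2] if len(parts) == 3 else None
    return provider, entry_id, scope

print(parse_descriptor("tvdb_show:2:s0"))  # ('tvdb_show', '2', 's0')
print(parse_descriptor("anilist:12345"))   # ('anilist', '12345', None)
```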

Each target descriptor maps source episode ranges to target ranges:

{
  "tvdb_show:2:s0": {
    "anilist:1001": {}, // from tvdb id 2, season 0 to anilist id 1001
    "mal:2001": {} // from tvdb id 2, season 0 to mal id 2001
  },
  "tmdb_show:3:s1": {
    "anilist:1002": {} // from tmdb id 3, season 1 to anilist id 1002
  }
}

Each target descriptor's value is a map whose keys denote source ranges and whose values denote the corresponding target ranges. For the purposes of this dataset, both keys and values define episode ranges.

Source ranges must be a single contiguous range:

x[-y]

Target ranges support comma-separated segments and an optional trailing ratio:

x[-y][,x2[-y2]...][|ratio]
  • x: starting episode number (1-based).
  • y: optional ending episode number (inclusive). If omitted, the range is open-ended.
  • ratio: optional ratio indicating the 'weight' of each episode in a target mapping. A positive ratio n indicates each source episode spans n target episodes. A negative ratio -n indicates each source episode spans 1/n target episodes.
  • Multiple ranges can be comma-separated to denote non-contiguous mappings. Note: non-contiguous ranges are only supported on the target side.
  • The ratio must appear at the end of the target range string.
{
  "tmdb_show:500:s1": {
    "anilist:1003": {
      "1-12": "1-12", // source episodes 1-12 map to target episodes 1-12
      "14-": "13-" // source episodes 14 and onward map to target episodes 13 and onward
    },
    "mal:2003": {
      "1-12": "1-6,8-13", // source episodes 1-12 map to target episodes 1-6 and 8-13 (skipping 7)
      "13-": "14-|2" // source episodes 13 and onward map to target episodes 14 and onward at double granularity
    }
  }
}
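A target range string following the grammar above could be parsed as sketched below. This is a hypothetical parser for illustration; the pipeline's actual parser may differ:

```python
import re

def parse_target_range(spec):
    """Parse 'x[-y][,x2[-y2]...][|ratio]' into (segments, ratio).

    Each segment is a (start, end) tuple; end is None for an
    open-ended range. Hypothetical sketch for illustration.
    """
    ratio = 1
    if "|" in spec:
        spec, ratio_str = spec.rsplit("|", 1)
        ratio = int(ratio_str)  # negative values mean 1/n episodes
    segments = []
    for seg in spec.split(","):
        match = re.fullmatch(r"(\d+)(?:-(\d*))?", seg)
        if not match:
            raise ValueError(f"bad range segment: {seg!r}")
        start = int(match.group(1))
        tail = match.group(2)
        end = start if tail is None else (None if tail == "" else int(tail))
        segments.append((start, end))
    return segments, ratio

print(parse_target_range("1-6,8-13"))  # ([(1, 6), (8, 13)], 1)
print(parse_target_range("14-|2"))     # ([(14, None)], 2)
```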

Manual Edits

Mapping overrides live in mappings.edits.yaml. The format mirrors the schema structure: a source descriptor maps to target descriptors, which in turn map source ranges to target ranges.

Example:

anilist:12345: # Some comment about this mapping
  tvdb_show:98765:s1:
    "1-12": "1-12"
  tmdb_show:54321:s1:
    "1-12": "1-12"

When the pipeline runs, it removes any existing mappings between the specified source and target scopes and replaces them with your entries.
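The overlay step can be sketched as follows. This is a minimal, hypothetical simplification that replaces ranges per source/target pair; the actual pipeline performs scope-aware removal:

```python
def apply_edits(mappings, edits):
    """Overlay manual edits onto the aggregated mappings: for each
    source/target pair in the edits, the edited ranges replace any
    existing ranges wholesale. Hypothetical sketch for illustration."""
    for source, targets in edits.items():
        for target, ranges in targets.items():
            mappings.setdefault(source, {})[target] = dict(ranges)
    return mappings

aggregated = {"anilist:12345": {"tvdb_show:98765:s1": {"1-13": "1-13"}}}
edits = {"anilist:12345": {"tvdb_show:98765:s1": {"1-12": "1-12"}}}
print(apply_edits(aggregated, edits))
# {'anilist:12345': {'tvdb_show:98765:s1': {'1-12': '1-12'}}}
```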

Running the Pipeline

The CLI entrypoint is main.py. Typical usage:

uv run ./main.py

Options:

  • --out: output file path (default: data/out/mappings.json)
  • --edits: path to the edits file (default: mappings.edits.yaml)
  • --compress: emit minified and zstd-compressed outputs to data/out/
  • --stats: emit stats.json to data/out/
  • --provenance: emit provenance.zip containing manifest.json, descriptor-index.json, and descriptors/*.json files
  • --log-level: set logging verbosity (default: INFO)

Note: TMDB and TVDB metadata fetching requires API keys in the TMDB_API_KEY and TVDB_API_KEY environment variables. MAL ranking metadata uses MAL_CLIENT_ID only, and falls back to the public client ID baked into the source when unset.

Contributing

The best way to contribute is by fixing or adding mappings in mappings.edits.yaml. If you need to record why a mapping was changed, include a comment inside the mapping entry (not at the root level) so the formatter can preserve it.