Add script-aware language tags for FLORES 200 #1272

Open
goktugozkanmd wants to merge 1 commit into huggingface:main from goktugozkanmd:a67-issue-745-language-script
@goktugozkanmd

Summary

  • Add Script, LanguageWithScript, and language_from_tag() to preserve script information from dataset tags such as zho_Hant, while Language.value keeps the base ISO 639 code
  • Add script-aware translation literal lookup with fallback to the base language
  • Add Traditional Chinese template literals and re-enable zho_Hant in FLORES 200
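
To illustrate the idea in the bullets above, here is a minimal, self-contained sketch. It is not the code from this PR: the enum members, the `LanguageWithScript` fields, the literal tables, and the `get_literals` helper are all assumptions made for illustration. Only the names `Script`, `LanguageWithScript`, `language_from_tag()`, and the `zho_Hant` tag come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Language(Enum):
    # Values stay the base ISO 639-3 code, with no script information.
    CHINESE = "zho"
    ENGLISH = "eng"


class Script(Enum):
    # ISO 15924 four-letter script codes (illustrative subset).
    LATIN = "Latn"
    SIMPLIFIED_HAN = "Hans"
    TRADITIONAL_HAN = "Hant"


@dataclass(frozen=True)
class LanguageWithScript:
    # Hypothetical shape: a base language plus an optional script.
    language: Language
    script: Optional[Script] = None


def language_from_tag(tag: str) -> LanguageWithScript:
    """Split a FLORES-style tag such as 'zho_Hant' into language and script."""
    code, _, script_code = tag.partition("_")
    script = Script(script_code) if script_code else None
    return LanguageWithScript(Language(code), script)


# Hypothetical literal tables: defaults per base language,
# plus overrides for specific (language, script) pairs.
BASE_LITERALS: Dict[Language, Dict[str, str]] = {
    Language.CHINESE: {"question": "问题"},
    Language.ENGLISH: {"question": "question"},
}
SCRIPT_LITERALS: Dict[LanguageWithScript, Dict[str, str]] = {
    LanguageWithScript(Language.CHINESE, Script.TRADITIONAL_HAN): {"question": "問題"},
}


def get_literals(lang: LanguageWithScript) -> Dict[str, str]:
    """Prefer script-specific literals, else fall back to the base language."""
    return SCRIPT_LITERALS.get(lang, BASE_LITERALS[lang.language])


tagged = language_from_tag("zho_Hant")
print(tagged.language.value, tagged.script.value, get_literals(tagged)["question"])
```

In this sketch a tag without a script-specific entry, such as `zho_Hans`, resolves to the base Chinese literals, which mirrors the fallback behavior described in the second bullet.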

Fixes #745.

Tests

  • ruff check and ruff format --check pass on the changed files
  • pytest tests/unit/prompt tests/unit/tasks tests/unit/utils/test_language.py — 67 passed

Development

Successfully merging this pull request may close these issues.

[FT] Manage script and language in the Language enum