Skip to content

Handle weighted messages in chat fine-tuning data prep#2524

Open
biefan wants to merge 1 commit intoopenai:mainfrom
biefan:fix-chat-finetune-weight-token-count
Open

Handle weighted messages in chat fine-tuning data prep#2524
biefan wants to merge 1 commit intoopenai:mainfrom
biefan:fix-chat-finetune-weight-token-count

Conversation

@biefan
Copy link

@biefan biefan commented Mar 14, 2026

Summary

  • skip the numeric weight field when estimating chat fine-tuning tokens
  • JSON-encode other non-string message values before passing them to tiktoken

Validation

  • reproduced the old failure with a weighted message: TypeError: expected string or buffer
  • executed the updated notebook function via uv run --with tiktoken,numpy python and verified weighted messages no longer raise

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant