Cosmos Evaluator is an automated evaluation system for synthetic video output generated by Cosmos. It validates quality across multiple dimensions — object correspondence, environment conditions, and more — producing structured quality scores you can use to assess and improve your video generation pipeline.
This document shows how to get started running the different checker containers.
This release includes four evaluation checks and the service framework to run them:
| Check | What It Does |
|---|---|
| Hallucination Check | Detects hallucinated movement in an augmented video. Compares movement masks between the original and augmented videos, flagging moving objects that were hallucinated into, or removed from, the augmented video. |
| Obstacle Check | Evaluates object detection correspondence between world model ground truth and generated video. Scores how well objects (vehicles, pedestrians, infrastructure) in the generated video match their expected positions. |
| VLM Preset Check | Uses a Vision-Language Model to verify that generated video matches specified environment conditions (weather, time of day, geography, road surface). |
| Attribute Verification Check | Uses a Vision-Language Model to verify that the augmented video contains specified attributes, via an LLM question-and-answer approach. |
All checks are available as REST API microservices using the included service framework. They can also be used programmatically via their Python APIs. Advanced users can build custom checks on top of the framework.
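As a rough illustration of driving a checker service over REST, the sketch below builds a JSON request body for a single check using only the standard library. The field names (`check`, `original_video`, `augmented_video`), the check identifier, and the endpoint shown in the comment are hypothetical assumptions for illustration, not the actual Cosmos Evaluator API; consult the Getting Started Guide for the real request schema.

```python
import json

def build_check_request(original_uri: str, augmented_uri: str, check: str) -> str:
    """Serialize a JSON request body for one evaluation check.

    All field names here are assumed placeholders, not the real schema.
    """
    payload = {
        "check": check,                  # e.g. "hallucination" (assumed name)
        "original_video": original_uri,  # path or URI to the source video
        "augmented_video": augmented_uri,
    }
    return json.dumps(payload)

body = build_check_request("clips/orig.mp4", "clips/aug.mp4", "hallucination")
# The body could then be POSTed to a running checker container
# (hypothetical endpoint), e.g. with curl:
#   curl -X POST http://localhost:<port>/<check-endpoint> \
#        -H "Content-Type: application/json" -d "$body"
```

The same request shape would apply when calling the checks programmatically through their Python APIs, with the structured quality scores returned in the response.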
To get started with the Cosmos Evaluator, and to learn more about what is offered, see the Getting Started Guide.
Other documentation for this repository may be found under the docs/ directory.
The Cosmos Evaluator is licensed under the Apache 2.0 license. This project is currently not accepting contributions.
