v0.1 alpha -- implements the v1 Inspect slice.
Dataset Forge inspects image datasets for GPT-style artifacts and produces explainable, evidence-backed findings.
It is designed for practitioners preparing LoRA training datasets who want to understand what is wrong with their data -- and what should be left alone -- before doing anything to it.
v0.1 alpha is analysis only. It reads your dataset. It does not touch your images. Cleanup, UI, plugins, and additional analyzers are not part of this release.
Dataset Forge builds a statistical picture of your dataset, then runs independent analyzers that compare each image against that baseline. Each finding explains what was measured, how confident the analyzer is, and what action (if any) is warranted.
A healthy dataset can legitimately produce zero findings. That is a valid and correct result, not a failure.
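The two-pass flow described above (build a dataset baseline, then let each analyzer compare every image against it) can be sketched roughly as follows. Everything here is hypothetical. `DatasetContext`, `Finding`, `build_context`, `run_inspect`, and the toy `high_texture` analyzer are simplified stand-ins, not Dataset Forge's actual classes or API, and the 2-standard-deviation threshold is arbitrary.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable

# Hypothetical stand-ins; the real DatasetContext/Finding types are
# documented in ARCHITECTURE.md and will differ.
@dataclass
class DatasetContext:
    means: dict[str, float]  # dataset-wide mean of each metric
    stds: dict[str, float]   # dataset-wide population std of each metric

@dataclass
class Finding:
    image: str
    analyzer: str
    explanation: str

# An analyzer sees one image's metrics plus the shared baseline.
Analyzer = Callable[[str, dict[str, float], DatasetContext], list[Finding]]

def build_context(measurements: dict[str, dict[str, float]]) -> DatasetContext:
    """Pass 1: summarize every metric across the whole dataset."""
    metrics = list(next(iter(measurements.values())))
    return DatasetContext(
        means={m: mean(v[m] for v in measurements.values()) for m in metrics},
        stds={m: pstdev(v[m] for v in measurements.values()) for m in metrics},
    )

def run_inspect(measurements, analyzers: list[Analyzer]) -> list[Finding]:
    """Pass 2: compare each image against the baseline with every analyzer."""
    if not measurements:
        return []
    context = build_context(measurements)
    findings: list[Finding] = []
    for image, metrics in measurements.items():
        for analyzer in analyzers:  # analyzers never see each other's output
            findings.extend(analyzer(image, metrics, context))
    return findings  # an empty list is a valid, healthy result

# Toy analyzer: flag images whose "texture" metric sits > 2 std above the mean.
def high_texture(image, metrics, ctx):
    sd = ctx.stds["texture"]
    if sd > 0 and (metrics["texture"] - ctx.means["texture"]) / sd > 2.0:
        return [Finding(image, "toy_texture", "texture well above dataset mean")]
    return []
```

A uniform dataset returns an empty list here: with no spread in the baseline, nothing stands out, which is exactly the zero-findings case described above.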
Analyzers in v1:
| Analyzer | What it detects | Status |
|---|---|---|
| `texture_analyzer/v1` | Elevated microtexture density relative to dataset baseline | First-pass; uncalibrated |
| `crystalline_faceting_analyzer/v1` | Angular micro-polygon shading on surfaces | First-pass; uncalibrated |
Both analyzers are conservative. Confidence values are capped at 0.70 and 0.45 respectively until calibration against labeled ground truth is complete. Treat findings as candidates for human review, not automated decisions.
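One way to picture the cap: a raw score is clamped so that no finding from an analyzer can report more confidence than that analyzer's ceiling. This is a minimal sketch; `CONFIDENCE_CAPS` and `capped_confidence` are illustrative names, and it assumes raw scores arrive as plain floats.

```python
# Hypothetical cap table mirroring the ceilings stated above.
CONFIDENCE_CAPS = {
    "texture_analyzer/v1": 0.70,
    "crystalline_faceting_analyzer/v1": 0.45,
}

def capped_confidence(analyzer_id: str, raw: float) -> float:
    """Clamp a raw score into [0, cap] for the given analyzer."""
    cap = CONFIDENCE_CAPS[analyzer_id]  # KeyError for unknown analyzers
    return max(0.0, min(raw, cap))
```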
Dataset Forge is for people who:
- Train LoRA models and suspect their image dataset carries GPT fingerprints
- Want to know what is wrong before they change anything
- Work with images generated by AI image tools (DALL-E, Midjourney, Ideogram, etc.)
- Need findings they can audit, not opaque scores
It is not a general image quality tool, an upscaler, or a cleanup utility. The first reference use case is watercolor and colored-pencil anthropomorphic character datasets with GPT-style artifacts including crystalline microtexture, glitter-like speckle, periodic frequency contamination, oversharpening, and edge halos.
- Analyzers are not calibrated to published ground truth. Thresholds were derived from an initial labeled review of one private dataset. Precision and recall are known for that dataset but are not general. Treat findings as informed candidates for human review, not certified detections.
- Two analyzers ship in v0.1 alpha. Three planned families (speck/glitter, periodic frequency noise, oversharpening/halo) are not yet implemented. Research probes for two of these have been completed and deferred.
- No cleanup. v0.1 alpha is read-only. Cleanup is planned for v2 and will require human approval at every step. See ROADMAP.md. Code for future cleanup phases exists in the repository but is not active or supported in v0.1 alpha.
- No UI. Dataset Forge is a CLI tool. Report output is JSON and plain text.
- z-score findings require dataset context. `texture_analyzer/v1` uses dataset-relative z-scores. On a dataset of fewer than five images the baseline statistics are not meaningful.
- Most scripts are internal development utilities. The public scripts are `run_benchmarks.py`, `generate_crystalline_fixtures.py`, `generate_texture_fixtures.py`, and `generate_benchmark_defects.py`. All other files in `scripts/` -- whether prefixed with `_` or not -- are internal calibration, diagnostic, or research tools and are not part of the public API. `scripts/research/` holds artifact-family research probes.
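To make the five-image limitation concrete, here is a sketch of a dataset-relative z-score with a minimum-size guard. It assumes one scalar metric per image and uses the population standard deviation; `dataset_z_scores` and `MIN_IMAGES_FOR_BASELINE` are hypothetical names, not the analyzer's actual code.

```python
from statistics import mean, pstdev
from typing import Optional

MIN_IMAGES_FOR_BASELINE = 5  # below this, the baseline is not meaningful

def dataset_z_scores(values: dict[str, float]) -> Optional[dict[str, float]]:
    """Return each image's z-score against the dataset baseline,
    or None when there are too few images to trust the baseline."""
    if len(values) < MIN_IMAGES_FOR_BASELINE:
        return None
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # Every image measures the same, so no image deviates from the baseline.
        return {name: 0.0 for name in values}
    return {name: (v - mu) / sigma for name, v in values.items()}
```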
- Python 3.11 or newer
- uv (recommended) or pip
Runtime dependencies (installed automatically):
- Pillow >= 10.0
- opencv-python >= 4.10
```bash
git clone https://github.com/surrealbydesign/dataset-forge.git
cd dataset-forge
uv sync
```
Or with pip:
```bash
pip install -e .
```
Point it at a folder of images:
```bash
uv run dataset-forge inspect path/to/your/dataset/
```
With an explicit output directory:
```bash
uv run dataset-forge inspect path/to/your/dataset/ --output path/to/report/
```
Terminal output:
```text
Dataset Forge Inspect
=====================
Dataset: path/to/your/dataset/
Output: path/to/your/dataset/inspect_output
Images: 100
Analyzed: 100
Errors: 0
Summary
-------
Total findings: 19
HIGH severity: 2
MEDIUM severity: 11
LOW severity: 6
Images with findings: 15 / 100
Images with no issues: 85 / 100
85 images require no action.
15 images have findings. Review report for details.
Report written:
path/to/your/dataset/inspect_output/inspection_report.json
path/to/your/dataset/inspect_output/inspection_report.txt
```
Reports are written to the output directory (default: a folder named `inspect_output/` inside your dataset directory). Source images are not touched.
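The summary counts above are related in a way worth spelling out: one image can carry several findings, so 19 findings (2 + 11 + 6) can come from 15 images, and the clean count is simply the remainder (100 - 15 = 85). A small sketch of that bookkeeping, assuming findings are available as hypothetical `(image_name, severity)` pairs; `summarize` is an illustrative name, not part of the CLI.

```python
from collections import Counter

def summarize(findings, image_count):
    """Derive summary counts from hypothetical (image_name, severity) pairs."""
    flagged = {image for image, _severity in findings}  # distinct images only
    return {
        "total_findings": len(findings),
        "by_severity": dict(Counter(severity for _image, severity in findings)),
        "images_with_findings": len(flagged),
        "images_with_no_issues": image_count - len(flagged),
    }
```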
```bash
uv run dataset-forge inspect path/to/dataset/ --gallery
```
Writes `inspection_gallery.png` -- a contact sheet with findings grouped by severity alongside clean reference images.
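A contact sheet of this kind can be built with Pillow, which is already a runtime dependency. This sketch only orders thumbnails by severity (most severe first, clean references last) and tiles them in a grid; it does not reproduce the real gallery's layout or labels, and `contact_sheet` and `SEVERITY_ORDER` are hypothetical names.

```python
from PIL import Image

# Hypothetical ordering: most severe groups first, clean references last.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "CLEAN"]

def contact_sheet(items, thumb=128, columns=6):
    """Tile (image, severity) pairs into one grid image, ordered by severity.

    sorted() is stable, so images keep their input order within a severity group.
    """
    ordered = sorted(items, key=lambda item: SEVERITY_ORDER.index(item[1]))
    rows = max(1, -(-len(ordered) // columns))  # ceiling division
    sheet = Image.new("RGB", (columns * thumb, rows * thumb), "white")
    for i, (img, _severity) in enumerate(ordered):
        tile = img.copy()
        tile.thumbnail((thumb, thumb))  # shrinks in place, keeps aspect ratio
        sheet.paste(tile, ((i % columns) * thumb, (i // columns) * thumb))
    return sheet
```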
Each finding in inspection_report.txt looks like this:
```text
image_023.png
[MEDIUM] artifact.crystalline_faceting -- confidence 0.45 (FP rate ~28%)
Benchmark: uncalibrated
Evidence: pencil_grain_score=64.2, watercolor_smoothness_score=36.6, microtexture_density_score=65.8
Why: pencil_grain=64.2 is above the detection threshold. Crystalline
surface faceting detected based on mid-frequency texture pattern.
Action: Candidate for review. Do not apply cleanup without inspecting
the image.
```
Every finding includes:
- Severity (LOW / MEDIUM / HIGH / CRITICAL)
- Confidence and estimated false-positive rate
- Benchmark version -- `uncalibrated` means thresholds have not been validated against published ground truth for your dataset type
- Raw evidence -- the measurements that produced the finding
- A plain-language explanation of why the finding was made
- A recommended action, which may be "leave alone"
Images with no findings are listed separately. They are not an afterthought.
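The fields listed above map naturally onto a small record type. The sketch below is a hypothetical `Finding` dataclass plus a `render` function that reproduces the layout of the sample report entry (without its line wrapping). The field names are assumptions; ARCHITECTURE.md describes the actual Finding schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    # Hypothetical field names, chosen to mirror the report layout above.
    image: str
    severity: str                # LOW / MEDIUM / HIGH / CRITICAL
    family: str                  # e.g. "artifact.crystalline_faceting"
    confidence: float
    fp_rate: float               # estimated false-positive rate, 0..1
    benchmark: str               # e.g. "uncalibrated"
    evidence: dict[str, float]   # raw measurements behind the finding
    why: str
    action: str                  # may legitimately be "leave alone"

def render(f: Finding) -> str:
    """Format one finding the way the plain-text report shows it."""
    evidence = ", ".join(f"{k}={v}" for k, v in f.evidence.items())
    return "\n".join([
        f.image,
        f"[{f.severity}] {f.family} -- confidence {f.confidence:.2f} "
        f"(FP rate ~{f.fp_rate:.0%})",
        f"Benchmark: {f.benchmark}",
        f"Evidence: {evidence}",
        f"Why: {f.why}",
        f"Action: {f.action}",
    ])
```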
- Source images are read-only. Dataset Forge never writes to your image files. No move, rename, modify, or delete operation is performed on source images.
- Reports are written separately. All output goes to the output directory: `inspect_output/` inside your dataset folder by default, or the directory you pass with `--output`.
- Cleanup is not implemented in v1. There is no flag or command that modifies images in any way. This is by design.
- Every finding is explainable. No finding is emitted without an evidence dict, a human-readable explanation, and a recommendation. No black-box scores.
- Healthy images produce no findings. The tool does not generate recommendations for images that do not warrant them.
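The explainability guarantee amounts to a gate in front of the report writer. A minimal sketch, assuming findings are plain dicts with hypothetical `evidence`, `explanation`, and `recommendation` keys; `check_explainable` is an illustrative name. Note that "leave alone" is a legitimate recommendation and passes the check.

```python
# Hypothetical key names; the real Finding schema is described in ARCHITECTURE.md.
REQUIRED_PARTS = ("evidence", "explanation", "recommendation")

def check_explainable(finding: dict) -> None:
    """Raise ValueError unless the finding carries every explanatory part.

    Empty values count as missing, so a finding with evidence={} is rejected.
    """
    missing = [part for part in REQUIRED_PARTS if not finding.get(part)]
    if missing:
        raise ValueError("finding rejected, missing: " + ", ".join(missing))
```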
Analyzer thresholds are validated against committed synthetic fixtures. The public benchmark suite runs without any setup from a fresh clone:
```bash
uv run python scripts/run_benchmarks.py
```
Current public coverage: 10 expectations across TextureAnalyzer and CrystallineFacetingAnalyzer. All 10 pass. See benchmarks/README.md for the full manifest description.
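Each benchmark expectation states whether an analyzer should or should not fire on a given fixture, so a suite passes only when there are neither misses nor false positives. The sketch below illustrates that check under stated assumptions: `Expectation`, `evaluate`, and the fixture names used with them are hypothetical, and benchmarks/README.md documents the actual manifest format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expectation:
    fixture: str       # hypothetical fixture file name
    analyzer: str      # analyzer id, e.g. "texture_analyzer/v1"
    should_fire: bool  # True: a finding is expected; False: fixture must stay clean

def evaluate(expectations, fired):
    """Compare expectations with the (fixture, analyzer) pairs that produced findings.

    Returns a list of failure messages; an empty list means every expectation passed.
    """
    failures = []
    for e in expectations:
        got = (e.fixture, e.analyzer) in fired
        if got != e.should_fire:
            verb = "missed" if e.should_fire else "false positive on"
            failures.append(f"{e.analyzer} {verb} {e.fixture}")
    return failures
```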
The disk-backed measurement cache is internal and opt-in. It is disabled by default, stores measurements only, and has no CLI flags.
`DATASET_FORGE_MEASUREMENT_CACHE_DIR=/path/to/cache` enables the cache. `DATASET_FORGE_DISABLE_MEASUREMENT_CACHE=1` bypasses cache reads and writes.
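How the two variables might combine is sketched below. The precedence rule (the disable flag wins even when a directory is set) is an assumption, not documented behavior, and `resolve_cache_dir` is a hypothetical helper rather than the tool's internal function.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the cache directory, or None when caching is off.

    Assumption: DATASET_FORGE_DISABLE_MEASUREMENT_CACHE=1 overrides a set directory.
    """
    if env.get("DATASET_FORGE_DISABLE_MEASUREMENT_CACHE") == "1":
        return None
    cache_dir = env.get("DATASET_FORGE_MEASUREMENT_CACHE_DIR")
    return Path(cache_dir) if cache_dir else None  # unset: disabled by default
```

For example, `DATASET_FORGE_MEASUREMENT_CACHE_DIR=/tmp/df-cache uv run dataset-forge inspect path/to/dataset/` would enable the cache for a single run in a POSIX shell.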
```bash
uv run pytest tests/
```
648 tests passing. Tests cover the full v1 pipeline: Finding, DatasetContext, Analyzer contracts, report writers, CLI, inspect runner, gallery, benchmark framework, committed fixtures, and public CLI surface.
MIT. See LICENSE.
| Document | Contents |
|---|---|
| PROJECT_BIBLE.md | Project constitution -- read before changing anything |
| ARCHITECTURE.md | v1 pipeline structure, Finding schema, artifact family model |
| WHY.md | Reasoning behind major design decisions |
| DIRECTION.md | Current milestone and scope |
| ROADMAP.md | v1 -> v2 -> v3 milestone plan |
| CURRENT_STATUS.md | Implementation status; resume from here |
| CLI_OUTPUT.md | Acceptance criteria for terminal and report output |
| benchmarks/README.md | Benchmark manifests and fixture inventory |