Replication package for the paper:
FixtureDB: A Multi-Language Dataset of Test Fixture Definitions from Open-Source Software
João Almeida, Andre Hora
ICSME 2026 — Tool Demonstration and Data Showcase Track
TODO: add DOI once published
This repository contains the extraction pipeline that builds FixtureDB. The dataset itself (SQLite database + CSV exports) is archived separately on Zenodo at TODO: Zenodo DOI.
| Property | Value |
|---|---|
| Collection Period | April 1–2, 2026 |
| GitHub API Version | v3 REST API |
| Tree-sitter Grammar | v0.21.0+ (Python, Java, JavaScript, TypeScript) |
| Complexity Tool | Lizard v1.21.0+ |
| Cognitive Complexity | complexipy v5.0.0+ |
| Python Version | 3.8+ (see requirements.txt) |
For exact tool versions and reproducibility details, see docs/04-data-collection.md.
Complete documentation is organized into dedicated files in the docs/ folder:
| Document | Purpose |
|---|---|
| docs/INDEX.md | Start here — overview and quick navigation |
| docs/01-intro.md | What is FixtureDB and why it matters |
| docs/02-repository-structure.md | Project layout and organization |
| docs/03-database-schema.md | Complete ERD and table specifications |
| docs/04-data-collection.md | Five-phase pipeline walkthrough |
| docs/05-storage.md | Disk usage and database growth |
| docs/06-setup.md | Installation and dependencies |
| docs/07-running.md | Command reference for pipeline operations |
| docs/08-reproducing.md | Exact corpus replication with pinned commits |
| docs/09-usage.md | SQL query examples and data access |
| docs/10-configuration.md | All tunable parameters |
| docs/11-detection.md | Tree-sitter AST and mock detection |
| docs/12-limitations.md | Known constraints and validation status |
| docs/13-license.md | MIT (code) and CC BY 4.0 (dataset) |
| docs/14-criteria-tracking.md | Research question tracking |
| docs/15-csv-user-guide.md | CSV exports for non-SQL users |
| docs/18-example-analyses.md | Five research questions with findings |
# Install dependencies
pip install -r requirements.txt
# Set up your GitHub token
cp .env.example .env
# Edit .env and add your GITHUB_TOKEN
# Initialize the database
python pipeline.py init
# Run the full pipeline (all languages)
python pipeline.py run

For detailed setup, see docs/06-setup.md.
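Before running the pipeline, it can help to confirm that the token in .env is readable. The sketch below is an assumption about how such a check might look, not the pipeline's own loader: `parse_env` and `github_headers` are hypothetical helpers written for this example, and the parser only handles plain KEY=VALUE lines.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical helper for illustration; the pipeline may load .env differently.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def github_headers(token):
    """Build request headers for the GitHub REST API from a personal token."""
    if not token:
        raise ValueError("GITHUB_TOKEN is empty; edit .env before running the pipeline")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        env = parse_env(handle.read())
    headers = github_headers(env.get("GITHUB_TOKEN", ""))
    print("Token loaded; headers prepared:", sorted(headers))
```

Running the script from the repository root fails fast with a clear message if the token is missing, which is cheaper than discovering the problem partway through collection.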
FixtureDB is a structured dataset of test fixture definitions extracted from open-source software repositories on GitHub across Python, Java, JavaScript, and TypeScript.
A test fixture is any code that prepares or tears down state before or after a test runs. For each fixture, the dataset records structural metadata (size, complexity, scope, type) and mock framework usage.
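As a minimal illustration of this definition, the sketch below uses Python's unittest hooks. The class and test names are invented for this example and are not taken from the dataset.

```python
import os
import tempfile
import unittest


class TempFileTest(unittest.TestCase):
    def setUp(self):
        # Fixture (setup): create a scratch file before each test runs.
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def tearDown(self):
        # Fixture (teardown): remove the scratch file after each test runs.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_file_starts_empty(self):
        # The test body relies on state prepared by setUp.
        self.assertEqual(os.path.getsize(self.path), 0)


def run_example():
    """Run the single test above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempFileTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_example()
```

Here setUp and tearDown are the fixture code: each would be one unit of analysis, separate from the test method that depends on them.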
Why it matters: Prior empirical work on fixtures is exclusively Java-based. FixtureDB is the first cross-language resource treating the fixture as its primary unit of analysis.
See docs/01-intro.md for the full overview.
FixtureDB focuses exclusively on quantitative, objective aspects of test fixtures:
- Framework Detection: Syntactically unambiguous markers only (decorators, annotations, attributes)
  - Python: @pytest.fixture, setUp()/tearDown() methods
  - Java: @Before/@After annotations
  - JavaScript/TypeScript: Mocha/Jest beforeEach()/afterEach() and related patterns
- Structural Metrics: Lines of code, cyclomatic complexity, parameter counts, fixture type/scope
- Mock Framework Usage: Detection of mock object patterns within fixture code
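The three aspects above can be sketched for Python in a few lines. This is a simplified stand-in that uses the standard-library ast module rather than the Tree-sitter grammars the pipeline relies on. The function names, the output field names, and the `MOCK_NAMES` set are assumptions made for this example, and cyclomatic complexity is left out.

```python
import ast

# Hypothetical sample input containing one pytest fixture and one unittest hook.
SOURCE = '''
import pytest
import unittest
from unittest import mock

@pytest.fixture(scope="module")
def db(tmp_path):
    conn = mock.MagicMock()
    return conn

class T(unittest.TestCase):
    def setUp(self):
        self.x = 1

    def test_x(self):
        assert self.x == 1
'''

# Assumed, non-exhaustive set of identifiers treated as mock usage.
MOCK_NAMES = {"Mock", "MagicMock", "patch"}


def _is_pytest_fixture(decorator):
    # Accept both `@pytest.fixture` and `@pytest.fixture(...)`.
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return (
        isinstance(target, ast.Attribute)
        and target.attr == "fixture"
        and isinstance(target.value, ast.Name)
        and target.value.id == "pytest"
    )


def _uses_mock(func):
    # Look for names or attributes such as MagicMock or mock.patch in the function.
    for sub in ast.walk(func):
        if isinstance(sub, ast.Name) and sub.id in MOCK_NAMES:
            return True
        if isinstance(sub, ast.Attribute) and sub.attr in MOCK_NAMES:
            return True
    return False


def _describe(func, kind):
    return {
        "name": func.name,
        "kind": kind,
        "loc": func.end_lineno - func.lineno + 1,  # from the def line to the last line
        "num_params": len(func.args.args),  # positional parameters, including self
        "uses_mock": _uses_mock(func),
    }


def find_fixtures(source):
    """Return one record per fixture found via syntactic markers only."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # unittest-style hooks: methods named setUp/tearDown in a class body.
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name in ("setUp", "tearDown"):
                    found.append(_describe(item, "unittest_" + item.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(_is_pytest_fixture(d) for d in node.decorator_list):
                found.append(_describe(node, "pytest_fixture"))
    return found


if __name__ == "__main__":
    for record in find_fixtures(SOURCE):
        print(record)
```

The sketch deliberately ignores aliased imports (for example `from pytest import fixture`), which keeps detection limited to unambiguous markers at the cost of missing some fixtures.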
CSV exports contain quantitative metrics. The SQLite database includes additional internal infrastructure for reproducibility and future research.
All fixture detectors are covered by unit tests (tests/test_framework_detection.py) that verify:
- Correct framework identification across supported languages
- AST-based detection accuracy
- Cross-language consistency
See docs/11-detection.md for technical details on detection algorithms.
The following visualizations provide an overview of the FixtureDB corpus:
- Repository Distribution and Pipeline Status
- Creation Timeline and Activity Patterns
- Fixture Distribution and Scope Patterns
- Mock Usage and Framework Diversity
- Nesting, Reuse, and Complexity Patterns












