Documentation and sanitised (but genuine) outputs from a private China A-share quantitative research platform, published so that a technical reader can verify in minutes that the platform is real, substantial, and engineered to institutional research standards.
Not here, by design: signal formulas, the factor pool, factor construction, any runnable alpha code.
notebooks/factor_evaluation_walkthrough.ipynb — one real factor ("Factor A": a single factor, not a composite, identity withheld) taken through the platform's standard evaluation, with genuine outputs preserved inline and every chart explained:
coverage → IC by horizon (in-sample vs out-of-sample) → IC stability → decile monotonicity → sub-universe robustness (CSI 300/500/1000) → cost break-even → attribution vs the size factor → the gate verdict
Five-minute read. If you only open one file, open that one.
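The "IC by horizon (in-sample vs out-of-sample)" step can be illustrated with a minimal sketch. It assumes that IC means the Spearman rank correlation between one date's factor values and the corresponding forward returns, which is a common convention but is not confirmed by this repository. The function names and the `split_date` parameter are hypothetical and are not the platform's API. Only the standard library is used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_ic(factor, fwd_return):
    """Spearman rank IC for a single cross-section (one date)."""
    return pearson(average_ranks(factor), average_ranks(fwd_return))


def mean_ic_by_window(daily_ic, split_date):
    """Split (date, ic) pairs at split_date and average each side.

    Dates are assumed to be ISO strings, so string comparison is
    chronological. Returns (in_sample_mean, out_of_sample_mean).
    """
    is_ics = [ic for d, ic in daily_ic if d < split_date]
    oos_ics = [ic for d, ic in daily_ic if d >= split_date]
    return mean(is_ics), mean(oos_ics)
```

With the windows stated in this README (IS 2020-2022, OOS 2023 onwards), `split_date` would be `"2023-01-01"`. Repeating the calculation with forward returns of different lengths gives the by-horizon view.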
| Item | Figure |
|---|---|
| Codebase | 100,000+ lines of Python, 700+ files |
| Market | China A-shares, full-market point-in-time universes |
| Datasets | 200+ registered, declaratively specified |
| Factor archive | 310 factor cards, 307 factor panels recomputed daily |
| Factor families | 12 (price/volume, fundamentals, money flow, margin, broker flow, limit-move behaviour, text, minute-bar microstructure, corporate events, GP-mined, CNN chart, composites) |
| Automation | 31 scheduled tasks, event-chained |
| Testing | 1,400+ offline test cases in 90 modules; CI with ruff + mypy + pytest |
These figures describe the private monorepo and are stated for context — this repository is an auditable sample of the platform's output, not a mirror of its source.
flowchart LR
I["Idea"] --> C["Construct<br/>(private)"]
C --> E["Standardised evaluation<br/>IS 2020-2022 and OOS 2023+<br/>full market + CSI 300/500/1000<br/>all gated independently"]
E -->|"fails any window<br/>or sub-universe"| K["Closed - negative result<br/>recorded, never re-explored"]
E -->|"passes every window<br/>and sub-universe"| CARD["Factor card<br/>spec-hashed snapshot"]
CARD --> G{"Governance gates<br/>IC / persistence / cost /<br/>repeated confirmation"}
G -->|"evidence supports"| P["Production pool<br/>combination → portfolio<br/>→ backtest"]
G -->|"evidence decays"| K
P -->|"daily re-audit"| G
Machine-audited, exception-driven, append-only audit trail — details in factor gating.
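The evaluation branch of the flowchart, where every window and sub-universe is gated independently and a single failure closes the factor, can be sketched as follows. This is an illustration under stated assumptions: the pass criterion (mean IC at or above a threshold), the `min_ic` value, and all names are hypothetical, since the real gate criteria are documented in factor gating and not reproduced here.

```python
# Windows and universes named in the flowchart above.
WINDOWS = ("IS", "OOS")
UNIVERSES = ("full_market", "CSI300", "CSI500", "CSI1000")


def evaluation_verdict(mean_ic, min_ic=0.02):
    """Gate each (window, universe) cell independently.

    mean_ic maps (window, universe) -> mean IC for that cell.
    min_ic is a hypothetical threshold, not the platform's value.
    A missing cell counts as a failure, so a factor cannot pass
    by skipping an evaluation.
    """
    failures = []
    for window in WINDOWS:
        for universe in UNIVERSES:
            ic = mean_ic.get((window, universe))
            if ic is None or ic < min_ic:
                failures.append((window, universe))
    verdict = "factor_card" if not failures else "closed"
    return verdict, failures
```

Returning the list of failing cells alongside the verdict keeps the negative result recordable, in line with the "closed, negative result recorded" branch.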
All five are real platform output for the notebook's Factor A (single factor, identity withheld; IS 2020-2022, OOS 2023 onwards).
[Figures: five evaluation charts for Factor A]
| Document | One-line summary |
|---|---|
| Architecture | Layers, data flow, design principles |
| Data pipeline | Declarative dataset registry, atomic Parquet storage, PIT semantics |
| Research framework | The 8-layer factor pipeline and evaluation methodology |
| Factor gating | Lifecycle gates a factor must pass to reach production |
| Backtesting | Execution model and the point-in-time discipline |
| Engineering | Tests, CI, scheduling, data health, monitoring |
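The point-in-time (PIT) semantics mentioned for the data pipeline and backtesting can be illustrated with a small as-of lookup: a query for a given date may only see values that were already available on that date. This is a generic sketch of the idea, not the platform's implementation. The function name and the record layout are assumptions.

```python
from bisect import bisect_right


def known_as_of(records, query_date):
    """Return the latest value that was available on query_date.

    records: list of (available_date, value) pairs sorted by
    available_date, where available_date is when the value became
    public (for example an announcement date), not the period it
    describes. Dates are ISO strings, so they sort chronologically.
    Returns None if nothing was available yet.
    """
    dates = [d for d, _ in records]
    i = bisect_right(dates, query_date)
    return records[i - 1][1] if i else None
```

Keying on the availability date is what prevents look-ahead: a figure announced on 2023-04-28 is invisible to any backtest date before 2023-04-28, even though it describes an earlier reporting period.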
Publishing formulas destroys them, and screenshots of Sharpe ratios prove nothing. What transfers — and what this repository evidences — is the discipline: PIT correctness enforced by code, methodology locked by hashed specs, promotion gates in front of the portfolio, and every leakage bug ever found turned into a permanent regression test.
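One way to read "methodology locked by hashed specs" is that an evaluation spec is serialised canonically and hashed, so any change to the methodology changes the hash stored with the result. The sketch below shows that general technique with SHA-256 over canonical JSON. The spec fields are invented for illustration, and the platform's actual spec format and hash function are not published here.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash a JSON-serialisable spec independently of key order.

    sort_keys and fixed separators give a canonical byte string,
    so two dicts with the same content always hash identically.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical spec; field names are illustrative only.
example_spec = {
    "is_window": ["2020-01-01", "2022-12-31"],
    "oos_start": "2023-01-01",
    "universes": ["full_market", "CSI300", "CSI500", "CSI1000"],
}
example_digest = spec_hash(example_spec)
```

Storing the digest next to each factor card makes it cheap to detect that a result was produced under a different methodology than the current one.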
Python 3.14 · polars · pandas · PyArrow/Parquet · LightGBM · PyTorch · DEAP · Streamlit · matplotlib · pytest · ruff · mypy · GitHub Actions




