A local, review-first self-improvement layer for Codex.
Codex Self-Improving Loop helps Codex recall previous sessions, capture durable preferences, propose reusable skills, detect unsafe learning candidates, and evolve through a governed learning loop.
It is designed for developers who want their coding agent to improve over time without giving it unchecked permission to rewrite its own long-term behavior.
| Capability | What it does | Default output |
|---|---|---|
| Session recall | Searches previous Codex sessions and returns short redacted snippets | terminal output |
| Memory candidates | Extracts stable preferences, safety corrections, and durable lessons | $HOME/.codex/memories/inbox |
| Memory promotion | Promotes one reviewed memory into global USER.md | $HOME/.codex/memories/USER.md |
| Candidate scoring | Finds repeated, short, safe memory candidates | terminal or JSON report |
| Skill candidates | Captures reusable workflows that may become future skills | $HOME/.codex/skill-candidates/inbox |
| Skill patch candidates | Captures evidence that an existing skill should be upgraded | $HOME/.codex/skill-candidates/patches |
| Safety scan | Flags secrets, private URLs, redacted values, prompt injection text, and raw transcript markers | terminal or Markdown report |
| End-of-task nudge | Runs the learning loop in review mode near handoff | $HOME/.codex/nudge-reports |
| Session watcher | Polls Codex session files and runs the nudge after idle periods | $HOME/.codex/memory-watcher-state.json |
| Usage metadata | Tracks skill use_count, last_used, and failures | $HOME/.codex/skill-usage.json |
| Learning reports | Generates skill index and learning inbox summaries | $HOME/.codex/*.md |
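The usage metadata row can be pictured as a small JSON update. The following is a minimal sketch, not the code in record_skill_usage.py: the function name `record_usage` and the file layout (one object per skill holding `use_count`, `last_used`, and `failures`) are assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_usage(path: Path, skill: str, failed: bool = False) -> dict:
    """Update one skill's counters in a JSON file (hypothetical layout)."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entry = data.setdefault(skill, {"use_count": 0, "last_used": None, "failures": 0})
    entry["use_count"] += 1
    entry["last_used"] = datetime.now(timezone.utc).isoformat()
    if failed:
        entry["failures"] += 1
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return entry


if __name__ == "__main__":
    # Use a throwaway directory so the example never touches $HOME/.codex.
    with tempfile.TemporaryDirectory() as tmp:
        usage_file = Path(tmp) / "skill-usage.json"
        record_usage(usage_file, "session-recall")
        print(record_usage(usage_file, "session-recall", failed=True))
```

Keeping the counters in one flat file makes the data easy to inspect by hand, which fits the local, review-first design.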
Many coding agents are strong inside one session but lose useful collaboration context across sessions. Users end up repeating preferences, project rules, verification habits, and hard-won lessons.
This project turns session experience into governed assets:
task experience
-> reviewable candidates
-> safety scan and scoring
-> explicit promotion or archival
-> future recall and skill evolution
The goal is not to dump every conversation into long-term memory. The goal is to keep a clean learning loop:
- Stable user preferences go into global memory.
- Project facts stay in project-level AGENTS.md.
- Reusable procedures become skill candidates.
- Risky or ambiguous findings stay in review.
- Secrets and redacted values are blocked.
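The routing rules in the list above can be read as a small decision function. This is a sketch under stated assumptions: the `kind` labels, the `Destination` enum, and the `route` function are hypothetical names, and the result only names where a finding would eventually belong. Promotion itself still goes through candidates and explicit review.

```python
from enum import Enum


class Destination(Enum):
    GLOBAL_MEMORY = "global memory (USER.md)"
    PROJECT_AGENTS = "project-level AGENTS.md"
    SKILL_CANDIDATE = "skill candidate inbox"
    REVIEW = "held for review"
    BLOCKED = "blocked"


def route(kind: str, has_secret: bool = False, ambiguous: bool = False) -> Destination:
    """Map a finding to a destination; `kind` values are illustrative labels."""
    if has_secret:
        # Secrets and redacted values are blocked before anything else is considered.
        return Destination.BLOCKED
    if ambiguous:
        return Destination.REVIEW
    table = {
        "user_preference": Destination.GLOBAL_MEMORY,
        "project_fact": Destination.PROJECT_AGENTS,
        "procedure": Destination.SKILL_CANDIDATE,
    }
    # Anything unrecognized stays visible in review rather than being dropped.
    return table.get(kind, Destination.REVIEW)
```

Checking the blocking condition first means a risky finding can never be routed to memory by a later rule.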
This project is inspired by the self-improving loop in Hermes Agent: memory, reusable skills, session search, and nudges that encourage the agent to preserve useful lessons.
Codex Self-Improving Loop adapts that idea into a smaller local tool for Codex:
| Principle | Implementation |
|---|---|
| Local first | Files live under $HOME/.codex and $HOME/.agents; no hosted service required |
| Review first | Capture creates candidates; promotion is explicit |
| Cross-platform | Python standard library only, no shell-specific runtime dependency |
| Agent-readable | Skills are plain SKILL.md files with small command scripts |
| Installable by copy | install.py copies repository files instead of embedding generated blobs |
| Safe by default | Secret-like content and redacted values are blocked from promotion |
- It is not a replacement for Codex.
- It is not a vector database or hosted memory service.
- It does not auto-edit project code.
- It does not automatically enable newly proposed skills.
- It does not make unsafe memories safe; it only helps detect and quarantine them.
- Python 3.10 or newer.
- Codex configured to discover skills from $HOME/.agents/skills.
No third-party Python packages are required.
git clone https://github.com/newcatshuang/codex-self-improving-loop.git
cd codex-self-improving-loop
python install.py
Restart Codex or open a new session after installing so skill discovery reloads the new files.
Verify the installation in temporary directories:
python tests/verify-install.py --codex-root /tmp/codex-sil --agents-root /tmp/agents-sil
Windows users can use any temporary paths:
python tests/verify-install.py --codex-root C:/Temp/codex-sil --agents-root C:/Temp/agents-sil
Custom install roots:
python install.py --codex-root /tmp/codex-test --agents-root /tmp/agents-test --force
The installer:
- Copies agents/skills/session-recall into $HOME/.agents/skills/session-recall.
- Copies agents/skills/memory-capture into $HOME/.agents/skills/memory-capture.
- Creates learning inbox directories under $HOME/.codex.
- Copies codex/memories/USER.template.md to $HOME/.codex/memories/USER.md only if it does not exist.
- Appends codex/AGENTS.learning-block.md to $HOME/.codex/AGENTS.md using idempotent markers.
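The last installer step relies on idempotent markers, so repeated installs do not duplicate the learning block. A minimal sketch of that technique follows. The marker strings and the `upsert_block` helper are assumptions for illustration; the markers actually written by install.py may look different.

```python
BEGIN = "<!-- learning-block:begin -->"  # hypothetical marker text
END = "<!-- learning-block:end -->"      # hypothetical marker text


def upsert_block(existing: str, block: str) -> str:
    """Append `block` wrapped in markers, or replace the previous copy in place."""
    wrapped = f"{BEGIN}\n{block.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if end != -1:
        # A marked block already exists: swap its contents, keep everything around it.
        return existing[:start] + wrapped + existing[end + len(END):]
    prefix = existing.rstrip("\n")
    separator = "\n\n" if prefix else ""
    return f"{prefix}{separator}{wrapped}\n"
```

Running `upsert_block` twice with the same block returns the same text, which is the property that makes re-running an installer safe.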
Search previous sessions:
python "$HOME/.agents/skills/session-recall/scripts/search_sessions.py" --query "previous error" --max-results 10
Capture memory candidates from the latest session:
python "$HOME/.agents/skills/memory-capture/scripts/extract_memory.py" --max-messages 40
Promote one reviewed memory:
python "$HOME/.agents/skills/memory-capture/scripts/promote_memory.py" \
  --text "Prefer concise engineering handoffs with verification and residual risk." \
  --approved
Run the end-of-task self-improvement loop:
python "$HOME/.agents/skills/memory-capture/scripts/codex_memory_nudge.py"
Run the automatic session watcher. In long-running mode, it polls once per hour by default and processes sessions that have been idle for at least 10 minutes:
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py"
Run one watcher cycle for testing:
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py" --once --dry-run
For OS schedulers such as cron, launchd, systemd timers, or Windows Task Scheduler, schedule one real cycle hourly:
python install_watcher_schedule.py
Generate maintenance reports:
python "$HOME/.agents/skills/memory-capture/scripts/generate_skills_index.py"
python "$HOME/.agents/skills/memory-capture/scripts/summarize_learning_inbox.py"
python "$HOME/.agents/skills/memory-capture/scripts/show_skill_usage.py"
| Script | Purpose |
|---|---|
| search_sessions.py | Search local Codex session history with redaction |
| extract_memory.py | Create memory candidates from recent session context |
| promote_memory.py | Promote one reviewed memory into USER.md |
| promote_candidates.py | Score, optionally auto-promote, and archive processed memory candidates |
| compact_user_memory.py | Report global memory budget, duplicates, conflicts, and safety risks |
| extract_skill_candidate.py | Create review-only skill candidates |
| extract_skill_patch_candidate.py | Create review-only skill patch candidates |
| scan_skill_candidates.py | Scan skill candidates and patch candidates for safety risks |
| record_skill_usage.py | Record usage metadata for a skill |
| show_skill_usage.py | Show skill usage metadata |
| generate_skills_index.py | Generate a skill index from installed SKILL.md files |
| summarize_learning_inbox.py | Summarize memory, skill, patch, scan, and usage signals |
| codex_memory_nudge.py | Run the full review-mode learning loop |
| codex_session_watcher.py | Watch session files and run nudge after idle periods |
| install_watcher_schedule.py | Install an hourly OS schedule for the installed watcher |
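search_sessions.py is described as searching session history with redaction and returning short snippets. The sketch below illustrates that general idea only. The `redact` and `snippets` helpers and the single regular expression are assumptions; the real script's patterns and output format are not shown in this README.

```python
import re

# Illustrative pattern only; a real redactor would cover many more formats.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+")


def redact(text: str) -> str:
    """Replace secret-like values with a placeholder, keeping the key name."""
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def snippets(text: str, query: str, width: int = 40, max_results: int = 10) -> list[str]:
    """Return short redacted windows around each case-insensitive match."""
    clean = redact(text)  # redact first so no window can leak a raw value
    results = []
    for match in re.finditer(re.escape(query), clean, flags=re.IGNORECASE):
        lo = max(0, match.start() - width)
        hi = min(len(clean), match.end() + width)
        results.append(clean[lo:hi].strip())
        if len(results) >= max_results:
            break
    return results
```

Redacting before slicing, rather than after, avoids the case where a window boundary cuts a secret in half and the pattern no longer matches it.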
codex-self-improving-loop/
├─ README.md
├─ README.zh-CN.md
├─ LICENSE
├─ install.py
├─ install_watcher_schedule.py
├─ tests/
│ └─ verify-install.py
├─ codex/
│ ├─ AGENTS.learning-block.md
│ └─ memories/
│   └─ USER.template.md
└─ agents/
  └─ skills/
    ├─ session-recall/
    │ ├─ SKILL.md
    │ └─ scripts/
    │   └─ search_sessions.py
    └─ memory-capture/
      ├─ SKILL.md
      └─ scripts/
        ├─ codex_memory_nudge.py
        ├─ codex_session_watcher.py
        ├─ compact_user_memory.py
        ├─ extract_memory.py
        ├─ extract_skill_candidate.py
        ├─ extract_skill_patch_candidate.py
        ├─ generate_skills_index.py
        ├─ learning_loop_common.py
        ├─ promote_candidates.py
        ├─ promote_memory.py
        ├─ record_skill_usage.py
        ├─ scan_skill_candidates.py
        ├─ show_skill_usage.py
        └─ summarize_learning_inbox.py
Default runtime outputs live under $HOME/.codex:
.codex/
├─ memories/
│ ├─ USER.md
│ ├─ inbox/
│ └─ archive/
├─ skill-candidates/
│ ├─ inbox/
│ ├─ patches/
│ └─ archive/
├─ nudge-reports/
├─ memory-watcher-state.json
├─ skill-usage.json
├─ skills-index.md
└─ learning-inbox-summary.md
These files are local runtime state. Do not commit them unless intentionally curated.
Codex does not always expose a reliable session-end hook across every environment. The watcher provides a lightweight external trigger:
poll $HOME/.codex/sessions
-> find idle unprocessed session files
-> run codex_memory_nudge.py --session-file <file>
-> write nudge reports and watcher state
Defaults:
| Option | Default |
|---|---|
| --interval-seconds | 3600 |
| --idle-seconds | 600 |
| --max-sessions-per-run | 0 |
Setting --max-sessions-per-run to 0 means the watcher processes all ready sessions in the current cycle. This is the default because the watcher is I/O-light and uses a lock plus processed-session state to avoid duplicate work.
By default, the first run processes all historical session files that are idle and not already marked processed. To limit the first run and future runs to a time window, pass --since-date YYYY-MM-DD.
The watcher is review-first. It does not run promote_memory.py --approved, does not apply skill patches, and does not auto-promote candidates.
Examples:
# Long-running watcher
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py"
# One cycle without writing reports
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py" --once --dry-run
# One real cycle
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py" --once
# Only process sessions on or after a date
python "$HOME/.agents/skills/memory-capture/scripts/codex_session_watcher.py" --once --since-date 2026-05-01
# Install an hourly OS schedule at minute 0, using the installed watcher under $HOME/.agents
python install_watcher_schedule.py
For workstation setups, an hourly OS scheduler that runs the --once command on the hour is usually more reliable than keeping a terminal process open. Long-running mode remains available when a persistent process manager is already in use.
Schedule installer backends:
| Platform | Backend |
|---|---|
| Windows | Task Scheduler via schtasks.exe /SC HOURLY /MO 1 |
| Linux | systemd user timer |
| macOS | launchd LaunchAgent |
This project intentionally separates discovery from promotion.
| Stage | Behavior |
|---|---|
| Capture | Writes review-only candidate files |
| Scan | Flags secrets, redacted values, prompt injection text, private URLs, and transcript markers |
| Score | Identifies repeated, short, safe preference candidates |
| Promote | Requires explicit --approved, except in the conservative --auto-promote candidate flow |
| Archive | Moves only processed candidate files; unresolved review items stay visible |
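The Score stage looks for candidates that are repeated, short, and safe. A minimal sketch of such a filter is shown here. The thresholds (`min_repeats`, `max_chars`), the whitespace and case normalization, and the simple safety check are all assumptions chosen for illustration, not the values used by promote_candidates.py.

```python
from collections import Counter


def is_safe(text: str) -> bool:
    """Placeholder safety check: reject anything carrying a redaction marker."""
    return "[redacted]" not in text.lower()


def score_candidates(candidates, min_repeats=2, max_chars=160):
    """Return normalized candidates that repeat, stay short, and pass the safety check."""
    # Normalize case and whitespace so near-identical phrasings count together.
    normalized = [" ".join(c.lower().split()) for c in candidates]
    counts = Counter(normalized)
    return sorted(
        text
        for text, seen in counts.items()
        if seen >= min_repeats and len(text) <= max_chars and is_safe(text)
    )
```

For example, two slightly different spellings of "Prefer concise handoffs." would count as one repeated candidate, while a repeated line containing `[REDACTED]` would be excluded.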
Hard rules:
- Never store raw secrets in memory files.
- Never reconstruct [REDACTED] values.
- Treat conflict_review as a hard stop for automatic promotion.
- Keep project facts in project-level AGENTS.md, not global USER.md.
- Review and scan skill candidates before turning them into real skills.
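To make the scan categories concrete, here is a small sketch of a flag-only scanner covering a few of them. The patterns are deliberately simple and are assumptions for illustration; the checks in scan_skill_candidates.py are not documented here and are likely broader.

```python
import re

# Each entry maps a flag name to an illustrative pattern (not the project's real rules).
CHECKS = {
    "secret": re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"),
    "redacted_value": re.compile(r"\[REDACTED\]"),
    "private_url": re.compile(
        r"https?://(localhost|127\.0\.0\.1|10\.\d+\.\d+\.\d+|192\.168\.\d+\.\d+)"
    ),
    "prompt_injection": re.compile(r"(?i)ignore (all )?(previous|prior) instructions"),
}


def scan(text: str) -> list[str]:
    """Return the names of every check that matches; an empty list means no flags."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]
```

The scanner only reports flags and never rewrites the text, in keeping with the rule that the tool detects and quarantines unsafe content rather than trying to make it safe.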
Run local verification:
python tests/verify-install.py --codex-root ./tmp/codex --agents-root ./tmp/agents
Run syntax checks:
python -m compileall agents install.py tests
- Hermes Agent: the self-improving agent loop built around memory, skill creation, skill evolution, session search, and learning nudges.
MIT