feat(skills): Create training-methods skill documenting 8 stable training methods #306

## Summary

Create a new `training-methods` skill that documents all 8 stable training methods with their data requirements, use cases, and recommended training order.

## Context

ReAlign supports 8 stable training methods, but no centralized skill helps users choose the right method for their goal.

## What Does NOT Work

- No centralized skill documenting all training methods
- Users don't know which method to use for their goal
- Data format requirements scattered across code
- Training order not documented

## Implementation Approach

Create `.claude/skills/training-methods.md` documenting:

### 8 Stable Training Methods

| Method | Purpose | Data Format | Use Case |
|--------|---------|-------------|----------|
| **SFT/LoRA** | Base capabilities | instruction + output | Knowledge injection |
| **DPO** | Preference alignment | chosen + rejected | Behavior change |
| **ORPO** | Preference (simple) | chosen + rejected | No reference model needed |
| **GRPO** | Verifiable rewards | prompt + scored responses | Math/code verification |
| **CPO** | Conservative preference | chosen + rejected | Reduce distribution shift |
| **RLVR** | Verified rewards | problem + solution + verify | Complex reasoning |
| **Abliteration** | Remove constraints | harmful + harmless | Uncensoring |
| **Activation Steering** | Runtime control | inference-time | Reversible, composable |

### Data Formats

```python
# SFT
{"instruction": str, "output": str}

# DPO/ORPO/CPO
{"prompt": str, "chosen": str, "rejected": str}

# GRPO
{"prompt": str, "responses": [{"text": str, "score": float}]}

# RLVR
{"problem": str, "solution": str, "responses": [{"text": str, "correct": bool}]}
```
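The shapes above can be checked before a training run. The following is a minimal sketch, assuming records are loaded as plain dictionaries. `REQUIRED_FIELDS` and `validate_record` are hypothetical names for illustration, not existing ReAlign APIs, and only top-level fields are checked.

```python
# Hypothetical helper: illustrative names, not part of ReAlign.
# Field names and types are copied from the data formats listed above.
PREFERENCE = {"prompt": str, "chosen": str, "rejected": str}

REQUIRED_FIELDS = {
    "sft": {"instruction": str, "output": str},
    "dpo": PREFERENCE,
    "orpo": PREFERENCE,
    "cpo": PREFERENCE,
    "grpo": {"prompt": str, "responses": list},
    "rlvr": {"problem": str, "solution": str, "responses": list},
}


def validate_record(method, record):
    """Return a list of problems; an empty list means the top-level shape matches."""
    schema = REQUIRED_FIELDS[method.lower()]
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```

For example, `validate_record("sft", {"instruction": "x"})` reports the missing `output` field. Nested entries inside `responses` are not inspected in this sketch.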

### Recommended Training Order

```
1. SFT (base) → Instruction following
2. DPO (alignment) → Behavior change
3. GRPO/RLVR (verification) → Reasoning improvement
4. Calibration → Uncertainty handling
5. Anti-hallucination → Reduce confabulation
```
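The order above can be applied mechanically to a set of planned stages. This is a sketch under stated assumptions: `STAGE` and `order_stages` are hypothetical names, and because the list does not place ORPO, CPO, Abliteration, or Activation Steering, the helper rejects them rather than guessing a position.

```python
# Hypothetical helper: stage numbers mirror the recommended order above.
# GRPO and RLVR share step 3, so they keep their input order relative to each other.
STAGE = {
    "sft": 1,
    "dpo": 2,
    "grpo": 3,
    "rlvr": 3,
    "calibration": 4,
    "anti-hallucination": 5,
}


def order_stages(requested):
    """Sort the requested stages into the recommended order (stable sort)."""
    names = [name.lower() for name in requested]
    unknown = [name for name in names if name not in STAGE]
    if unknown:
        raise ValueError(f"no recommended position for: {unknown}")
    return sorted(names, key=STAGE.__getitem__)
```

Calling `order_stages(["RLVR", "sft", "dpo"])` places SFT first and RLVR last.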

### Backend Support Matrix

| Method | MLX | PyTorch | Cloud |
|--------|-----|---------|-------|
| SFT/LoRA | ✅ | ✅ | ✅ |
| DPO | ✅ | ✅ | ✅ |
| ORPO | ✅ | ✅ | ✅ |
| GRPO | ✅ | ✅ | ✅ |
| CPO | ✅ | ✅ | ✅ |
| RLVR | ✅ | ❌ | ❌ |
| Abliteration | ✅ | ✅ | ❌ |
| Activation Steering | ✅ | ❌ | ❌ |
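The matrix can also be encoded as a lookup so that an unsupported combination is caught before a run starts. A minimal sketch follows, assuming short lowercase method keys. `SUPPORT`, `supported_backends`, and `methods_for_backend` are hypothetical names, and the values are copied from the table above.

```python
# Hypothetical lookup: one tuple of flags per method, in BACKENDS order.
BACKENDS = ("MLX", "PyTorch", "Cloud")

SUPPORT = {
    "lora": (True, True, True),
    "dpo": (True, True, True),
    "orpo": (True, True, True),
    "grpo": (True, True, True),
    "cpo": (True, True, True),
    "rlvr": (True, False, False),
    "abliteration": (True, True, False),
    "steering": (True, False, False),
}


def supported_backends(method):
    """List the backends marked as supported for a method."""
    flags = SUPPORT[method.lower()]
    return [backend for backend, ok in zip(BACKENDS, flags) if ok]


def methods_for_backend(backend):
    """List the methods marked as supported on a backend."""
    index = BACKENDS.index(backend)
    return [method for method, flags in SUPPORT.items() if flags[index]]
```

With this table, `supported_backends("rlvr")` yields only MLX, while `methods_for_backend("Cloud")` omits RLVR, Abliteration, and Steering.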

## Acceptance Criteria

- [ ] All 8 stable methods documented
- [ ] Data format examples for each method
- [ ] Training order recommendation with rationale
- [ ] CLI commands for each method
- [ ] Backend support matrix

## Related

- Issue #305 (data-curation-workflow)
