shipbehaves

Pinned

  1. constitutional-cai Public

    Constitutional AI reproduction (Bai et al. 2022) on a small open model: self-critique/revise SFT + RLAIF DPO, two-axis safety/over-refusal eval, and a failure analysis of the over-refusal regression.

    Python

  2. distributed-sft-fsdp Public

    Genuine multi-GPU FSDP full fine-tune of a 7B model across 4x A100 GPUs (closes the scale gap), with the tied-embeddings, collective-save, and checkpoint-consolidation gotchas made concrete.

    Python

  3. grpo-gsm8k Public

    GRPO + RLVR on GSM8K (DeepSeek-R1 / TinyZero recipe): verifiable reward, no reward/value model, with the vLLM-rollout necessity and headroom lessons made concrete.

    Python

  4. regulated-evals Public

    Reproducible, regulation-anchored Trustworthy-AI scorecards for frontier and open-weight models in regulated industries (finance first).

    Python

  5. reward-model-ppo Public

    Classic RLHF: train a reward model (0.757 held-out), then optimize a policy against it with PPO, with an honest teardown of PPO's memory and instability costs vs. DPO.

    Python

  6. self-reward-collapse Public

    Does a model training on its own judgment collapse? On verifiable math: the reward gets hacked (a brevity reward halves answer length) but capability does not collapse. An honest failure analysis w…

    Python