Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"
Updated May 26, 2026 - Python
Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability
RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.
OpenMedRL is an open-source reinforcement learning environment for benchmarking LLM-powered medical agents in emergency care. It simulates triage, dynamic patient progression, resource constraints, and uncertainty-aware clinical decision-making.
An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.
This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).
A reinforcement learning fine-tuned model that generates Linux terminal commands from natural language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.