grpo-training

Here are 7 public repositories matching this topic...

vivoCameraResearch / SmartPhotoCrafter

Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"

Updated May 26, 2026
Python

winstonsmith1897 / GTPO

Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability

reinforcement-learning reinforcement-learning-algorithms train fine post-training llm rlhf grpo-training

Updated Feb 23, 2026
Jupyter Notebook

DeepGym / deepgym

RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.

python machine-learning reinforcement-learning deep-learning sandbox evaluation rl code-execution ai-agents daytona llm unsloth coding-agents grpo verifiable-rewards openrlhf reward-function grpo-training

Updated Apr 24, 2026
Python

Surya-Hariharan / OpenMedRL-openenv

OpenMedRL is an open-source reinforcement learning environment for benchmarking LLM-powered medical agents in emergency care. It simulates triage, dynamic patient progression, resource constraints, and uncertainty-aware clinical decision-making.

medical-ai medical-triage huggingface-transformers huggingface-spaces unsloth openenv grpo-training

Updated Jun 21, 2026
Python

Vidit-Ostwal / price-negotiation-rl-OpenEnv

An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.

python machine-learning reinforcement-learning rl rl-environment openenv grpo-training price-negotiator openenv-environment

Updated May 9, 2026
Python

injamul3798 / LLM-Fine-tuning-RL-Hands-on-Lab-code-Intro-to-Post-training

This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).

reinforcement-learning post-training ppo finetuning-llms grpo-training

Updated May 1, 2026
Jupyter Notebook

safoura-banihashemi / qwen3-terminal-grpo

A reinforcement learning fine-tuned model that generates Linux terminal commands from natural language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.

lora fine-tuning huggingface grpo-training

Updated May 17, 2026
Jupyter Notebook
