Thanks for the awesome work! I have some questions about how the key steps are extracted and prepared.
The step-wise reasoning accuracy reward (StepRAR) is calculated from a set of key reasoning steps taken from the corresponding reasoning path in the dataset. These key steps are prepared in three stages:
- Step 1. Use GPT-4 to extract several key steps from the reasoning path for each question.
- Step 2. Refine the extracted steps by removing redundant content and retaining only the few core words necessary for reasoning.
- Step 3. Augment each extracted key step into multiple equivalent formats to allow more flexible and accurate matching.
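To make the questions concrete, here is a minimal sketch of how I imagine the augmented key steps from Step 3 being matched against a generated reasoning path. Everything in it is my own assumption rather than the authors' method: the function names `normalize` and `key_step_match_ratio`, the substring-based matching, and the toy data are all hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def key_step_match_ratio(reasoning: str, key_steps: list[list[str]]) -> float:
    """Return the fraction of key steps with at least one variant found in the reasoning.

    Each entry of `key_steps` is a list of equivalent formats of one key step
    (the assumed output of Step 3). Substring matching is an assumption.
    """
    if not key_steps:
        return 0.0
    haystack = normalize(reasoning)
    matched = sum(
        any(normalize(variant) in haystack for variant in variants if variant.strip())
        for variants in key_steps
    )
    return matched / len(key_steps)


# Hypothetical toy data: two key steps, each with several equivalent formats.
key_steps = [
    ["6 / 3 = 2", "6/3=2", "6 divided by 3 is 2"],
    ["2 + 5 = 7", "2+5=7"],
]
reasoning = "First, 6 divided by 3 is 2. Then 2 * 5 = 10."
ratio = key_step_match_ratio(reasoning, key_steps)
print(ratio)
```

In this toy case only the first key step is matched, through its third variant. If the actual matching works differently, for example with fuzzy or model-based matching, that would be very helpful to know.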
Could you please take a look at my questions below?
- Could you share the GPT-4 prompt used to extract the key steps in Step 1?
- How is Step 2 done? If it uses GPT-4, could you share the corresponding prompt?
- How is Step 3 done? If it also uses GPT-4, could you share that prompt as well?
Also, are you planning to release the training data for StepGRPO? If so, do you have an estimated timeline?
Thanks in advance!