Raftpp_release

We wrote a report analyzing what makes GRPO "stand out" for math reasoning. It includes analysis and ablation studies comparing different algorithms for LLM reasoning training.