Gvm_release | Jiarui's Homepage

We released GVM (Gradient Variance Minimization), a framework that improves data sampling efficiency for LLM math reasoning. Starting from rejection sampling, we generalize our pipeline to RL algorithms such as GRPO, and we provide a corresponding theoretical analysis of the algorithm.