Publications

Publications by category, in reverse chronological order.

2025

  1. preprint
    Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
    Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, and Tong Zhang
    arXiv preprint arXiv:2505.02391, 2025
  2. preprint
    A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
    Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, and others
    arXiv preprint arXiv:2504.11343, 2025
  3. preprint
    FANS – Formal Answer Selection for Natural Language Math Reasoning Using Lean4
    Jiarui Yao, Ruida Wang, and Tong Zhang
    arXiv preprint arXiv:2503.03238, 2025
  4. preprint
    Rethinking Diverse Human Preference Learning through Principal Component Analysis
    Feng Luo, Rui Yang, Hao Sun, Chunyuan Deng, Jiarui Yao, Jingyan Shen, Huan Zhang, and Hanjie Chen
    arXiv preprint arXiv:2502.13131, 2025
  5. blog
    Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead
    Hanning Zhang, Jiarui Yao, Chenlu Ye, Wei Xiong, and Tong Zhang
    Notion Blog, 2025

2024

  1. NeurIPS 2024
    Shadowcast: Stealthy data poisoning attacks against vision-language models
    Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, and Furong Huang
    arXiv preprint arXiv:2402.06659, 2024
  2. preprint
    EscapeBench: Pushing Language Models to Think Outside the Box
    Cheng Qian, Peixuan Han, Qinyu Luo, Bingxiang He, Xiusi Chen, Yuji Zhang, Hongyi Du, Jiarui Yao, Xiaocheng Yang, Denghui Zhang, and others
    arXiv preprint arXiv:2412.13549, 2024