Kaiyan Zhang (张开颜)

I am a final-year PhD candidate at the Department of Electronic Engineering, Tsinghua University, under the guidance of Professor Bowen Zhou. I earned B.S. (2020) and M.S. (2022) degrees in Computer Science and Technology from the Harbin Institute of Technology (HIT), where I was supervised by Weinan Zhang and Ting Liu in the HIT-SCIR lab.

My current passion is pushing the boundaries of domain-specific superintelligence (called ExpertAGI), enabling AI systems to achieve expert-level reasoning and collaboration across high-value and practical scenarios. My research directions include:

Scalable Learning (e.g., RL): Developing novel frameworks for scalable reinforcement learning, such as TTRL (test-time RL with unlabeled data), SSRL (self-search RL leveraging intrinsic model capabilities), MARTI (multi-agent RL coordination), and OpenPRM (scalable process reward modeling), all aiming to reduce supervision costs and unlock self-improving LLMs.
Collaborative Intelligence: Designing mechanisms for model cooperation and synergy, including CRaSh (efficient fine-tuning via clustering and sharing), CoGenesis (secure collaboration between large and small models), FS-Gen (unified laws in collaborative decoding), and MARTI, to empower collective intelligence among agents.
Scientific Intelligence: Applying LLMs to scientific discovery, with projects like UltraMedical (generalist biomedical models), hypothesis proposer (autonomous scientific hypothesis generation), and ReviewRL (reinforcement learning for automated scientific review), advancing AI’s role in research and innovation.

I expect to graduate in June 2026. My CV is here.

news

Sep 19, 2025	TTRL was accepted to NeurIPS 2025. Congratulations!
Sep 11, 2025	Excited to share our new survey paper on RL for Large Reasoning Models.
Aug 21, 2025	One paper is accepted to EMNLP 2025 (see ReviewRL).
Aug 15, 2025	We investigate agentic search RL without relying on an external search engine while maintaining strong sim2real generalization (see SSRL).
Jun 26, 2025	Two papers are accepted to ICCV 2025, congrats to the collaborators.
May 27, 2025	We are very excited to release MARTI: A framework for LLM-based Multi-Agent Reinforced Training and Inference (see MARTI).
May 16, 2025	Two papers are accepted to ACL 2025 Main, congrats to the collaborators.
May 14, 2025	Just shared our latest work on TTS, RL and TTRL at QingkeTalk.
May 02, 2025	Four papers are accepted to ICML 2025, congrats to the collaborators.
Apr 23, 2025	We release Test-Time Reinforcement Learning (TTRL), which investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in LLMs (see TTRL).
Mar 31, 2025	We release collections of RL recipes (see Awesome-RL-Reasoning-Recipes).
Mar 24, 2025	Video-T1 is released; it is the first work to evaluate TTS on video generation (see Video-T1).
Feb 10, 2025	We explore compute-optimal test-time scaling (see compute-optimal-tts).
Jan 23, 2025	One first-author paper is accepted to ICLR 2025 (see OpenPRM).
Dec 24, 2024	One paper is accepted to AAAI 2025 (Congrats to Xinwei).
Sep 27, 2024	One first-author paper is accepted to NeurIPS 2024 D&B Track (see UltraMedical).
Sep 20, 2024	One paper is accepted to EMNLP 2024 (see LPA).
Jul 10, 2024	One co-first-author paper is accepted to COLM 2024 (see LLM4BioHypoGen).
May 16, 2024	Two papers are accepted to ACL 2024 (One first-author, see CoGenesis).
Mar 13, 2024	One paper is accepted to NAACL 2024 (see PAD).
Oct 06, 2023	One first-author paper is accepted to EMNLP 2023 (see CRaSh).

selected publications

arXiv

A Survey of Reinforcement Learning for Large Reasoning Models

Kaiyan Zhang^*†, Yuxin Zuo^*†, Bingxiang He^*, Youbang Sun^*, Runze Liu^*, Che Jiang^*, Yuchen Fan^*, Kai Tian^*, Guoli Jia^*, Pengfei Li^*, and 29 more authors

Preprint, 2025

PDF Code

EMNLP 2025

ReviewRL: Towards Automated Scientific Review with RL

Sihang Zeng^*, Kai Tian^*, Kaiyan Zhang^*, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, and 1 more author

The 2025 Conference on Empirical Methods in Natural Language Processing, 2025

PDF Code

arXiv

SSRL: Self-Search Reinforcement Learning

Yuchen Fan^*, Kaiyan Zhang^*†, Heng Zhou^*, Yuxin Zuo, Yanxu Chen, Yu Fu, Xinwei Long, Xuekai Zhu, Che Jiang, Yuchen Zhang, and 8 more authors

Preprint, 2025

PDF Code

GitHub

MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference

Kaiyan Zhang^*†, Runze Liu^*, Xuekai Zhu^*, Kai Tian^*, Sihang Zeng^*, Guoli Jia^*, Yuchen Fan^*, Xingtai Lv^*, Yuxin Zuo^*, Che Jiang^*, and 16 more authors

GitHub, 2025

Code

NeurIPS 2025

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo^*, Kaiyan Zhang^*†, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui^†, Ning Ding, and Bowen Zhou

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025

PDF Code

ICLR 2025

OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees

Kaiyan Zhang, Jiayuan Zhang, Haoxin Li, Xuekai Zhu, Ermo Hua, Xingtai Lv, Ning Ding, Biqing Qi, and Bowen Zhou

The Thirteenth International Conference on Learning Representations, 2025

PDF

arXiv

Towards Building Specialized Generalist AI with System 1 and System 2 Fusion

Kaiyan Zhang^*, Biqing Qi^*, and Bowen Zhou

Preprint, 2024

PDF

ICML@MAS 2025

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Kaiyan Zhang^*, Jianyu Wang^*, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, and Bowen Zhou

ICML 2025 Workshop on MAS, 2025

PDF Code

NeurIPS 2024

UltraMedical: Building Specialized Generalists in Biomedicine

Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, and 1 more author

The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

PDF Code

ACL 2024

CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, and Bowen Zhou

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

PDF Code

COLM 2024

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

Biqing Qi^*, Kaiyan Zhang^*, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, and Bowen Zhou

First Conference on Language Modeling, 2024

PDF Code

EMNLP 2023

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, and Bowen Zhou

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

PDF Code