Kaiyan Zhang (张开颜)

I’m currently serving as the CTO of Frontis.AI, building self-improving AI agents for real-world enterprise and scientific applications. I earned my Ph.D. (2026) from the Department of Electronic Engineering, Tsinghua University, under the guidance of Professor Bowen Zhou. Before that, I earned B.S. (2020) and M.S. (2022) degrees in Computer Science and Technology from the Harbin Institute of Technology (HIT), where I was supervised by Weinan Zhang and Ting Liu in the HIT-SCIR lab.

My mission is to build AI that improves itself — shifting from human-supervised training toward agents that learn from their own experience and recursively bootstrap stronger successors (the ExpertAGI vision). This pursuit runs along two intertwined threads.

The first is the learning machinery for self-improvement: scalable and test-time reinforcement learning, multi-agent training, and reward modeling that let models supervise and improve other models (AI for AI) — work like TTRL, SSRL, and MARTI, surveyed in our overview of RL for large reasoning models. The second is putting self-improving agents to work in high-value settings such as enterprise and scientific scenarios — charted in our survey on self- to meta-evolution — alongside rigorous benchmarks like NatureBench and EnterpriseClawBench for evaluating agents on real-world tasks. Most recently, we released OpenRSI: the open OpenMLE stack and Frontis-MA1, our first AI4AI model post-trained as a meta-evolution agent for machine learning engineering.

We are hiring interns! (Link) If you are passionate about agent self-evolution, recursive self-improvement, and AI for AI, feel free to reach out. We publish technical reports and release open-source work.

news

Jul 31, 2026	We release OpenRSI — executable AI4AI toward recursive self-improvement, including Frontis-MA1, a post-trained meta-evolution agent for ML engineering, and the full OpenMLE stack (Gym / RL / Evo).
Jun 28, 2026	We release a survey on self- and meta-evolution of self-improving agents: Awesome-Self-Improving-Agents.
Jun 24, 2026	We release two agentic benchmarks: NatureBench (AI for AI) and EnterpriseClawBench (real-world enterprise tasks).
Jun 18, 2026	Two papers are accepted to ECCV 2026, congrats to the collaborators.
Apr 04, 2026	One paper is accepted to ACL 2026, congrats to the collaborators.
Jan 26, 2026	Five papers are accepted to ICLR 2026, congrats to the collaborators.
Sep 19, 2025	TTRL is accepted to NeurIPS 2025. Congratulations!
Sep 11, 2025	Excited to share our new survey paper on RL for Large Reasoning Models.
Aug 21, 2025	One paper is accepted to EMNLP 2025 (see ReviewRL).
Aug 15, 2025	We investigate agentic search RL that does not rely on an external search engine while maintaining strong sim2real generalization (see SSRL).
Jun 26, 2025	Two papers are accepted to ICCV 2025, congrats to the collaborators.
May 27, 2025	We are very excited to release MARTI: A framework for LLM-based Multi-Agent Reinforced Training and Inference (see MARTI).
May 16, 2025	Two papers are accepted to ACL 2025 Main, congrats to the collaborators.
May 14, 2025	Just shared our latest work on TTS, RL and TTRL at QingkeTalk.
May 02, 2025	Four papers are accepted to ICML 2025, congrats to the collaborators.
Apr 23, 2025	We release Test-Time Reinforcement Learning (TTRL), which investigates reinforcement learning (RL) on data without explicit labels for reasoning tasks in LLMs (see TTRL).
Mar 31, 2025	We release a collection of RL recipes (see Awesome-RL-Reasoning-Recipes).
Mar 24, 2025	Video-T1 is released; it is the first to evaluate test-time scaling (TTS) on video generation (see Video-T1).
Feb 10, 2025	We explore compute-optimal test-time scaling (see compute-optimal-tts).
Jan 23, 2025	One first-author paper is accepted to ICLR 2025 (see OpenPRM).
Dec 24, 2024	One paper is accepted to AAAI 2025 (Congrats to Xinwei).
Sep 27, 2024	One first-author paper is accepted to NeurIPS 2024 D&B Track (see UltraMedical).
Sep 20, 2024	One paper is accepted to EMNLP 2024 (see LPA).
Jul 10, 2024	One co-first-author paper is accepted to COLM 2024 (see LLM4BioHypoGen).
May 16, 2024	Two papers are accepted to ACL 2024 (one first-author; see CoGenesis).
Mar 13, 2024	One paper is accepted to NAACL 2024 (see PAD).
Oct 06, 2023	One first-author paper is accepted to EMNLP 2023 (see CRaSh).

selected publications

Self-Improving Agents in the Real World

Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering

Junlin Yang, Che Jiang, Yu Fu, +18 more authors, Ning Ding, Bowen Zhou, and Kaiyan Zhang^†

Preprint, 2026

@article{FrontisMA1,
  title = {Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering},
  author = {Yang, Junlin and Jiang, Che and Fu, Yu and authors, +18 more and Ding, Ning and Zhou, Bowen and Zhang, Kaiyan},
  year = {2026},
  journal = {Preprint},
}

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

Yuru Wang, Lejun Cheng, Yuxin Zuo, +11 more authors, Ning Ding^†, Bowen Zhou^†, and Kaiyan Zhang^†

Preprint, 2026

@article{NatureBench,
  title = {NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?},
  author = {Wang, Yuru and Cheng, Lejun and Zuo, Yuxin and authors, +11 more and Ding, Ning and Zhou, Bowen and Zhang, Kaiyan},
  year = {2026},
  journal = {Preprint},
}

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

Jincheng Zhong, Weizhi Wang, Che Jiang, +4 more authors, and Kaiyan Zhang^†

Preprint, 2026

@article{EnterpriseClawBench,
  title = {EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions},
  author = {Zhong, Jincheng and Wang, Weizhi and Jiang, Che and authors, +4 more and Zhang, Kaiyan},
  year = {2026},
  journal = {Preprint},
}

Self-Improving Agents in the Era of Experience: A Survey of Self- to Meta-Evolution

Che Jiang, Jincheng Zhong, Yu Fu, +21 more authors, Ning Ding^†, Kaiyan Zhang^†, and Bowen Zhou^†

Preprint, 2026

@article{SelfImprovingAgentsSurvey,
  title = {Self-Improving Agents in the Era of Experience: A Survey of Self- to Meta-Evolution},
  author = {Jiang, Che and Zhong, Jincheng and Fu, Yu and authors, +21 more and Ding, Ning and Zhang, Kaiyan and Zhou, Bowen},
  year = {2026},
  journal = {Preprint},
}

UltraMedical: Building specialized generalists in biomedicine

Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, and 1 more author

The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

@article{zhang2024ultramedical,
  title = {UltraMedical: Building specialized generalists in biomedicine},
  author = {Zhang, Kaiyan and Zeng, Sihang and Hua, Ermo and Ding, Ning and Chen, Zhang-Ren and Ma, Zhiyuan and Li, Haoxin and Cui, Ganqu and Qi, Biqing and Zhu, Xuekai and others},
  journal = {The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2024},
}

Learning Machinery for Self-Improvement

MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference

Kaiyan Zhang^*†, Runze Liu^*, Xuekai Zhu^*, Kai Tian^*, Sihang Zeng^*, Guoli Jia^*, Yuchen Fan^*, Xingtai Lv^*, Yuxin Zuo^*, Che Jiang^*, and 16 more authors

The Fourteenth International Conference on Learning Representations, 2026

@article{MARTI,
  title = {MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference},
  author = {Zhang, Kaiyan and Liu, Runze and Zhu, Xuekai and Tian, Kai and Zeng, Sihang and Jia, Guoli and Fan, Yuchen and Lv, Xingtai and Zuo, Yuxin and Jiang, Che and Liu, Ziyang and Wang, Jianyu and Wang, Yuru and Zhao, Ruotong and Hua, Ermo and Wang, Yibo and Wang, Shijie and Gao, Junqi and Long, Xinwei and Sun, Youbang and Ma, Zhiyuan and Cui, Ganqu and Bai, Lei and Ding, Ning and Qi, Biqing and Zhou, Bowen},
  year = {2026},
  journal = {The Fourteenth International Conference on Learning Representations},
}

A Survey of Reinforcement Learning for Large Reasoning Models

Kaiyan Zhang^*†, Yuxin Zuo^*†, Bingxiang He^*, Youbang Sun^*, Runze Liu^*, Che Jiang^*, Yuchen Fan^*, Kai Tian^*, Guoli Jia^*, Pengfei Li^*, and 29 more authors

Preprint, 2025

@article{RL4LRM,
  title = {A Survey of Reinforcement Learning for Large Reasoning Models},
  author = {Zhang, Kaiyan and Zuo, Yuxin and He, Bingxiang and Sun, Youbang and Liu, Runze and Jiang, Che and Fan, Yuchen and Tian, Kai and Jia, Guoli and Li, Pengfei and Fu, Yu and Lv, Xingtai and Zhang, Yuchen and Zeng, Sihang and Qu, Shang and Li, Haozhan and Wang, Shijie and Wang, Yuru and Long, Xinwei and Liu, Fangfu and Xu, Xiang and Ma, Jiaze and Zhu, Xuekai and Hua, Ermo and Liu, Yihao and Li, Zonglin and Chen, Huayu and Qu, Xiaoye and Li, Yafu and Chen, Weize and Yuan, Zhenzhao and Gao, Junqi and Li, Dong and Ma, Zhiyuan and Cui, Ganqu and Liu, Zhiyuan and Qi, Biqing and Ding, Ning and Zhou, Bowen},
  year = {2025},
  journal = {Preprint},
}

SSRL: Self-Search Reinforcement Learning

Yuchen Fan^*, Kaiyan Zhang^*†, Heng Zhou^*, Yuxin Zuo, Yanxu Chen, Yu Fu, Xinwei Long, Xuekai Zhu, Che Jiang, Yuchen Zhang, and 8 more authors

Preprint, 2025

@article{SSRL,
  title = {SSRL: Self-Search Reinforcement Learning},
  author = {Fan, Yuchen and Zhang, Kaiyan and Zhou, Heng and Zuo, Yuxin and Chen, Yanxu and Fu, Yu and Long, Xinwei and Zhu, Xuekai and Jiang, Che and Zhang, Yuchen and Kang, Li and Chen, Gang and Huang, Cheng and He, Zhizhou and Wang, Bingning and Bai, Lei and Ding, Ning and Zhou, Bowen},
  year = {2025},
  journal = {Preprint},
}

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo^*, Kaiyan Zhang^*†, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui^†, Ning Ding, and Bowen Zhou

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025

@article{TTRL,
  title = {TTRL: Test-Time Reinforcement Learning},
  author = {Zuo, Yuxin and Zhang, Kaiyan and Qu, Shang and Sheng, Li and Zhu, Xuekai and Qi, Biqing and Sun, Youbang and Cui, Ganqu and Ding, Ning and Zhou, Bowen},
  year = {2025},
  journal = {The Thirty-Ninth Annual Conference on Neural Information Processing Systems},
}