Kaiyan Zhang (张开颜)

CTO, Frontis.AI · Ph.D., Tsinghua University

I’m currently serving as the CTO of Frontis.AI, where I work on agent self-evolution, recursive self-improvement, and AI for AI. I earned my Ph.D. (2026) from the Department of Electronic Engineering, Tsinghua University, under the guidance of Professor Bowen Zhou. Before that, I earned B.S. (2020) and M.S. (2022) degrees in Computer Science and Technology from the Harbin Institute of Technology (HIT), where I was supervised by Weinan Zhang and Ting Liu in the HIT-SCIR lab.

We are hiring interns! If you are passionate about agent self-evolution, recursive self-improvement, and AI for AI, feel free to reach out. We publish papers and release open-source work.

My mission is to build AI that improves itself — shifting from human-supervised training toward agents that learn from their own experience and recursively bootstrap stronger successors (the ExpertAGI vision). This pursuit runs along two intertwined threads.

The first is the learning machinery for self-improvement: scalable and test-time reinforcement learning, multi-agent training, and reward modeling that let models supervise and improve other models — work like TTRL, SSRL, and MARTI, surveyed in our overview of RL for large reasoning models. The second is putting self-improving agents to work in high-value settings such as AI for AI — charted in our survey on self- to meta-evolution — alongside rigorous benchmarks like NatureBench and EnterpriseClawBench that keep our claims of capability honest.

news

Jun 28, 2026 We release a survey on self- and meta-evolution of self-improving agents: Awesome-Self-Improving-Agents.
Jun 24, 2026 We release two agentic benchmarks: NatureBench (AI for AI) and EnterpriseClawBench (real-world enterprise tasks).
Jun 18, 2026 Two papers are accepted to ECCV 2026, congrats to the collaborators.
Apr 04, 2026 One paper is accepted to ACL 2026, congrats to the collaborators.
Jan 26, 2026 Five papers are accepted to ICLR 2026, congrats to the collaborators.
Sep 19, 2025 TTRL is accepted to NeurIPS 2025, congratulations!
Sep 11, 2025 Excited to share our new survey paper on RL for Large Reasoning Models.
Aug 21, 2025 One paper is accepted to EMNLP 2025 (see ReviewRL).
Aug 15, 2025 We investigate agentic search RL that does not rely on an external search engine while maintaining strong sim2real generalization (see SSRL).
Jun 26, 2025 Two papers are accepted to ICCV 2025, congrats to the collaborators.
May 27, 2025 We are very excited to release MARTI: A framework for LLM-based Multi-Agent Reinforced Training and Inference (see MARTI).
May 16, 2025 Two papers are accepted to ACL 2025 Main, congrats to the collaborators.
May 14, 2025 Just shared our latest work on TTS, RL and TTRL at QingkeTalk.
May 02, 2025 Four papers are accepted to ICML 2025, congrats to the collaborators.
Apr 23, 2025 We release Test-time Reinforcement Learning (TTRL), which investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in LLMs (see TTRL).
Mar 31, 2025 We release collections of RL recipes (see Awesome-RL-Reasoning-Recipes).
Mar 24, 2025 Video-T1 is released, the first work to evaluate TTS on video generation (see Video-T1).
Feb 10, 2025 We explore compute-optimal test-time scaling (see compute-optimal-tts).
Jan 23, 2025 One first-author paper is accepted to ICLR 2025 (see OpenPRM).
Dec 24, 2024 One paper is accepted to AAAI 2025 (Congrats to Xinwei).
Sep 27, 2024 One first-author paper is accepted to NeurIPS 2024 D&B Track (see UltraMedical).
Sep 20, 2024 One paper is accepted to EMNLP 2024 (see LPA).
Jul 10, 2024 One co-first author paper is accepted to COLM 2024 (see LLM4BioHypoGen).
May 16, 2024 Two papers are accepted to ACL 2024 (One first-author, see CoGenesis).
Mar 13, 2024 One paper is accepted to NAACL 2024 (see PAD).
Oct 06, 2023 One first-author paper is accepted to EMNLP 2023 (see CRaSh).

selected publications

  1. Arxiv
    NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
    Yuru Wang, Lejun Cheng, Yuxin Zuo, +11 more authors, Ning Ding, Bowen Zhou, and Kaiyan Zhang
    Preprint, 2026
  2. ECCV 2026
    TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration
    Yisheng Zhang, Guoli Jia, Haote Hu, +9 more authors, Kaiyan Zhang, and Bowen Zhou
    European Conference on Computer Vision (ECCV), 2026
  3. Arxiv
    EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions
    Jincheng Zhong, Weizhi Wang, Che Jiang, +4 more authors, and Kaiyan Zhang
    Preprint, 2026
  4. Arxiv
    Self-Improving Agents in the Era of Experience: A Survey of Self- to Meta-Evolution
    Che Jiang, Jincheng Zhong, Yu Fu, +21 more authors, Ning Ding, Kaiyan Zhang, and Bowen Zhou
    Preprint, 2026
  5. Arxiv
    MARTI-MARS2: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation
    Shijie Wang*, Pengfei Li*, Yikun Fu*, Kaifeng Liu, Fangyuan Li, Yang Liu, +10 more authors, Bowen Zhou, Kaiyan Zhang, and Biqing Qi
    Preprint, 2026
  6. Arxiv
    A Survey of Reinforcement Learning for Large Reasoning Models
    Kaiyan Zhang*†, Yuxin Zuo*†, Bingxiang He*, Youbang Sun*, Runze Liu*, Che Jiang*, Yuchen Fan*, Kai Tian*, Guoli Jia*, Pengfei Li*, and 29 more authors
    Preprint, 2025
  7. EMNLP 2025
    ReviewRL: Towards Automated Scientific Review with RL
    Sihang Zeng*, Kai Tian*, Kaiyan Zhang*, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, and 1 more author
    The 2025 Conference on Empirical Methods in Natural Language Processing, 2025
  8. Arxiv
    SSRL: Self-Search Reinforcement Learning
    Yuchen Fan*, Kaiyan Zhang*†, Heng Zhou*, Yuxin Zuo, Yanxu Chen, Yu Fu, Xinwei Long, Xuekai Zhu, Che Jiang, Yuchen Zhang, and 8 more authors
    Preprint, 2025
  9. ICLR 2026
    MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
    Kaiyan Zhang*†, Runze Liu*, Xuekai Zhu*, Kai Tian*, Sihang Zeng*, Guoli Jia*, Yuchen Fan*, Xingtai Lv*, Yuxin Zuo*, Che Jiang*, and 16 more authors
    The Fourteenth International Conference on Learning Representations, 2026
  10. NeurIPS 2025
    TTRL: Test-Time Reinforcement Learning
    Yuxin Zuo*, Kaiyan Zhang*†, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, and Bowen Zhou
    The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025
  11. ICLR 2025
    OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees
    Kaiyan Zhang, Jiayuan Zhang, Haoxin Li, Xuekai Zhu, Ermo Hua, Xingtai Lv, Ning Ding, Biqing Qi, and Bowen Zhou
    The Thirteenth International Conference on Learning Representations, 2025
  12. Arxiv
    Towards Building Specialized Generalist AI with System 1 and System 2 Fusion
    Kaiyan Zhang*, Biqing Qi*, and Bowen Zhou
    Preprint, 2024
  13. ICML@MAS 2025
    Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
    Kaiyan Zhang*, Jianyu Wang*, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, and Bowen Zhou
    ICML 2025 Workshop on MAS, 2025
  14. NeurIPS 2024
    UltraMedical: Building Specialized Generalists in Biomedicine
    Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, and 1 more author
    The Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
  15. ACL 2024
    CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following
    Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, and Bowen Zhou
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
  16. COLM 2024
    Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
    Biqing Qi*, Kaiyan Zhang*, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, and Bowen Zhou
    First Conference on Language Modeling, 2024
  17. EMNLP 2023
    CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
    Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, and Bowen Zhou
    Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023