Kaiyan Zhang (张开颜)
PhD Candidate, Tsinghua University
I am a third-year PhD student in the Department of Electronic Engineering at Tsinghua University, advised by Professor Bowen Zhou. I earned a Master’s degree in Computer Science and Technology from the Harbin Institute of Technology (HIT) in 2022, where I was supervised by Weinan Zhang and Ting Liu in the HIT-SCIR lab.
My research centers on the alignment and collaboration of large language models (LLMs), with the broader goal of building scalable collaborative intelligence systems. I am currently developing an LLM-based multi-agent reinforcement learning framework to push reasoning capabilities beyond the level of foundational reasoning models (R1 and O1). My work also explores how multiple agents can collaborate effectively on real-world agentic tasks.
I am open to collaborations and discussions across related areas, such as multi-agent systems (COLM 2024, ACL 2024, Arxiv 2406), reinforcement learning (Arxiv 2412 - ImplicitPRM, Arxiv 2502 - PRIME), test-time scaling (ICLR 2025 - OpenPRM, Arxiv 2502, Arxiv 2503 - Video-T1, Arxiv 2504 - GenPRM), test-time reinforcement learning (Arxiv 2504 - TTRL), and multi-agent reinforcement learning (GitHub 2025 - MARTI). I’d be happy to connect with researchers who share these interests.
news
May 27, 2025 | We are very excited to release MARTI: A framework for LLM-based Multi-Agent Reinforced Training and Inference (see MARTI). |
---|---|
May 16, 2025 | Two papers are accepted to the ACL 2025 main conference; congrats to the collaborators. |
May 14, 2025 | Just shared our latest work on TTS, RL and TTRL at QingkeTalk. |
May 02, 2025 | Four papers are accepted to ICML 2025; congrats to the collaborators. |
Apr 23, 2025 | We release Test-time Reinforcement Learning (TTRL), which investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in LLMs (see TTRL). |
Mar 31, 2025 | We release a collection of RL recipes (see Awesome-RL-Reasoning-Recipes). |
Mar 24, 2025 | Video-T1 is released, the first work to evaluate TTS on video generation (see Video-T1). |
Feb 10, 2025 | We explore compute-optimal test-time scaling (see compute-optimal-tts). |
Jan 23, 2025 | One first-author paper is accepted to ICLR 2025 (see OpenPRM). |
Dec 24, 2024 | One paper is accepted to AAAI 2025 (Congrats to Xinwei). |
Sep 27, 2024 | One first-author paper is accepted to NeurIPS 2024 D&B Track (see UltraMedical). |
Sep 20, 2024 | One paper is accepted to EMNLP 2024 (see LPA). |
Jul 10, 2024 | One co-first-author paper is accepted to COLM 2024 (see LLM4BioHypoGen). |
May 16, 2024 | Two papers are accepted to ACL 2024 (one first-author; see CoGenesis). |
Mar 13, 2024 | One paper is accepted to NAACL 2024 (see PAD). |
Oct 06, 2023 | One first-author paper is accepted to EMNLP 2023 (see CRaSh). |
selected publications
- GitHub
- ICLR 2025. OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees. The Thirteenth International Conference on Learning Representations, 2025
- Arxiv