publications | Kaiyan Zhang (张开颜)

Full list on Google Scholar.

2026

Arxiv

MARTI-MARS2: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation

Shijie Wang^*, Pengfei Li^*, Yikun Fu^*, Kaifeng Liu, Fangyuan Li, Yang Liu, +10 more authors, Bowen Zhou^†, Kaiyan Zhang^†, and Biqing Qi^†

Preprint, 2026

ICLR 2026

MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference

Kaiyan Zhang^*†, Runze Liu^*, Xuekai Zhu^*, Kai Tian^*, Sihang Zeng^*, Guoli Jia^*, Yuchen Fan^*, Xingtai Lv^*, Yuxin Zuo^*, Che Jiang^*, and 16 more authors

The Fourteenth International Conference on Learning Representations, 2026

2025

Arxiv

A Survey of Reinforcement Learning for Large Reasoning Models

Kaiyan Zhang^*†, Yuxin Zuo^*†, Bingxiang He^*, Youbang Sun^*, Runze Liu^*, Che Jiang^*, Yuchen Fan^*, Kai Tian^*, Guoli Jia^*, Pengfei Li^*, and 29 more authors

Preprint, 2025

EMNLP 2025

ReviewRL: Towards Automated Scientific Review with RL

Sihang Zeng^*, Kai Tian^*, Kaiyan Zhang^*, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, and 1 more author

The 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Arxiv

SSRL: Self-Search Reinforcement Learning

Yuchen Fan^*, Kaiyan Zhang^*†, Heng Zhou^*, Yuxin Zuo, Yanxu Chen, Yu Fu, Xinwei Long, Xuekai Zhu, Che Jiang, Yuchen Zhang, and 8 more authors

Preprint, 2025

NeurIPS 2025

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo^*, Kaiyan Zhang^*†, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui^†, Ning Ding, and Bowen Zhou

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025

ICLR 2025

OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees

Kaiyan Zhang, Jiayuan Zhang, Haoxin Li, Xuekai Zhu, Ermo Hua, Xingtai Lv, Ning Ding, Biqing Qi, and Bowen Zhou

The Thirteenth International Conference on Learning Representations, 2025

AAAI

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines

Xinwei Long, Zhiyuan Ma, Ermo Hua, Kaiyan Zhang, Biqing Qi, and Bowen Zhou

AAAI Conference on Artificial Intelligence, 2025

ICML@MAS 2025

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Kaiyan Zhang^*, Jianyu Wang^*, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, and Bowen Zhou

ICML 2025 Workshop on MAS, 2025

2024

Arxiv

Towards Building Specialized Generalist AI with System 1 and System 2 Fusion

Kaiyan Zhang^*, Biqing Qi^*, and Bowen Zhou

Preprint, 2024

NeurIPS 2024

Ultramedical: Building specialized generalists in biomedicine

Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, and 1 more author

The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

ACL 2024

CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, and Bowen Zhou

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

COLM 2024

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

Biqing Qi^*, Kaiyan Zhang^*, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, and Bowen Zhou

First Conference on Language Modeling, 2024

Arxiv

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, and Bowen Zhou

Preprint, 2024

ACL 2024 Findings

SMR: State Memory Replay for Long Sequence Modeling

Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, and Bowen Zhou

Findings of the Association for Computational Linguistics: ACL 2024, 2024

Arxiv

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, and Bowen Zhou

Preprint, 2024

AAAI 2024

Generative Multi-Modal Knowledge Retrieval with Large Language Models

Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, Bowen Zhou, and Jie Zhou

The 38th Annual AAAI Conference on Artificial Intelligence, 2024

2023

EMNLP 2023

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, and Bowen Zhou

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

NAACL 2024

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, and Bowen Zhou

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2023

ACM TOIS

A static and dynamic attention framework for multi turn dialogue generation

Weinan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, and Ting Liu

ACM Transactions on Information Systems, 2023

ACM TOIS

A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation

Haoyu Song, Wei-Nan Zhang, Kaiyan Zhang, and Ting Liu

ACM Transactions on Information Systems, 2023

2021

SCIENTIA

A survey of multi-party dialogue research based on deep learning

Kaiyan Zhang, Wei-Nan Zhang, and Ting Liu

SCIENTIA SINICA Informationis, 2021

ACL 2021

BoB: BERT over BERT for training persona-based dialogue models from limited personalized data

Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, and Ting Liu

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021