Wenkai Fang (方文凯)

Ph.D. candidate

The College of Computer Science and Technology, Zhejiang University
Visual Intelligence and Pattern Analysis (VIPA) Group
Zhejiang, China, 310000

Email: wenkfang at zju dot edu dot cn

Biography

I am currently a first-year Ph.D. candidate in the College of Computer Science and Technology at Zhejiang University and a member of the VIPA Group, supervised by Prof. Mingli Song. In 2020, I received my B.Eng. degree in Agricultural Engineering from Zhejiang University and was selected as an Outstanding Graduate of Zhejiang Province.

My research focuses on leveraging reinforcement learning to enhance the reasoning capabilities of large language models (LLMs) and on building general-purpose intelligent agents. I aim to advance the real-world deployment of LLM-powered agents so that they can serve society, improve daily life, and contribute to social good.

Please feel free to contact me if you are interested in my research :)

News

Preprints

For the most up-to-date list, please visit my Google Scholar.

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou, Kongcheng Zhang, Tongya Zheng, Kaixuan Chen, Mingli Song, Dacheng Tao
arXiv preprint arXiv:2505.20347, 2025
[arXiv] [Code]
A Survey of Direct Preference Optimization
Shunyu Liu, Wenkai Fang, Zetian Hu, Junjie Zhang, Yang Zhou, Kongcheng Zhang, Rongcheng Tu, Ting-En Lin, Fei Huang, Mingli Song, Yongbin Li, Dacheng Tao
arXiv preprint arXiv:2503.11701, 2025
[arXiv] [Code]
Reasoning with Reinforced Functional Token Tuning
Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu
arXiv preprint arXiv:2502.13389, 2025
[arXiv] [Code]

Publications

* denotes equal contribution.

2025

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou, Kongcheng Zhang, Tongya Zheng, Kaixuan Chen, Mingli Song, Dacheng Tao
Advances in Neural Information Processing Systems (NeurIPS), 2025
[arXiv] [Code]
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Wenkai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu
Empirical Methods in Natural Language Processing (EMNLP), 2025
[arXiv] [Code]
Odyssey: Empowering Minecraft Agents with Open-World Skills
Shunyu Liu*, Yaoru Li*, Kongcheng Zhang*, Zhenyu Cui*, Wenkai Fang*, Yuxuan Zheng, Tongya Zheng, Mingli Song
International Joint Conference on Artificial Intelligence (IJCAI), 2025
[arXiv] [Code]

Honors

Awards

Competition

Last updated in Jan 2026. Webpage template borrowed from Prof. Sida Peng.