About me
I’m a postgraduate student in the Department of Computer Science and Engineering (CSE) at the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s and bachelor’s degrees from the Department of Automation at Tsinghua University. I am fortunate to work closely with Prof. Chongjie Zhang (Washington University in St. Louis), Dr. Lei Han (Tencent AI Lab), and Prof. Meng Fang (University of Liverpool). My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL, model-based RL, and the application of RL algorithms to Large Language Models (LLMs) and game AI.
Currently, I am researching ways to improve the robustness and generalization of deep reinforcement learning, as well as ways to enhance the trustworthiness of LLMs. Feel free to contact me by email if you are interested in discussing research or collaborating.
News
- 🎉 (2024.5) Rewards-in-Context (RiC) is accepted by ICML 2024! Thanks to my co-authors!
- 🎉 (2024.5) GOPlan is accepted by Transactions on Machine Learning Research (TMLR)!
- 🎉 (2024.1) Robust IQL is accepted by ICLR 2024 as a spotlight paper!
Selected Publications
RL for LLMs
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs. Preprint, 2024.
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang.
- TL;DR: Enhancing the generalization ability of reward models for LLMs via text-generation regularization.
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment. International Conference on Machine Learning (ICML) 2024.
Rui Yang $^*$, Xiaoman Pan $^*$, Feng Luo $^*$, Shuang Qiu $^*$, Han Zhong, Dong Yu, Jianshu Chen.
- TL;DR: Efficient and scalable multi-objective alignment method for foundation models through multi-reward conditional SFT and inference-time adaptation.
Robust Offline RL
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption. International Conference on Learning Representations (ICLR) 2024. $\color{red}{\text{(Spotlight)}}$
Rui Yang $^*$, Han Zhong $^*$, Jiawei Xu $^*$, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang.
- TL;DR: State-of-the-art robust offline RL method against data corruption, via robust value learning (Huber loss) and moderate pessimism (quantile Q estimators).
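To make the two ingredients in this TL;DR concrete, here is a minimal NumPy sketch of a Huber loss on temporal-difference errors and a quantile-based pessimistic estimate over a Q ensemble. It is an illustration under assumptions, not the paper's implementation: the threshold `delta`, the quantile level `alpha`, the function names, and the convention that ensemble members lie along axis 0 are all chosen for this example.

```python
import numpy as np


def huber_loss(td_error: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Elementwise Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_err = np.abs(td_error)
    quadratic = 0.5 * td_error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)


def quantile_q(q_ensemble: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    """Pessimistic estimate: the alpha-quantile over ensemble members (axis 0).

    alpha = 0 gives the ensemble minimum; alpha = 0.5 gives the median.
    """
    return np.quantile(q_ensemble, alpha, axis=0)


# Hypothetical TD errors, one of them very large (e.g. from a corrupted reward).
errors = np.array([0.5, -2.0, 10.0])
print(huber_loss(errors))

# Hypothetical ensemble of 5 Q-heads evaluated on 2 state-action pairs.
q_heads = np.array([[1.0, 10.0], [2.0, 9.0], [3.0, 8.0], [4.0, 7.0], [5.0, 6.0]])
print(quantile_q(q_heads, alpha=0.25))
```

With `delta = 1.0`, errors of 0.5, -2.0 and 10.0 give losses of 0.125, 1.5 and 9.5, so the large error grows linearly rather than quadratically; this bounded influence is the intuition for using it against corrupted transitions. Lowering `alpha` toward 0 makes the estimate more pessimistic.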
Corruption-Robust Offline Reinforcement Learning with General Function Approximation. Neural Information Processing Systems (NeurIPS) 2023.
Chenlu Ye $^*$, Rui Yang $^*$, Quanquan Gu, Tong Zhang.
- TL;DR: Provably robust offline RL method against reward and dynamics corruption in offline datasets through uncertainty reweighting.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. Neural Information Processing Systems (NeurIPS) 2022. $\color{red}{\text{(Spotlight)}}$
Rui Yang $^*$, Chenjia Bai $^*$, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han.
- TL;DR: Robust offline RL method against test-time observation perturbation through pessimism and local smoothing.
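The local-smoothing idea can be sketched as a penalty on how much a Q-function changes when the observation is perturbed slightly. The sketch below approximates the worst-case change inside an L-infinity ball of radius `epsilon` by random sampling. The sampling scheme, `epsilon`, `n_samples`, the toy linear Q-function, and all names are illustrative assumptions, and the pessimism component of RORL is not shown.

```python
import numpy as np


def smoothing_penalty(q_fn, states, actions, epsilon=0.01, n_samples=10, rng=None):
    """Hypothetical helper (not from the RORL codebase).

    For each (state, action) pair, draw `n_samples` perturbations uniformly from an
    L-infinity ball of radius `epsilon`, keep the largest squared change in Q,
    and average over the batch.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    base_q = q_fn(states, actions)  # shape: (batch,)
    worst = np.zeros_like(base_q)
    for _ in range(n_samples):
        noise = rng.uniform(-epsilon, epsilon, size=states.shape)
        change = (q_fn(states + noise, actions) - base_q) ** 2
        worst = np.maximum(worst, change)  # track the sampled worst case per pair
    return float(worst.mean())


# Toy linear Q-function, for illustration only.
w_state, w_action = np.array([1.0, -2.0]), np.array([0.5])


def q_linear(s, a):
    return s @ w_state + a @ w_action


states, actions = np.ones((4, 2)), np.ones((4, 1))
print(smoothing_penalty(q_linear, states, actions, epsilon=0.1))
```

In training, a penalty like this would be added to the critic loss so that Q-values stay stable under small observation perturbations; a gradient-based search for the worst-case perturbation is a common alternative to random sampling.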
Goal-conditioned RL
What Is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? International Conference on Machine Learning (ICML) 2023.
Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang.
- TL;DR: We study the unseen goal generalization ability of offline GCRL and propose an approach to enhance its out-of-distribution (OOD) generalization.
Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL. International Conference on Learning Representations (ICLR) 2022.
Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang.
- TL;DR: An efficient offline GCRL method based on supervised learning, with three effective weighting techniques.
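One way to picture "weighting" in goal-conditioned supervised learning is hindsight relabeling in which each relabeled example carries a weight that a supervised (behavior-cloning) loss then uses. The sketch below shows a single discount-style weight, `gamma ** (i - t)`, that favors goals reached sooner. It is an illustration under assumptions, not a reproduction of the paper's three weighting techniques, and all function names are hypothetical.

```python
import numpy as np


def relabel_with_discount_weights(achieved_goals, gamma=0.98, rng=None):
    """Hindsight relabeling with a discount-style weight (illustrative sketch).

    For every timestep t of one trajectory, sample a future step i in [t, T),
    use the goal achieved at step i as the relabeled goal, and weight the
    (state_t, action_t, goal) example by gamma ** (i - t).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T = len(achieved_goals)
    t = np.arange(T)
    i = rng.integers(t, T)  # one future index per timestep; high is exclusive
    goals = achieved_goals[i]
    weights = gamma ** (i - t)  # goals reached sooner get weights closer to 1
    return t, goals, weights


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood of the dataset actions."""
    return float(-(weights * log_probs).sum() / weights.sum())


# Toy trajectory whose achieved goal at step k is simply k.
achieved = np.arange(10.0).reshape(10, 1)
t, goals, weights = relabel_with_discount_weights(achieved, gamma=0.9)
print(weights)
```

The weights would multiply the per-example log-likelihood of a goal-conditioned policy, as in `weighted_bc_loss`; further weights (for example, advantage-based ones) could be multiplied in the same way.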
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models. Transactions on Machine Learning Research (TMLR) 2024.
Mianchu Wang $^*$, Rui Yang $^*$, Xi Chen, Hao Sun, Meng Fang, Giovanni Montana.
MHER: Model-based Hindsight Experience Replay. NeurIPS 2021 Workshop DeepRL.
Rui Yang, Meng Fang, Lei Han, Yali Du, Feng Luo, Xiu Li.
Education
- 2022.09 - now, Postgraduate student, Department of Computer Science and Engineering, the Hong Kong University of Science and Technology.
- 2019.09 - 2022.07, Master, Department of Automation, Tsinghua University.
- 2015.09 - 2019.07, Bachelor, Department of Automation, Tsinghua University.
Experiences
Internship at Tencent AI Lab
Internship at Meituan Financial Service Group
Services
Conference Reviewer: ICML (2022, 2024), ICLR (2024), NeurIPS (2022, 2023 $\color{red}{\text{Top Reviewer}}$), ICRA (2023), AAMAS (2024).
Journal Reviewer: IEEE Robotics and Automation Letters (RA-L), IEEE Transactions on Artificial Intelligence (TAI), Machine Learning.
Others
During my leisure time, I like sports such as running, table tennis and swimming. I used to be an amateur long-distance runner at Tsinghua University. I finished a half marathon (21.0975 km) in 1h30min and a marathon (42.195 km) in 3h36min.