With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. As time-efficient variants of diffusion models, consistency models have been validated in online state-based RL, but whether they can be extended to visual RL remains an open question. In this paper, we investigate the impact of non-stationary distributions and the actor-critic framework on consistency policies in online RL, and find that the consistency policy is unstable during training, especially in visual RL with high-dimensional state spaces. To address this, we propose sample-based entropy regularization to stabilize policy training, together with a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind Control Suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, demonstrating the potential of consistency models in this setting.
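Because a consistency policy draws actions through a learned sampler rather than an explicit density, its entropy has no closed form and must be estimated from samples. The PyTorch sketch below shows one way such sample-based entropy regularization could work: estimate entropy from a set of sampled action particles (here with an illustrative k-nearest-neighbor estimator) and add it as a bonus to a Q-maximizing actor loss. The estimator choice, the `policy`/`critic` interfaces, and the coefficient `alpha` are assumptions for illustration, not the exact formulation in the paper.

```python
import torch

def knn_entropy_estimate(actions: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Particle-based entropy estimate from S sampled actions, shape (S, A).

    Up to constants, differential entropy grows with the mean log distance
    to each particle's k-th nearest neighbor: a wider spread of sampled
    actions yields a larger estimate. Illustrative estimator, not
    necessarily the one used in the paper.
    """
    dists = torch.cdist(actions, actions)                 # (S, S) pairwise L2
    kth = dists.topk(k + 1, largest=False).values[:, -1]  # k-th NN, skipping self (d=0)
    return torch.log(kth + 1e-8).mean()

def actor_loss(policy, critic, obs, n_samples: int = 16, alpha: float = 0.05):
    """Q-maximizing actor loss with a sample-based entropy bonus.

    Assumes `policy(obs)` returns one action per observation via the
    consistency model's sampler, and `critic(obs, act)` returns Q-values.
    """
    # Draw S action particles per observation: (S, B, A).
    actions = torch.stack([policy(obs) for _ in range(n_samples)])
    obs_rep = obs.unsqueeze(0).expand(n_samples, -1, -1)  # (S, B, D)
    q = critic(obs_rep.reshape(-1, obs.shape[-1]),
               actions.reshape(-1, actions.shape[-1])).mean()
    # Average the per-state entropy estimates over the batch.
    ent = torch.stack([knn_entropy_estimate(actions[:, b])
                       for b in range(actions.shape[1])]).mean()
    return -(q + alpha * ent)  # gradient ascent on Q + alpha * H
```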
We present CP3ER (Consistency Policy with Prioritized Proximal Experience Regularization), an effective algorithm that significantly improves the stability and performance of visual reinforcement learning.
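The "prioritized proximal experience" component suggests biasing replay sampling toward recent (proximal) transitions, which keeps the training distribution closer to the current policy under non-stationarity. The sketch below is one plausible reading, assuming a geometric recency decay over transition age; the class name and decay schedule are hypothetical, not the paper's exact scheme.

```python
import numpy as np

class ProximalPrioritizedBuffer:
    """Replay buffer that samples recent transitions with higher probability.

    Hypothetical sketch: priorities decay geometrically with transition age,
    so fresher experience dominates each minibatch.
    """

    def __init__(self, capacity: int, decay: float = 0.999):
        self.capacity, self.decay = capacity, decay
        self.storage, self.insert_step, self.step = [], [], 0

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.insert_step.append(self.step)
        else:  # ring buffer: overwrite the oldest slot
            idx = self.step % self.capacity
            self.storage[idx] = transition
            self.insert_step[idx] = self.step
        self.step += 1

    def sample(self, batch_size: int):
        ages = self.step - np.asarray(self.insert_step)  # smaller = newer
        priorities = self.decay ** ages                  # newer => higher weight
        probs = priorities / priorities.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        return [self.storage[i] for i in idxs]
```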
Framework of CP3ER
We evaluate CP3ER on hard tasks from the DeepMind Control Suite. The results below demonstrate its robust performance.
Dog Stand
Dog Trot
Humanoid Walk
Humanoid Run
Manipulator Bring Ball
Acrobot Swingup Sparse
We also evaluate CP3ER on tasks from the Meta-world environment, where it achieves notable performance.
assembly
disassemble
hammer
hand-insert
pick-place-wall
stick-pull
# arXiv version
@misc{li2024generalizingconsistencypolicyvisual,
title={Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization},
author={Haoran Li and Zhennan Jiang and Yuhui Chen and Dongbin Zhao},
year={2024},
eprint={2410.00051},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.00051},
}
# NeurIPS version
@inproceedings{li2024generalizing,
title={Generalizing Consistency Policy to Visual {RL} with Prioritized Proximal Experience Regularization},
author={Haoran Li and Zhennan Jiang and Yuhui Chen and Dongbin Zhao},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=MOFwt8OeXr}
}