With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. As time-efficient variants of diffusion models, consistency models have been validated in online state-based RL, but whether they can be extended to visual RL remains an open question. In this paper, we investigate the impact of non-stationary distributions and the actor-critic framework on consistency policies in online RL, and find that the consistency policy is unstable during training, especially in visual RL with high-dimensional state spaces. To address this, we propose sample-based entropy regularization to stabilize policy training, together with a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind Control Suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, demonstrating the potential of consistency models in this setting.
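Because a consistency policy draws actions through a learned sampler rather than an explicit density, its entropy has no closed form and must be estimated from samples. The PyTorch sketch below shows one way such sample-based entropy regularization could work: estimate entropy from a set of sampled action particles (here with an illustrative k-nearest-neighbor estimator) and add it as a bonus to a Q-maximizing actor loss. The estimator choice, the `policy`/`critic` interfaces, and the coefficient `alpha` are assumptions for illustration, not the exact formulation in the paper.

```python
import torch

def knn_entropy_estimate(actions: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Particle-based entropy estimate from S sampled actions, shape (S, A).

    Up to constants, differential entropy grows with the mean log distance
    to each particle's k-th nearest neighbor: a wider spread of sampled
    actions yields a larger estimate. Illustrative estimator, not
    necessarily the one used in the paper.
    """
    dists = torch.cdist(actions, actions)                 # (S, S) pairwise L2
    kth = dists.topk(k + 1, largest=False).values[:, -1]  # k-th NN, skipping self (d=0)
    return torch.log(kth + 1e-8).mean()

def actor_loss(policy, critic, obs, n_samples: int = 16, alpha: float = 0.05):
    """Q-maximizing actor loss with a sample-based entropy bonus.

    Assumes `policy(obs)` returns one action per observation via the
    consistency model's sampler, and `critic(obs, act)` returns Q-values.
    """
    # Draw S action particles per observation: (S, B, A).
    actions = torch.stack([policy(obs) for _ in range(n_samples)])
    obs_rep = obs.unsqueeze(0).expand(n_samples, -1, -1)  # (S, B, D)
    q = critic(obs_rep.reshape(-1, obs.shape[-1]),
               actions.reshape(-1, actions.shape[-1])).mean()
    # Average the per-state entropy estimates over the batch.
    ent = torch.stack([knn_entropy_estimate(actions[:, b])
                       for b in range(actions.shape[1])]).mean()
    return -(q + alpha * ent)  # gradient ascent on Q + alpha * H
```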
We present CP3ER (Consistency Policy with Prioritized Proximal Experience Regularization), an effective algorithm that significantly improves the stability and performance of visual reinforcement learning.
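The "prioritized proximal experience" component suggests biasing replay sampling toward recent (proximal) transitions, which keeps the training distribution closer to the current policy under non-stationarity. The sketch below is one plausible reading, assuming a geometric recency decay over transition age; the class name and decay schedule are hypothetical, not the paper's exact scheme.

```python
import numpy as np

class ProximalPrioritizedBuffer:
    """Replay buffer that samples recent transitions with higher probability.

    Hypothetical sketch: priorities decay geometrically with transition age,
    so fresher experience dominates each minibatch.
    """

    def __init__(self, capacity: int, decay: float = 0.999):
        self.capacity, self.decay = capacity, decay
        self.storage, self.insert_step, self.step = [], [], 0

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.insert_step.append(self.step)
        else:  # ring buffer: overwrite the oldest slot
            idx = self.step % self.capacity
            self.storage[idx] = transition
            self.insert_step[idx] = self.step
        self.step += 1

    def sample(self, batch_size: int):
        ages = self.step - np.asarray(self.insert_step)  # smaller = newer
        priorities = self.decay ** ages                  # newer => higher weight
        probs = priorities / priorities.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        return [self.storage[i] for i in idxs]
```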
Framework of CP3ER
We evaluate CP3ER on hard tasks from the DeepMind Control Suite. The results below demonstrate its robust performance.
Dog Stand
Dog Trot
Humanoid Walk
Humanoid Run
Manipulator Bring Ball
Acrobot Swingup Sparse
We also evaluate CP3ER on tasks from the Meta-world environment, where it achieves notable performance.
assembly
disassemble
hammer
hand-insert
pick-place-wall
stick-pull
# arXiv version
@misc{li2024generalizingconsistencypolicyvisual,
title={Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization},
author={Haoran Li and Zhennan Jiang and Yuhui Chen and Dongbin Zhao},
year={2024},
eprint={2410.00051},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.00051},
}
# NeurIPS version
@inproceedings{li2024generalizing,
title={Generalizing Consistency Policy to Visual {RL} with Prioritized Proximal Experience Regularization},
author={Haoran Li and Zhennan Jiang and Yuhui Chen and Dongbin Zhao},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=MOFwt8OeXr}
}