Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization
Xiangxiang Chu
Code
- github.com/paperwithcode/pop3d (Official, In paper, tf, ★ 0)
- github.com/cxxgtxy/POP3D (Official, In paper, tf, ★ 0)
Abstract
As the most successful variant and improvement of Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO) has been widely applied across various domains thanks to several advantages: efficient data utilization, easy implementation, and good parallelism. In this paper, we propose another powerful variant: a first-order gradient reinforcement learning algorithm called Policy Optimization with Penalized Point Probability Distance (POP3D), in which the point probability distance is a lower bound on the square of the total variation divergence. Firstly, we discuss the shortcomings of several commonly used algorithms, which partly motivate our method. Secondly, we show how POP3D addresses these shortcomings. Thirdly, we examine its mechanism from the perspective of the solution manifold. Finally, we make quantitative comparisons among several state-of-the-art algorithms on common benchmarks. Simulation results show that POP3D is highly competitive with PPO. Our code is released at https://github.com/paperwithcode/pop3d.
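The lower-bound claim in the abstract can be checked numerically for discrete action spaces. The sketch below is illustrative only and is not taken from the released code: it assumes the point probability distance is the squared gap between the probabilities that the old and new policies assign to a single sampled action, and the names `point_probability_distance`, `penalized_surrogate`, and the coefficient `beta` are hypothetical. Under that assumption the bound follows because the probability change on one action can never exceed the total variation distance.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()


def point_probability_distance(p, q, action):
    """Assumed form: squared probability gap on one sampled action."""
    return (p[action] - q[action]) ** 2


def penalized_surrogate(p_old, p_new, action, advantage, beta):
    """Hypothetical per-sample objective: importance-weighted advantage
    minus a penalty on the point probability distance."""
    ratio = p_new[action] / p_old[action]
    penalty = point_probability_distance(p_old, p_new, action)
    return ratio * advantage - beta * penalty


# Two random policies over four actions at a single state.
rng = np.random.default_rng(0)
p_old = rng.dirichlet(np.ones(4))
p_new = rng.dirichlet(np.ones(4))

tv_squared = total_variation(p_old, p_new) ** 2
gaps = [point_probability_distance(p_old, p_new, a) for a in range(4)]

# Every per-action gap stays at or below the squared total variation.
bound_holds = all(g <= tv_squared + 1e-12 for g in gaps)
```

With only two actions the bound is tight: the squared gap on either action equals the squared total variation distance.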
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Atari 2600 Alien | POP3D | Score | 1,510.8 | — | Unverified |
| Atari 2600 Amidar | POP3D | Score | 729.15 | — | Unverified |
| Atari 2600 Assault | POP3D | Score | 5,400.13 | — | Unverified |
| Atari 2600 Asterix | POP3D | Score | 4,310.67 | — | Unverified |
| Atari 2600 Asteroids | POP3D | Score | 2,488.1 | — | Unverified |
| Atari 2600 Atlantis | POP3D | Score | 2,193,605.67 | — | Unverified |
| Atari 2600 Bank Heist | POP3D | Score | 1,212.23 | — | Unverified |
| Atari 2600 Battle Zone | POP3D | Score | 15,466.67 | — | Unverified |
| Atari 2600 Beam Rider | POP3D | Score | 4,549 | — | Unverified |
| Atari 2600 Bowling | POP3D | Score | 38.99 | — | Unverified |
| Atari 2600 Boxing | POP3D | Score | 97.23 | — | Unverified |
| Atari 2600 Breakout | POP3D | Score | 458.41 | — | Unverified |
| Atari 2600 Centipede | POP3D | Score | 3,315.44 | — | Unverified |
| Atari 2600 Chopper Command | POP3D | Score | 6,308.33 | — | Unverified |
| Atari 2600 Crazy Climber | POP3D | Score | 120,247.33 | — | Unverified |
| Atari 2600 Demon Attack | POP3D | Score | 61,147.33 | — | Unverified |
| Atari 2600 Double Dunk | POP3D | Score | -7.89 | — | Unverified |
| Atari 2600 Enduro | POP3D | Score | 459.85 | — | Unverified |
| Atari 2600 Fishing Derby | POP3D | Score | 28.99 | — | Unverified |
| Atari 2600 Freeway | POP3D | Score | 21.21 | — | Unverified |
| Atari 2600 Frostbite | POP3D | Score | 316.87 | — | Unverified |
| Atari 2600 Gopher | POP3D | Score | 6,207 | — | Unverified |
| Atari 2600 Gravitar | POP3D | Score | 557.17 | — | Unverified |
| Atari 2600 Ice Hockey | POP3D | Score | -4.12 | — | Unverified |
| Atari 2600 James Bond | POP3D | Score | 358.54 | — | Unverified |
| Atari 2600 Kangaroo | POP3D | Score | 3,891.67 | — | Unverified |
| Atari 2600 Krull | POP3D | Score | 7,715.68 | — | Unverified |
| Atari 2600 Kung-Fu Master | POP3D | Score | 33,728 | — | Unverified |
| Atari 2600 Montezuma's Revenge | POP3D | Score | 0 | — | Unverified |
| Atari 2600 Ms. Pacman | POP3D | Score | 1,683.87 | — | Unverified |
| Atari 2600 Name This Game | POP3D | Score | 6,065.63 | — | Unverified |
| Atari 2600 Pitfall! | POP3D | Score | 0 | — | Unverified |
| Atari 2600 Pong | POP3D | Score | 20.5 | — | Unverified |
| Atari 2600 Private Eye | POP3D | Score | 79.67 | — | Unverified |
| Atari 2600 Q*Bert | POP3D | Score | 15,396.67 | — | Unverified |
| Atari 2600 River Raid | POP3D | Score | 8,052.23 | — | Unverified |
| Atari 2600 Road Runner | POP3D | Score | 44,679.67 | — | Unverified |
| Atari 2600 Robotank | POP3D | Score | 4.6 | — | Unverified |
| Atari 2600 Seaquest | POP3D | Score | 1,807.47 | — | Unverified |
| Atari 2600 Space Invaders | POP3D | Score | 1,216.15 | — | Unverified |
| Atari 2600 Star Gunner | POP3D | Score | 48,984 | — | Unverified |
| Atari 2600 Tennis | POP3D | Score | -8.32 | — | Unverified |
| Atari 2600 Time Pilot | POP3D | Score | 3,770.33 | — | Unverified |
| Atari 2600 Tutankham | POP3D | Score | 241.21 | — | Unverified |
| Atari 2600 Up and Down | POP3D | Score | 242,701.51 | — | Unverified |
| Atari 2600 Venture | POP3D | Score | 36.33 | — | Unverified |
| Atari 2600 Video Pinball | POP3D | Score | 37,780.7 | — | Unverified |
| Atari 2600 Wizard of Wor | POP3D | Score | 4,704 | — | Unverified |
| Atari 2600 Zaxxon | POP3D | Score | 9,472 | — | Unverified |