Preferential proximal policy optimization in reinforcement learning
(2023-12-01)
Proximal Policy Optimization (PPO), a policy gradient method, excels in reinforcement learning with its "surrogate" objective function and stochastic gradient ascent. However, PPO does not fully consider the significance ...