Balasuntharam, Tamilselvan (2023-12-01)
The Proximal Policy Optimization (PPO), a policy gradient method, excels in reinforcement learning with its ”surrogate” objective function and stochastic gradient ascent. However, PPO does not fully consider the significance ...