Show simple item record

dc.contributor.advisor: Davoudi, Kourosh
dc.contributor.advisor: Ebrahimi, Mehran
dc.contributor.author: Balasuntharam, Tamilselvan
dc.date.accessioned: 2023-12-18T19:28:29Z
dc.date.available: 2023-12-18T19:28:29Z
dc.date.issued: 2023-12-01
dc.identifier.uri: https://hdl.handle.net/10155/1706
dc.description.abstract: Proximal Policy Optimization (PPO), a policy gradient method, excels in reinforcement learning through its "surrogate" objective function and stochastic gradient ascent. However, PPO does not fully account for the significance of frequently encountered states in its policy and value updates. To address this, this thesis introduces Preferential Proximal Policy Optimization (P3O), which integrates the importance of these states into parameter updates. We determine state importance by multiplying the variance of the action probabilities by the value function, then normalizing and smoothing the result with an exponentially weighted moving average (EWMA). This importance is incorporated into the surrogate objective function, redefining value and advantage estimation in PPO. Our method selects state importance automatically and can be applied to any on-policy reinforcement learning algorithm that uses a value function. Empirical evaluations on six Atari environments demonstrate that our approach outperforms the vanilla PPO baseline, highlighting the value of the proposed method for learning in complex environments.
dc.description.sponsorship: University of Ontario Institute of Technology
dc.language.iso: en
dc.subject: Reinforcement learning
dc.subject: Policy gradient methods
dc.subject: Deep learning
dc.subject: Policy optimization
dc.title: Preferential proximal policy optimization in reinforcement learning
dc.type: Thesis
dc.degree.level: Master of Science (MSc)
dc.degree.discipline: Computer Science
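
The abstract describes a per-state importance signal computed as the variance of the action probabilities times the value estimate, normalized and smoothed with an EWMA. The following is a minimal sketch of that signal only; the function names, the min-max normalization, and the smoothing coefficient are illustrative assumptions, not the thesis's exact implementation.

# Hypothetical sketch of the state-importance signal described in the abstract.
# compute_state_importance, ewma, and alpha=0.1 are assumed names/values.
import numpy as np

def ewma(values, alpha=0.1):
    # Exponentially weighted moving average over a 1-D sequence.
    smoothed = np.empty_like(values, dtype=np.float64)
    running = values[0]
    for i, v in enumerate(values):
        running = alpha * v + (1.0 - alpha) * running
        smoothed[i] = running
    return smoothed

def compute_state_importance(action_probs, values, alpha=0.1):
    # Raw importance: variance of the action distribution times the value
    # estimate, per state; then min-max normalize and smooth with an EWMA.
    raw = action_probs.var(axis=1) * values
    norm = (raw - raw.min()) / (raw.max() - raw.min() + 1e-8)
    return ewma(norm, alpha=alpha)

# Example: T timesteps, A discrete actions (e.g. an Atari action space).
T, A = 128, 6
probs = np.random.dirichlet(np.ones(A), size=T)   # policy action probabilities
vals = np.random.randn(T)                         # value-function estimates
importance = compute_state_importance(probs, vals)

In P3O this weight would then enter the clipped surrogate objective and the value/advantage estimation; that wiring is not shown here.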

