Reinforcement Learning (RL) approaches can be broadly cast into three categories: **value** based, **model** based, and **policy** based. Value based methods model the optimal value function and then extract the optimal policy from it. Model based methods learn a model of the environment (its transition and reward dynamics) and then extract the optimal policy using planning techniques. In contrast, policy based methods learn the optimal policy by directly optimizing the objective function of interest, i.e. the expected discounted return. Policy optimization falls under this third category and includes popular off-the-shelf RL methods such as TRPO, PPO, and SAC.
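To make "discounted return" concrete, here is a minimal sketch of the quantity that policy based methods maximize in expectation. The function name `discounted_return` and the default `gamma` are illustrative assumptions, not taken from the post.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for a single trajectory.

    Hypothetical helper for illustration: a policy based method would
    average this quantity over trajectories sampled from the policy.
    """
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(example)
```

With three rewards of 1 and `gamma=0.5`, the sum is 1 + 0.5 + 0.25.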
This post is written to build a better understanding of recent work on eigenoption discovery using the successor representation. I list the major ideas building up to eigenoption discovery and show results obtained on simple gridworld tasks. I start by introducing proto-value functions, then move on to eigenoption discovery and how the successor representation (SR) comes into play.
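As a rough illustration of the objects involved, the sketch below computes the successor representation in closed form, SR = (I - gamma P)^(-1), for a uniform random walk on a small open gridworld, and then takes its eigenvectors. The 3x3 grid, the uniform random policy, the "bump into a wall and stay put" rule, and all function names are assumptions made for this example; they are not the post's experimental setup.

```python
import numpy as np


def random_walk_transitions(n_rows, n_cols):
    """Transition matrix of a uniform random policy on an open grid.

    Assumption: moving into a wall leaves the agent in place.
    """
    n = n_rows * n_cols
    P = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            s = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols:
                    P[s, nr * n_cols + nc] += 0.25
                else:
                    P[s, s] += 0.25
    return P


def successor_representation(P, gamma=0.9):
    """Closed-form SR: expected discounted future state occupancies."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


P = random_walk_transitions(3, 3)
sr = successor_representation(P, gamma=0.9)

# P is symmetric for this grid, so the SR is symmetric as well;
# symmetrizing only guards against floating point asymmetry.
eigvals, eigvecs = np.linalg.eigh((sr + sr.T) / 2)

# eigh sorts eigenvalues in ascending order, so the last columns of
# eigvecs are the smoothest directions over the state space.
print(eigvals[-3:])
```

Each row of the SR sums to 1 / (1 - gamma), since the discounted occupancies over all states add up to the total discounted time. The leading eigenvector is constant across states; the next few vary smoothly over the grid, which is the kind of structure proto-value functions and eigenoptions build on.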