Blog posts

2020

Mirror Descent Policy Optimization

3 minute read

Published:

Reinforcement Learning (RL) approaches can be broadly cast into three categories: **value**-based, **model**-based, and **policy**-based. Value-based methods tend to model the optimal value function and then extract the optimal policy from it. Model-based methods try to learn the model (transition and reward dynamics) and then extract the optimal policy using planning techniques. Policy-based methods, instead, try to learn the optimal policy by directly optimizing the objective function of interest, i.e. the expected discounted return. Policy optimization falls under this third category and includes popular off-the-shelf RL methods such as TRPO, PPO, and SAC.

2018

Successor Representation and Eigen Options

11 minute read

Published:

I wrote this post to build a better understanding of recent work on eigen option discovery using the successor representation. I try to list most of the major ideas building up to eigen option discovery and show results obtained on simple gridworld tasks. I start by introducing proto-value functions, then move on to eigen option discovery and how the successor representation (SR) comes into play.