Reinforcement Learning (RL) approaches can be broadly cast into three categories: **value**-based, **model**-based, and **policy**-based. Value-based methods model the optimal value function and then extract the optimal policy from it. Model-based methods try to learn the model (transition and reward dynamics) and then extract the optimal policy using planning techniques. Policy-based methods instead try to learn the optimal policy by directly optimizing the objective function of interest, i.e. the expected discounted return. Policy optimization falls under this third category and includes popular off-the-shelf RL methods such as TRPO, PPO, and SAC.
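
To make "expected discounted return" concrete, here is a minimal sketch of how that objective could be estimated from sampled trajectories. The function names `discounted_return` and `estimate_objective`, the default discount factor `gamma=0.99`, and the representation of a trajectory as a plain list of rewards are all assumptions made for illustration; they do not come from any particular library.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted return of one trajectory: sum over t of gamma**t * r_t."""
    g = 0.0
    # Accumulate from the last reward backwards, so each step is g = r_t + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_objective(
    trajectories: Sequence[Sequence[float]], gamma: float = 0.99
) -> float:
    """Monte Carlo estimate of the expected discounted return.

    Assumes every trajectory was sampled by running the same policy.
    """
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Two hypothetical reward sequences collected under one policy.
    sampled = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
    print(estimate_objective(sampled, gamma=0.5))
```

A policy-based method adjusts the policy parameters to increase this quantity, rather than first fitting a value function or a model of the environment.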