Reinforcement Learning/Value Iteration


Policy iteration vs Value iteration

  • Policy iteration computes the optimal value function and policy
  • Value iteration:
    • Maintain the optimal value of starting in a state s when a finite number of steps k remain in the episode
    • Iterate to consider longer and longer episodes (made precise by the recursion below)

Policy iteration and value iteration will converge to the same optimal policy.
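Concretely, the finite-horizon view can be written as a recursion (using the reward R, transition model P, and discount factor γ defined in the Algorithm section below): starting from V_0(s) = 0 for all states,

$$V_{k+1}(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V_k(s') \right]$$

gives the optimal value of a state when k + 1 steps remain, and V_k approaches the optimal infinite-horizon value as k grows.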


Algorithm

The value function of a policy π is the solution to the Bellman equation

$$V^{\pi}(s) = R^{\pi}(s) + \gamma \sum_{s' \in S} P^{\pi}(s' \mid s)\, V^{\pi}(s')$$

The Bellman backup operator B is an operator that is applied to a value function and returns a new value function, improving the value where possible:

$$(BV)(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s') \right]$$

BV yields a value function over all states s.
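As an illustration, here is a minimal sketch of tabular value iteration in Python. It assumes the MDP is given as NumPy arrays R[s, a] for rewards and P[s, a, s'] for transition probabilities; these array names, shapes, and the example MDP at the bottom are assumptions for this sketch, not part of the page above.

```python
import numpy as np

def bellman_backup(V, R, P, gamma):
    """One application of the Bellman backup operator B.

    R: (S, A) array of rewards R(s, a)
    P: (S, A, S) array of transition probabilities P(s' | s, a)
    Returns (BV)(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s,a) V(s') ].
    """
    Q = R + gamma * (P @ V)  # Q(s, a); P @ V sums over next states s'
    return Q.max(axis=1)     # maximize over actions

def value_iteration(R, P, gamma=0.9, tol=1e-8):
    """Iterate Bellman backups until the value function stops changing."""
    V = np.zeros(R.shape[0])
    while True:
        V_new = bellman_backup(V, R, P, gamma)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical 2-state, 2-action MDP, purely for illustration
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [1.0, 0.0]]])
V_star = value_iteration(R, P)
```

Each loop iteration applies one Bellman backup, so after k iterations V holds the optimal k-step value from the recursion above; the loop stops once a backup changes the values by less than tol.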