Preferences on Reward Sequences

For an MDP we focus on preferences over sequences of rewards, which capture how the agent values outcomes over time.

We call such preferences stationary iff, whenever two sequences begin with the same reward, they are preference-ordered the same way as their tails:

$$[r, r_1, r_2, \ldots] \succ [r, r_1', r_2', \ldots] \iff [r_1, r_2, \ldots] \succ [r_1', r_2', \ldots].$$

In other words, if we remove the shared first reward from both sequences, the preference is unchanged.
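For instance, with illustrative numeric sequences: under a stationary preference,

$$[3, 1, 0, \ldots] \succ [3, 0, 1, \ldots] \quad\text{holds exactly when}\quad [1, 0, \ldots] \succ [0, 1, \ldots].$$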

For stationary preferences there are only two ways to combine rewards over time into a utility function for a sequence of states:

Additive: $U([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$

Discounted with discount factor $\gamma$, where $0 < \gamma < 1$: $U([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$

The additive case is simply the discounted case with $\gamma = 1$.
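A minimal sketch of these two definitions, assuming a finite reward sequence; the function names and the sample rewards are illustrative, not part of the original notes:

```python
def additive_utility(rewards):
    """Additive utility: U = R(s_0) + R(s_1) + ... (the gamma = 1 case)."""
    return sum(rewards)


def discounted_utility(rewards, gamma=0.9):
    """Discounted utility: U = sum over t of gamma**t * R(s_t)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


rewards = [1.0, 0.0, 2.0, 1.0]
print(additive_utility(rewards))         # 4.0
print(discounted_utility(rewards, 0.9))  # 1.0 + 0.81*2.0 + 0.729*1.0 = 3.349

# Stationarity check: two sequences sharing the same first reward are
# ordered the same way as their tails.
a = [3.0, 1.0, 0.0]
b = [3.0, 0.0, 1.0]
assert (discounted_utility(a) > discounted_utility(b)) == \
       (discounted_utility(a[1:]) > discounted_utility(b[1:]))
```

The assertion passes because dropping the shared first reward only subtracts a common constant and rescales the remaining terms by $\gamma$, which cannot flip the ordering.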