Preferences on Reward Sequences

For an MDP we focus on preferences over sequences of rewards, which capture how the agent values outcomes over time.

We call such preferences stationary iff, whenever two sequences begin with the same reward, they are preference-ordered the same way as their tails:

$$[r, r_1, r_2, \ldots] \succ [r, r_1', r_2', \ldots] \iff [r_1, r_2, \ldots] \succ [r_1', r_2', \ldots].$$

In other words, if we remove the shared first reward from both sequences, the preference is unchanged.
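For instance, with illustrative numeric sequences: under a stationary preference,

$$[3, 1, 0, \ldots] \succ [3, 0, 1, \ldots] \quad\text{holds exactly when}\quad [1, 0, \ldots] \succ [0, 1, \ldots].$$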

For stationary preferences there are only two ways to combine rewards over time into a utility function for a sequence of states:

Additive: $U([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$

Discounted with discount factor $\gamma$, where $0 < \gamma < 1$: $U([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$

The additive case is simply the discounted case with $\gamma = 1$.
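A minimal sketch of these two definitions, assuming a finite reward sequence; the function names and the sample rewards are illustrative, not part of the original notes:

```python
def additive_utility(rewards):
    """Additive utility: U = R(s_0) + R(s_1) + ... (the gamma = 1 case)."""
    return sum(rewards)


def discounted_utility(rewards, gamma=0.9):
    """Discounted utility: U = sum over t of gamma**t * R(s_t)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


rewards = [1.0, 0.0, 2.0, 1.0]
print(additive_utility(rewards))         # 4.0
print(discounted_utility(rewards, 0.9))  # 1.0 + 0.81*2.0 + 0.729*1.0 = 3.349

# Stationarity check: two sequences sharing the same first reward are
# ordered the same way as their tails.
a = [3.0, 1.0, 0.0]
b = [3.0, 0.0, 1.0]
assert (discounted_utility(a) > discounted_utility(b)) == \
       (discounted_utility(a[1:]) > discounted_utility(b[1:]))
```

The assertion passes because dropping the shared first reward only subtracts a common constant and rescales the remaining terms by $\gamma$, which cannot flip the ordering.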