Partially Observable MDP
A Markov Decision Process together with a Sensor Model that has the Sensor Markov Property and is stationary, which means the sensor model can be written as $P(e \mid s)$ independently of the time step. This models a partially observable and stochastic environment. As the Agent does not know which state it is in, it makes no sense to talk about policies that map concrete states to Actions. Thus we introduce the Optimal Policy as a function that maps belief states to actions instead.
Now the idea is to convert the POMDP into an MDP in Belief State space, which means we want a Transition Model on the belief states and a reward function on the belief states.
Filtering at the Belief State Level
Overview Formula
The Belief State is updated by filtering, just as in temporal inference:
$$b^{\prime}(s^{\prime}) = \operatorname{FORWARD}(b, a, e)(s^{\prime}) = \alpha \cdot P\left(e \mid s^{\prime}\right) \cdot \sum_s P\left(s^{\prime} \mid s, a\right) \cdot b(s)$$
where $\alpha$ is a normalizing constant.
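A minimal sketch of this filtering update in Python, assuming the Transition Model and Sensor Model are given as nested dictionaries; the function name and data layout are illustrative, not part of the material.

```python
def forward(b, a, e, T, O):
    """Belief-state filtering: b'(s') = alpha * P(e | s') * sum_s P(s' | s, a) * b(s).

    b: current Belief State, dict mapping state -> probability
    a: the Action that was executed
    e: the Percept that was received
    T: Transition Model, T[s][a][s2] = P(s2 | s, a)
    O: Sensor Model, O[s2][e] = P(e | s2)
    """
    b_new = {}
    for s2 in O:                                    # candidate successor states s'
        prediction = sum(T[s][a].get(s2, 0.0) * b[s] for s in b)
        b_new[s2] = O[s2].get(e, 0.0) * prediction  # weight prediction by sensor likelihood
    alpha = sum(b_new.values())                     # normalizing constant
    if alpha == 0.0:
        raise ValueError("percept e has probability 0 under belief b and action a")
    return {s2: p / alpha for s2, p in b_new.items()}
```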
POMDP Decision Cycle
- Given the current Belief State $b$, execute the Action $a = \pi^*(b)$
- Receive the Percept $e$
- Update the Belief State to $b^{\prime} = \operatorname{FORWARD}(b, a, e)$ and repeat (see the sketch below)
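As a hedged illustration, the decision cycle can be written as a small loop; `policy`, `do_action`, and `forward` are assumed callables (e.g. the filtering sketch above), not a fixed interface.

```python
def pomdp_decision_cycle(b, policy, do_action, forward, n_steps=100):
    """Run the POMDP decision cycle.

    b:         current Belief State (dict state -> probability)
    policy:    Optimal Policy on belief states, b -> Action
    do_action: executes an Action in the environment and returns the resulting Percept
    forward:   belief-state filter, (b, a, e) -> b'
    """
    for _ in range(n_steps):
        a = policy(b)           # given the current Belief State b, execute a = pi*(b)
        e = do_action(a)        # receive the Percept e
        b = forward(b, a, e)    # update the Belief State to FORWARD(b, a, e)
    return b
```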
Reducing POMDPs to Belief State MDPs
Transition Model at the Belief State level
$$\begin{aligned} P\left(b^{\prime} \mid b, a\right) &= P\left(b^{\prime} \mid a, b\right)=\sum_e P\left(b^{\prime} \mid e, a, b\right) \cdot P(e \mid a, b) \\ &= \sum_e P\left(b^{\prime} \mid e, a, b\right) \cdot\left(\sum_{s^{\prime}} P\left(e \mid s^{\prime}\right) \cdot\left(\sum_s P\left(s^{\prime} \mid s, a\right) \cdot b(s)\right)\right) \end{aligned}$$
where $P\left(b^{\prime} \mid e, a, b\right)$ is $1$ if $b^{\prime}=\operatorname{FORWARD}(b, a, e)$ and $0$ otherwise.
Reward function at the [[Belief State]] level
$$\rho(b):=\sum_s b(s) \cdot R(s)$$
This is the expected reward over the states the [[Rational Agent|Agent]] might be in.
Now we have reduced the POMDP to a [[Markov Decision Process]] with the [[Übergangsmatrix|Transition Model]] $P\left(b^{\prime} \mid b, a\right)$ and the reward function $\rho(b)$. This MDP is fully observable, as the [[Belief State]] is **always** observable to the Agent. An [[Optimal Policy]] on this belief-state MDP is also an optimal policy on the original POMDP.
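A minimal sketch, assuming the same dictionary-based models as above, of how $\rho(b)$ and the belief-state Transition Model $P\left(b^{\prime} \mid b, a\right)$ could be computed by enumerating percepts; the rounding used to group equal successor beliefs is an illustrative choice.

```python
def rho(b, R):
    """Belief-state reward: rho(b) = sum_s b(s) * R(s)."""
    return sum(p * R[s] for s, p in b.items())

def belief_transition(b, a, T, O, forward):
    """P(b' | b, a) = sum_e P(b' | e, a, b) * P(e | a, b), where
    P(e | a, b) = sum_{s'} P(e | s') * sum_s P(s' | s, a) * b(s)
    and P(b' | e, a, b) is 1 iff b' = FORWARD(b, a, e).

    Returns a dict mapping each reachable b' (as a hashable tuple) to P(b' | b, a).
    """
    # predicted distribution over successor states s'
    pred = {}
    for s, p_s in b.items():
        for s2, p in T[s][a].items():
            pred[s2] = pred.get(s2, 0.0) + p * p_s

    result = {}
    percepts = {e for s2 in O for e in O[s2]}
    for e in percepts:
        p_e = sum(O[s2].get(e, 0.0) * p for s2, p in pred.items())   # P(e | a, b)
        if p_e == 0.0:
            continue
        b2 = forward(b, a, e)                                        # b' = FORWARD(b, a, e)
        key = tuple(sorted((s, round(p, 12)) for s, p in b2.items()))
        result[key] = result.get(key, 0.0) + p_e                     # percepts with equal b' sum up
    return result
```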