I have posted all the questions in one PDF file. For example:
The robot is attempting to find its way in this environment to the “home square” located in the
upper left-hand corner. There are two shaded squares that may or may not contain barriers that the
robot is not allowed to pass through. Initially, the robot does not know whether or not the barriers
are actually present. So part of its control strategy is to learn about the presence or absence of these
barriers. The other part, of course, is to move toward Home.
At the beginning, the robot knows that there is a probability of 0.4 that a barrier exists in the square
located at x1 = 1, x2 = 3 and that there is a probability of 0.5 that a barrier exists at x1 = 2, x2 = 2.
The robot can always see one move ahead; that is, if the robot is within one move of a barrier
location, it can always determine with certainty whether or not a barrier is there. For a price of 0.3
moves, the robot can make an observation of all the squares that are 2 moves away, where a move
is defined to be either one horizontal or one vertical square away from the robot’s current location.
In other words, the robot can move or observe diagonally only in 2 moves. The robot’s objective is
to get to the Home square while minimizing the expected value of the sum of actual moves and
penalties for observation.
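
To make the setup concrete, here is a minimal Python sketch of the pieces stated above: the prior barrier probabilities, the distance measured in moves, and which shaded squares the robot can see for free or with a paid observation. The names (`PRIOR`, `manhattan`, `seen_for_free`, `seen_by_observation`, `barrier_configurations`) are my own and not from the problem. Two assumptions are built in: distance is counted as plain horizontal plus vertical steps without routing around barriers, and the two barriers are treated as independent, which the problem does not state explicitly. The grid size and the Home coordinates are not given in the text, so they are left out.

```python
from itertools import product

# Prior barrier probabilities from the problem statement, keyed by (x1, x2).
PRIOR = {(1, 3): 0.4, (2, 2): 0.5}
OBSERVATION_COST = 0.3  # price of one observation, in units of moves


def manhattan(a, b):
    """Moves between squares a and b (horizontal + vertical steps, barriers ignored)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def seen_for_free(robot):
    """Shaded squares within one move of the robot (always known with certainty)."""
    return [sq for sq in PRIOR if manhattan(robot, sq) <= 1]


def seen_by_observation(robot):
    """Shaded squares exactly two moves away (revealed by a paid observation)."""
    return [sq for sq in PRIOR if manhattan(robot, sq) == 2]


def barrier_configurations():
    """Probability of each present/absent combination of the two barriers.

    Assumption: the barriers are independent (not stated in the problem).
    Keys are tuples of booleans in sorted square order: ((1, 3), (2, 2)).
    """
    squares = sorted(PRIOR)
    configs = {}
    for present in product([True, False], repeat=len(squares)):
        p = 1.0
        for sq, is_there in zip(squares, present):
            p *= PRIOR[sq] if is_there else 1.0 - PRIOR[sq]
        configs[present] = p
    return configs


if __name__ == "__main__":
    start = (1, 1)
    print("free:", seen_for_free(start))
    print("by observation:", seen_by_observation(start))
    print("configurations:", barrier_configurations())
```

Under these assumptions both shaded squares are exactly two moves from the start square (1, 1), so a single observation there (cost 0.3) would resolve both barriers, while moving first to (1, 2) or (2, 1) reveals an adjacent shaded square for free. The optimal policy then comes from comparing the expected total cost of those options over the four barrier configurations.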
Work out the optimal control policy for the robot, assuming that at the beginning the robot finds
itself at the co-ordinates x1 = 1, x2 = 1.

