4.7. Chapter Summary#

Splash image with cute robot with a stalked eye

In the previous chapters, we have mainly considered discrete probability distributions. In Chapter 2, we modeled the world state using five discrete categories of trash, and in Chapter 3 we modeled the world as five discrete rooms. In this chapter, we began a serious study of continuous random variables, first to represent the robot’s state, and then to represent sensor readings. Representing and reasoning about continuous probability distributions is more involved than working in discrete domains, and we introduced a set of representational and inference tools that were able to scale to these more difficult problems.

4.7.1. Models#

In this chapter, we represented the state of the robot using continuous coordinates, \(x \in \mathbb{R}^2\), and we modeled uncertainty in state using a Gaussian distribution. Gaussian distributions have several nice properties. First, they are completely characterized by two parameters, a mean vector, \(\mu \in \mathbb{R}^n\), and a covariance matrix, \(\Sigma \in \mathbb{R}^{n \times n}\). In the one-dimensional case, these are scalars, denoted by \(\mu\) and \(\sigma\). Perhaps more importantly, Gaussian distributions enjoy the privilege of being very good approximations for many stochastic aspects of real-world systems. Roboticists, and engineers in general, often resort to the assumption that noise, disturbances, or other stochastic aspects of real-world systems can be accurately approximated using Gaussian distributions.

To model uncertainty in the motion model, we introduced the conditional Gaussian pdf. In particular, we assumed that noise in the motion model could be modeled as additive Gaussian noise, so that the state at time \(k+1\) is defined as

\[x_{k+1} = x_k + u_k + w_k\]

in which \(x_k\) is the state at time \(k\), \(w_k\) is the random disturbance, and \(u_k\) is the commanded motion at time \(k\). Under our Gaussian assumption that \(w_k \sim N(\mu,\Sigma)\), the probability distribution for the state at time \(k+1\) is given by the conditional Gaussian pdf

\[ p(x_{k+1}|x_{k}, u_k) = \mathcal{N}(x_{k_1}; x_{k} + u_k, \Sigma) \]

Thus, by assuming Gaussian noise in the motion model, we arrive to a kind of “Gaussian in/Gaussian out” formulation, which can greatly simplify certain inference problems (e.g., by using the Kalman filter).

We can also use continuous conditional pdf’s to model sensors. For example, if the ideal (i.e., noise-free) sensor reading at time \(k\) is defined by a function \(h(x_k)\), we can model the sensor output by the random variable

\[z_k = h(x_k) + w_k\]

in which \(w_k\) is the noise term (unrelated to the noise in our motion model). If \(w_k\) is a Gaussian random variable, the conditional distribution for sensor measurement given the value of \(h(x_k)\) is given by

\[\begin{split} \begin{aligned} p(z_k|x_k) &= \mathcal{N}(z_k;\mu=h(x_k), \sigma^2) \\ &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\{-\frac{1}{2\sigma^2}(z_k-h(x_k))^2\} \end{aligned} \end{split}\]

This approach generalizes nicely to the case of multi-dimensional sensors, as we saw for the case of GPS-like sensors with Gaussian noise.

There are, of course, many problems for which the uncertainty cannot be adequately modeled using Gaussian distributions. For example, a Gaussian distribution, which has a single mode, cannot adequately model a multi-model distribution. A classic example of this situation is a robot in a long hallway that senses an office door; the robot has a strong belief that it is in front of a door, but no way to know which door. This situation corresponds to a probability distribution with modes at locations that are in front of office doors. In this chapter, we saw two ways to represent complex probability distributions: grids and samples. In the case of grids, we merely decompose the state space into a grid, and assign to each grid cell a value that corresponds to the probability that the state lies in that cell. In the case of samples, the situation is less structured. Instead of a uniform grid, sampling-based approaches represent the probability distribution by a collection of weighted samples (also called particles). The value of the sample specifies a state, and the weight approximates the probability mass associated to a local neighborhood of the sample. While grid-based representations grow exponentially with the dimension of the state space, sampling-based approaches are much more efficient, but require the availability of methods that can generate good sets of samples.

Finally, in addition to introducing these methods for dealing with uncertainty, we also developed a simple geometric model for wheeled robot locomotion, specifically for the case of robots with omni-wheels. In particular, we developed the differential relationships between the rotation of the robot’s wheels, and the instantaneous velocity of the robot. There is, of course, uncertainty associated to this motion model; however, rather than explicitly consider this uncertainty, we merely bundled up all of the uncertainties associated with robot motion into the noise parameter \(w_k\). This simplification leads to efficient computation, but it also has a fairly firm theoretical basis in the Central Limit Theorem, a well-known theorem from probability that essentially ensures that the aggregate of many sources of uncertainty can be well-characterized using a Gaussian distribution (there are, of course, many caveats and conditions, which we will not consider here).

4.7.2. Reasoning#

In Chapters 2 and 3, we introduced Bayes Theorem and controlled Markov chains. In this chapter, we combined these two ideas to develop the Bayesian filter, and showed how it can be used to estimate the state of a robot that collects sensor data while executing motion commands under uncertain conditions. At each time step, the Bayesian filter can be understood as a two-stage process: in the first phase, the motion model is used to calculate a predictive distribution, and in the second phase sensor data is used to upgrade the predictive distribution to the filtering distribution.

\[\begin{split} \begin{align*} \mathbf{Prediction ~ Phase:} &~~~ P(X_{k}|\mathcal{Z}^{k-1},\mathcal{U}^{k})=\sum_{x_{k-1}}P(X_{k}|x_{k-1},u_{k-1})P(x_{k-1}|\mathcal{Z}^{k-1},\mathcal{U}^{k-1}) \\ \mathbf{Measurement ~ Phase:} &~~~ P(X_{k}| \mathcal{Z}^{k},\mathcal{U}^{k}) \propto L(X_{k};z_{k})P(X_{k}|\mathcal{Z}^{k-1},\mathcal{U}^{k}) \end{align*} \end{split}\]

These equations are obtained by applying the Markov property to effectively decouple past action and sensing histories via the marginalizing summation in the prediction phase.

While elegant and compact, in the general case (i.e., for arbitrary probability distributions), direct, exact implementation of the Bayes filter is untenable. In this chapter, we introduced three ways to deal with this fact: Markov localization, Monte Carlo localization using particle filtering, and Kalman filtering. The former two are approximation schemes that are applicable for arbitrary probability distributions, while the latter is an exact method that is applicable only for linear systems under Gaussian uncertainty.

Markov localization approximates the robot’s state space using a grid representation, and assigning to each grid cell an approximation of the probability that the robot’s state lies in that cell. At each time step, the filter updates the probabilities assigned to every cell in the grid. In practice, it is possible to ignore cells whose probability values are sufficiently small, and this can significantly improve the run-time performance of the approach.

Monte Carlo localization uses a set of weighted particles to approximate the probability distribution, hence the name particle filtering. Instead of evaluating the prediction equation in the Bayes filter, particle filtering generates a set of samples from the predictive distribution

\[ \{ x_k^{(t)} \}_{t = 1 \dots S} \sim \sum_s w_{k-1}^{(s)} P(x_{k}|x_{k-1}^{(s)},u_{k-1})P(x_{k-1}^{(s)}|\mathcal{Z}^{k-1},\mathcal{U}^{k-1}). \]

and then weights these samples according to the measurement likelihood, yielding an approximation to the filtering distribution:

\[ p(X_{k}|\mathcal{Z}^{k},\mathcal{U}^{k}) \approx \{(x_k^{(t)}, L(x_k^{(t)};z_{k}))\}_{t = 1 \dots S}. \]

In contrast to Markov and Monte Carlo localization, The Kalman filter is an exact method that can be applied in the special case of linear motion and measurement models under Gaussian uncertainty. While these conditions may seem to be severely limiting, they are satisfied (or approximately satisfied) by many robotic systems. Furthermore, linearization methods can often be used to derive local, linear approximations to nonlinear systems. In this chapter, we did not provide a detailed derivation of the Kalman filtering equations, preferring instead to demonstrate how the Kalman filter can be implemented as the solution to a least-squares problem using Factor graphs.

Equipped with several solutions to the localization problem, we showed have value iteration could be used to construct a value function to solve navigation problems. The approach that we presented is closely aligned with Markov localization, in that the state space is represented by a grid, and each grid cell is assigned a value that approximates the value function for that cell. Value iteration proceeds by updating the entire grid at each iteration, in much the same style as Markov localization.

Throughout this chapter, we used conditional Gaussian distributions to represent motion and measurement models. As we have seen in previous chapters, Gaussian distributions are parameterized by a mean vector and covariance matrix. In this chapter, we developed methods to estimate these parameters for the case of linear systems.

We began by developing a maximum likelihood parameter estimation approach for a linear, scalar measurement model, \(z_k = h(x_k) + n_k = C x_k + n_k\). The result was what we might have expected based on elementary statistics:

\[\begin{split} \begin{align*} \hat{\mu} &= \frac{1}{N} \sum_i z_i \\ \hat{C} &= \frac{\sum_k x_k z_k}{\sum_k x_k^2}\\ \widehat{\sigma^2} &= \frac{1}{N-1} \sum_k (\hat{C} x_k - z_k)^2 \end{align*} \end{split}\]

We then extended these ideas to develop a general method for multivariate linear regression in the case of a linear measurement model with zero-mean Gaussian noise, and vector-valued measurements \(z_k \in \mathbb{R}^m\). In this case, we represent the measurement model as \(z_k = H x_k + n_k\), and we assume that \(n_k\) is zero-mean Gaussian with covariance matrix \(R = \sigma^2 I\). the estimates for \(H\) and \(\sigma^2\) are given by

\[\begin{split} \begin{align*} \hat{\mu} &= 0\\ \hat{H}^T &= (\sum_k x_k x_k^T)^{-1} \sum_k x_k z_k^T\\ \widehat{\sigma^2} &= \frac{1}{(m N-m n)} \sum_k \sum_i |\hat{H}_i x_k - z_{ki}|^2 \end{align*} \end{split}\]

in which \(N\) is the number of data points. While the derivation for these equations is a bit more complex (these are, after all, matrix equations), the similarity between the scalar maximum likelihood estimate and the general multivariate linear regression estimates are conspicuous, and furthermore, the extension to motion model estimation was straightforward.

We concluded our tour of parameter estimation methods by briefly introducing the EM algorithm. The Expectation Maximization algorithm is appropriate when model parameters must be estimated using uncertain state estimates. In such cases, we can alternate between optimizing our estimate of state using the current parameter estimate and optimizing the parameter estimation using the current state estimate. While the derivations (and proof of convergence) lie beyond the scope of this book, the basic idea is intuitive.

4.7.3. Background and History#

A great reference for the probability-based localization methods discussed in this chapter is the book on Probabilistic Robotics [Thrun et al., 2005], which also discusses Markov Localization in detail. Monte Carlo Localization was introduced at the 1999 ICRA conference by one of us, in [Dellaert et al., 1999].