
6.7. Chapter Summary

[Splash image: a steampunk autonomous car]

In Chapter 5, we turned our attention from probabilistic aspects of robotics to deterministic aspects of the problems of perception and planning. With respect to modeling, this mainly amounted to introducing the geometric aspects of motion and imaging. For reasoning, we introduced neural networks for computer vision tasks, and sampling-based algorithms for path planning.

In this chapter, we generalized geometric motion models, introduced new geometric sensing models, combined geometry and probabilistic methods to solve the SLAM problem, developed motion planning algorithms for cars, and extended deep learning to incorporate reinforcement learning.

To start our exploration into autonomous driving, we introduced a configuration space that included both position and orientation information for a mobile robot moving in the plane. We did this by simply extending our coordinate representation of configuration from \(q = (x,y)\) for robots that translate in the plane, to \(q = (x,y,\theta)\), where \(\theta\) denotes the robot's orientation. In a brief aside, we showed how to compute the position of any point on the robot, given its configuration. This computation relied essentially on basic trigonometry, and it was not immediately obvious how we might generalize our computations to robots that move freely in 3D. In this chapter, we solved this representation problem by introducing the special orthogonal group, \(SO(2)\), to represent rotations, and the special Euclidean group, \(SE(2)\), to represent combined translation and rotation.

The matrices in \(SO(2)\) are called rotation matrices and they have several notable properties:

  • \(R\) is orthogonal (i.e., its columns are mutually orthogonal unit vectors),

  • \(R^{-1} = R^T\),

  • \(\det R = +1\).
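These properties are easy to verify numerically. The following is a minimal sketch, assuming NumPy; the rotation angle is arbitrary, and the assertions simply check the three properties listed above.

```python
import numpy as np

# Build a rotation in SO(2) for an arbitrary angle.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonal columns, inverse equal to transpose, determinant +1.
assert np.allclose(R.T @ R, np.eye(2))
assert np.allclose(np.linalg.inv(R), R.T)
assert np.isclose(np.linalg.det(R), 1.0)
```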

The matrices in \(SE(2)\) are called homogeneous transformation matrices, and they include rotation information (encoded by a rotation matrix as the upper left sub-matrix), and translation information (encoded by the rightmost column). Thus, these matrices have the form

\[\begin{split}T^0_1 = \begin{bmatrix} R_{1}^{0} & d_{1}^{0}\\ 0_{2} & 1 \end{bmatrix} \end{split}\]

where the notation \(T^0_1\) indicates that this homogeneous transformation gives the position and orientation of coordinate frame 1 with respect to coordinate frame 0, and \(0_2\) denotes the row vector \([0~ 0]\).
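As a sketch of this block structure (again assuming NumPy, with made-up numerical values), a homogeneous transformation can be assembled directly from a rotation matrix and a translation vector:

```python
import numpy as np

# Assemble T^0_1 from the rotation R^0_1 and the translation d^0_1.
theta = np.deg2rad(90)
R01 = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # R^0_1 (2x2)
d01 = np.array([[2.0], [1.0]])                      # d^0_1 (2x1 column)

T01 = np.block([[R01,              d01],
                [np.zeros((1, 2)), np.ones((1, 1))]])   # 3x3 matrix in SE(2)
```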

Homogeneous transformation matrices can be used to compute coordinate transformations between various coordinate systems. For example, if we are given the coordinates of a point \(P\) with respect to frame 1 and we desire the coordinate representation with respect to frame 0, this can be determined using the matrix equation:

\[\begin{split} \begin{bmatrix} P^0 \\ 1 \end{bmatrix} = \begin{bmatrix} R_{1}^{0} & d_{1}^{0}\\ 0_{2} & 1 \end{bmatrix} \begin{bmatrix} P^1 \\ 1 \end{bmatrix} \end{split}\]
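The following sketch applies this equation to a concrete (made-up) pose and point, again assuming NumPy:

```python
import numpy as np

# Example pose of frame 1 in frame 0: rotated 90 degrees, translated by (2, 1).
theta, dx, dy = np.deg2rad(90), 2.0, 1.0
T01 = np.array([[np.cos(theta), -np.sin(theta), dx],
                [np.sin(theta),  np.cos(theta), dy],
                [0.0,            0.0,           1.0]])

P1 = np.array([1.0, 0.0, 1.0])   # point P in frame 1, homogeneous coordinates
P0 = T01 @ P1                    # point P in frame 0
print(P0[:2])                    # -> [2. 2.]
```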

Finally, composition of homogeneous transformations requires nothing more than simple matrix multiplication. Given the transformations \(T^0_1\) and \(T^1_2\) (which denote the relative position and orientation of frame 1 with respect to frame 0, and of frame 2 with respect to frame 1, respectively), the position and orientation of frame 2 with respect to frame 0 is given by \(T^0_2 = T^0_1 T^1_2\).
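A short sketch of composition, assuming NumPy; the helper se2 and the example poses below are illustrative, not part of the chapter:

```python
import numpy as np

def se2(theta, x, y):
    """Hypothetical helper: the SE(2) matrix for a planar pose (theta, x, y)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

T01 = se2(np.deg2rad(90), 2.0, 1.0)    # frame 1 w.r.t. frame 0
T12 = se2(np.deg2rad(-90), 1.0, 0.0)   # frame 2 w.r.t. frame 1
T02 = T01 @ T12                        # frame 2 w.r.t. frame 0: zero net rotation, translation (2, 2)
```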

In this chapter, we developed the differential kinematics for a car-like robot. Previously we derived the relationship between wheel angular velocity and the resulting velocity (linear and angular) for a differential drive robot. For car-like systems, we prefer to compute the linear and angular velocities of the robot with respect to the world coordinate frame as a function of the rate of change of the steering angle and of the robot's linear velocity (expressed in the body-attached frame). This is a more natural choice than using wheel angular velocity, since the control inputs for a car-like robot are typically specified as a steering rate and a forward speed.
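As a sketch, the update below assumes the standard kinematic bicycle model with wheelbase L; the function name, parameter values, and state layout are illustrative, not the chapter's exact notation. The controls are the forward speed v (in the body frame) and the steering rate phi_dot, and one Euler step propagates the world-frame pose:

```python
import numpy as np

def car_step(x, y, theta, phi, v, phi_dot, L=2.5, dt=0.01):
    """One Euler step of a kinematic bicycle model (wheelbase L is an assumed parameter)."""
    x += v * np.cos(theta) * dt          # world-frame position update
    y += v * np.sin(theta) * dt
    theta += (v / L) * np.tan(phi) * dt  # heading changes with speed and steering angle
    phi += phi_dot * dt                  # steering angle driven by the steering rate
    return x, y, theta, phi
```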

In this chapter we introduced LIDAR, an active sensor that constructs 3D point clouds in real time. In Chapter 5 we introduced computer vision, and showed how stereo vision could be used to derive 3D coordinates for points in the scene. While computer vision has shown dramatic performance improvements in recent years, stereo vision is not reliable enough, fast enough, or dense enough for applications such as self-driving cars. In contrast, LIDAR uses laser light and time-of-flight computation to determine the distance to each point that is visible in the scene. Because LIDAR data is typically collected as the sensor moves through the environment, it is necessary to map 3D point cloud data into a common reference coordinate frame. Happily, homogeneous transformations are the perfect way to accomplish this. LIDAR is currently the most popular, and most reliable, sensor used for self-driving cars.
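The following sketch, assuming NumPy, shows how a single 4x4 homogeneous transform (with made-up pose values) maps an N x 3 scan from the sensor frame into a common world frame:

```python
import numpy as np

def transform_cloud(wTs, points):
    """Apply a 4x4 homogeneous transform to an N x 3 point cloud."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # N x 4
    return (wTs @ homogeneous.T).T[:, :3]                             # back to N x 3

# Example: sensor rotated 90 degrees about z and placed at (10, 5, 1.5) in the world.
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
wTs = np.array([[c, -s, 0, 10.0],
                [s,  c, 0,  5.0],
                [0,  0, 1,  1.5],
                [0,  0, 0,  1.0]])
scan = np.random.rand(100, 3)            # stand-in for one LIDAR scan, in the sensor frame
scan_world = transform_cloud(wTs, scan)  # the same points in the world frame
```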