Open In Colab

6.1. Planar Geometry#

Linear algebra can be used to deal efficiently with rotation and translations in the plane.

Splash image with steampunk autonomous car

In the previous chapter, we saw that the configuration space for a differential drive robot can be represented as \({\cal Q} = \mathbb{R}^2 \times [0, 2\pi),\) and we used \(q = (x,y,\theta)\) to parameterize this configuration space. This choice of parameterization has the nice property of being minimal: \({\cal Q}\) is a three-dimensional object, and therefore at least three parameters are required to represent arbitrary configurations. Unfortunately, this choice of parameterization doesn’t help much when reasoning about the geometry of the robot and its motion. Recall, for example, the ad hoc planar geometry we used to derive coordinates of points on the DDR as a function of \(x,y,\theta\). In this section, we use tools from linear algebra to represent and manipulate configurations. We begin by focusing our attention on representing orientation. We then extend our methods to include translation, and develop methods to describe relative configurations of a robot, which could be used to describe sample points along a continuous trajectory in \(\cal Q\).

6.1.1. Planar Rotations#

Rotation in the plane can be represented using \(2\times 2\) rotation matrices.

In the previous chapter, we represented the orientation of the robot simply by \(\theta \in [0,2\pi)\). There are two main drawbacks to this choice. First, the mapping from the robot’s orientation to the parameter \(\theta\) is not continuous. If the robot’s orientation is \(\theta = 2\pi - \epsilon\), for example, and we rotate the robot counterclockwise (decreasing \(\epsilon\) to zero), we will encounter a discontinuity when \(\theta\) goes from very nearly \(2\pi\) to zero. This will occur regardless of the initial value we choose for \(\epsilon\): when \(\epsilon\) reaches zero, there will be a jump in the value of \(\theta\). We call this jump a \(singularity\). The second drawback is that parameterizing orientation by a single angle \(\theta\) does not generalize in a straightforward way to representing orientation in 3D. Further, even if we go to the effort to make this generalization work, the problems with singularities are complex for 3-dimensional rotations, and can cause serious problems if not handled properly.

Two coordinate frames that share a common origin.

Both difficulties above stem from the use of a minimal representation of orientation. To solve these problems, we’ll introduce a somewhat redundant representation that explicity encodes the orientation of the axes of a target coordinate frame relative to a reference coordinate frame. Consider the two coordinate frames shown in the figure above. These two frames share a common origin, and Frame 1 is obtained by rotating by an angle \(\theta\) w.r.t. frame 0. We can express the \(x\)-axis of Frame 1 in coordinates relative to Frame 0 by projecting \(x_1\) onto \(x_0\) and \(y_0\). Likewise, we can express the \(y\)-axis of Frame 1 w.r.t. Frame 0 by projecting \(y_1\) onto \(x_0\) and \(y_0\). Because all of these vectors are unit vectors, the projections can be achieved using the dot product:

\[\begin{split} x_{1}^{0}= \begin{bmatrix} x_0 \cdot x_1 \\ y_0 \cdot x_1 \end{bmatrix}= \begin{bmatrix} \cos \theta \\ \sin \theta \end{bmatrix} ~~~~~~~~~~~ y_{1}^{0}= \begin{bmatrix} x_0 \cdot y_1 \\ y_0 \cdot y_1 \end{bmatrix}= \begin{bmatrix} -\sin \theta \\ \cos \theta \end{bmatrix} \end{split}\]

Above, we adopt the notational convention that a subscript indicates the target frame, and the superscript indicates the reference frame. Thus, \(x_1^0\) denotes the \(x\)-axis of Frame 1, expressed in coordinates relative to Frame 0.

We can group these two vectors into a rotation matrix

\[\begin{split} R_{1}^{0} =\begin{bmatrix} x_0 \cdot x_1 & x_0 \cdot y_1\\ y_0 \cdot x_1 & y_0 \cdot y_1 \end{bmatrix} =\begin{bmatrix} \cos \theta & -\sin \theta\\ \sin \theta & \cos \theta \end{bmatrix} \end{split}\]

that describes the orientation of Frame 1 w.r.t. Frame 0. Using the notational convention defined above, \(R_{1}^{0}\) indicates the reference frame as the superscript (Frame 0) and the target frame as the subscript (Frame 1).

6.1.2. The Special Orthogonal Group of Order 2, aka SO(2)#

Rotation matrices form a group.

The set of \(2\times2\) rotation matrices, together with matrix multiplication as the composition operator, form a group called the Special Orthogonal group of order 2, denoted by \(SO(2)\). All matrices \(R \in SO(2)\) have the following properties:

  1. Each column \(c_i\) is a unit vector.

  2. The columns are orthogonal (i.e., \(c_i \cdot c_j = 0\) for \(i \neq j\)).

  3. The inverse of \(R\) is equal to its transpose, \(R^{-1}=R^{T}\).

  4. The determinant of \(R\) is +1.

The first three properties hold for all orthogonal matrices.

It is easy to understand the first two properties by thinking of the columns as representations of coordinate axes. We typically use unit vectors to represent the directions of coordinate axes, and any two coordinate axes are orthogonal to one another.

The third property provides a simple method for inverting rotation matrices, which is a frequently used operation in robotics applications.

The fourth property holds for rotation matrices, but not for arbitrary orthogonal matrices. The determinant of an arbitrary orthogonal matrix is \(\pm 1\). Those orthogonal matrices with determinant \(+1\) correspond to right-handed coordinate systems (when curling the fingers of your right hand from the \(x\)-axis to the \(y\)-axis, your thumb points in the direction of the \(z\)-axis, which is “out of the page” in the 2D case). Those orthogonal matrices with determinant \(-1\) correspond to left-handed systems, which are useful in computer graphics applications, but not so much for roboticists. Note that it is this fourth property that justifies the designation special for this set of matrices.

As we will see in the next chapter, all of the properties described above generalize to the case of rotations in 3D, and the corresponding group of 3D rotation matrices, \(SO(3)\).

6.1.3. Coordinate Transformations for Pure Rotations#

The rotation group acts on 2D points.

Coordinate transformation under pure rotation.

We can use rotation matrices to perform coordinate transformations under pure rotation. Consider a point \(P\) that is rigidly attached to Frame 1 and, as illustrated in the Figures above, whose coordinates w.r.t. Frame 1 are given by

\[\begin{split} P_{}^{1}=\begin{bmatrix} p_x \\ p_y \end{bmatrix} \end{split}\]

Using the definition of coordinates, we can describe the point \(P\) using the equation

\[ P = p_x x_1 + p_y y_1 \]

in which \(x_1\) and \(y_1\) are the vectors defining the coordinate axes of Frame 1.

To compute the coordinates \(P^0\) of \(P\) w.r.t. Frame 0, we merely project \(P\) onto the \(x\)- and \(y\)- axes of Frame 0. Since the vectors \(x_0\) and \(y_0\) are unit vectors, this projection can again be accomplished using a the dot product:

\[\begin{split} P^0 = \begin{bmatrix} x_0 \cdot P \\ y_0 \cdot P \end{bmatrix} = \begin{bmatrix} x_0 \cdot (p_x x_1 + p_y y_1) \\ y_0 \cdot (p_x x_1 + p_y y_1) \end{bmatrix} = \begin{bmatrix} (x_0 \cdot x_1) p_x + (x_0 \cdot y_1) p_y \\ (y_0 \cdot x_1) p_x + (y_0 \cdot y_1) p_y \end{bmatrix} = \begin{bmatrix} x_0 \cdot x_1 & x_0 \cdot y_1 \\ y_0 \cdot x_1 & y_0 \cdot y_1 \end{bmatrix} \begin{bmatrix} p_x \\ p_y \end{bmatrix} = R^0_1 P^1 \end{split}\]

Summarizing the above, we have

\[ P^0 = R^0_1 P^1 \]

which is how a rotation \(R^0_1\) acts on a point \(P^1\).

6.1.4. The Group \(SO(2)\) in GTSAM#

Rot2 is a group, and acts on Point2.

\(SO(2)\) is a group, in the mathematical sense. A group consists of a set (in our case, a set of 2D rotations) along with a group operator (in our case, composition). All groups posses certain properties, sometimes called the group axioms. For the group, \(SO(2)\) these properties are as follows:

  1. Closure: For all rotations \(R, R' \in SO(2)\), their product is also in \(SO(2)\), i.e., \(RR' \in SO(2)\).

  2. Identity Element: The \(2\times 2\) identity matrix \(I\) is included in the group, and for all \(R\in SO(2)\) we have \(RI = IR = R\).

  3. Inverse: For every \(R \in SO(2)\) there exists \(R^{-1} \in SO(2)\) such that \(R^{-1}R = RR^{-1} = I\).

  4. Associativity: For all \(R_1, R_2, R_3 \in SO(2)\), \((R_1 R_2) R_3 = R_1 (R_2 R_3)\).

In addition (only in 2D!) rotations in \(SO(2)\) commute:

  1. Commutativity: For all \(R_1, R_2 \in SO(2)\), \(R_1 R_2 = R_2 R_1\).

It will not surprise anyone by now that all of this is built into GTSAM. In particular, the 2-D rotations are represented by the type Rot2:

theta = math.radians(30)
R = gtsam.Rot2(theta)
print(R.matrix())
[[ 0.8660254 -0.5      ]
 [ 0.5        0.8660254]]

Rotations in 2D form a commutative group, as demonstrated here in code:

print(isinstance(R * R, gtsam.Rot2)) # closure
I2 = gtsam.Rot2.Identity()
print(R.equals(R * I2, 1e-9)) # identity
print((R * R.inverse()).equals(I2, 1e-9)) # inverse
R1, R2, R3 = gtsam.Rot2(1), gtsam.Rot2(2), gtsam.Rot2(3)
print(((R1 * R2)* R3).equals(R1 * (R2 * R3), 1e-9)) # associative
print((R1 * R2).equals(R2 * R1, 1e-9)) # commutative
True
True
True
True
True

Finally, rotations can act on points, which we can do using matrix multiplication, or using the Rot2.rotate method:

R01 =  gtsam.Rot2(math.radians(20))
P1 = gtsam.Point2(4,3)
print(f"P0 = {R01.matrix() @ P1}")
print(f"P0 = {R01.rotate(P1)}")
P0 = [2.73271005 4.18715844]
P0 = [2.73271005 4.18715844]

6.1.5. 2D Rigid Transforms aka SE(2)#

From \(SO(2)\) to \(SE(2)\).

2D rigid transforms combine rotation with translation, and are used in two different ways in robotics:

  • to specify the pose of a robot, which consists of a translation and an orientation

  • to talk about relative pose or the transformation between two different robot poses

We can use the same mathematical object for both use cases, i.e., 2D rigid transforms \(T\in SE(2)\), which combine translation and orientation in a \(3 \times 3\) matrix, as explained below.

Coordinate transformation under both rotation and translation.

Let us start by assuming as before that the point \(P\) is rigidly attached to a frame that is rotated by angle \(\theta\) w.r.t. Frame 0. If we now translate that frame according to a vector \(d\), as shown in the figure above, the point \(P\) undergoes the same translation as the frame. To calculate the coordinates of \(P\) relative to Frame 0, we merely perform the rotational coordinate transformation as above, and then add the translation from the origin of Frame 0 to the origin of Frame 1:

\[ P^0 = R^0_1 P^1 + d^0_1 \]

We can write the above coordinate transformation as a matrix equation:

\[\begin{split} \begin{align} \begin{bmatrix} P^0 \\ 1 \end{bmatrix} = \begin{bmatrix} R_{1}^{0} & d_{1}^{0}\\ 0_{2} & 1 \end{bmatrix} \begin{bmatrix} P^1 \\ 1 \end{bmatrix} \end{align} \end{split}\]

in which \(0_2\) denotes the row vector \([0~ 0]\). Note that the bottom row of this matrix equation is merely the equation

\[\begin{split} 1 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_x \\ p_y \\ 1 \end{bmatrix} \end{split}\]

which holds for any \(p_x, p_y\).

It is convenient to define the 3-vector \(\tilde{P}\) as

\[\begin{split} \tilde{P} = \begin{bmatrix} P \\ 1 \end{bmatrix} \end{split}\]

which we call the homogeneous coordinates of the point \(P\). Using this convention, we can write the homogeneous coordinate transformation as

\[ \tilde{P}^0 = T^0_1 \tilde{P}^1 \]

The set of matrices of the form

\[\begin{split} T = \begin{bmatrix} R & d\\ 0_2 & 1 \end{bmatrix} \end{split}\]

with \(R \in SO(2)\) and \(d \in \mathbb{R}^2\) is called the special Euclidean group of order 2, denoted by SE(2).

6.1.6. Composition of Homogeneous Transformations#

We can implement composition by matrix multiplication.

The composition of two homogeneous transformations.

Consider a mobile robot that begins at some initial configuration, moves to a new location, reorients itself and then moves to another location. How can we determine the final configuration of the robot if we are given only the individual motions in the sequence? The answer is suprisingly simple. Suppose the point \(P\) is rigidly attached to the robot. Using the coordinate transformation equation \(\tilde{P}^i = T^i_j \tilde{P}^j\), we have the following:

\[\tilde{P}^0 = T^0_1 \tilde{P}^1, ~~~~~ \tilde{P}^1 = T^1_2 \tilde{P}^2, ~~~~~ \tilde{P}^0 = T^0_2 \tilde{P}^2\]

Combining the first two of these gives

\[\tilde{P}^0 = T^0_1 T^1_2 \tilde{P}^2\]

and since this equation holds for every possible \(\tilde{P}^2\) we have

\[T^0_2 = T^0_1 T^1_2 \]

It is easy to show (by induction) that this result extends to an arbitarily long sequence of transformations. Therefore, if we have a sequence of robot configurations \(T^{i-1}_i\), for \(i = 1, \dots , n\) we can compute the composite transformation \(T^0_n\) using

\[T^0_n = T^0_1 \dots T^{n-1}_n \]

6.1.7. The Group \(SE(2)\) in GTSAM#

Pose2 is a non-commutative group, and also acts on Point2.

Unsurprisingly, \(SE(2)\) is also a group, as the following properties all hold:

  1. Closure: For all transforms \(T, T' \in SE(2)\), their product is also in \(SE(2)\), i.e., \(T T' \in SE(2)\).

  2. Identity Element: The \(3\times 3\) identity matrix \(I\) is included in the group, and for all \(T \in SE(2)\) we have \(T I = I T = T\).

  3. Inverse: For every \(T \in SE(2)\) there exists \(T^{-1} \in SE(2)\) such that \(T^{-1}T = T T^{-1} = I\).

  4. Associativity: For all \(T_1, T_2, T_3 \in SE(2)\), \((T_1 T_2) T_3 = T_1 (T_2 T_3)\).

However, in contrast to 2D rotations, 2D rigid transforms do not commute. Also, The inverse \(T^{-1}\) is not just the transpose; instead, we have:

\[\begin{split} T^{-1} = \begin{bmatrix}R & d\\0_2 & 1\end{bmatrix}^{-1} = \begin{bmatrix}R^T & -R^T d\\0_2 & 1\end{bmatrix} \end{split}\]

which you can easily verify by multiplying to obtain \(I\).

Again, all of this is built into GTSAM,where both 2D poses and 2D rigid transforms are represented by the type Pose2:

theta = math.radians(30)
T = gtsam.Pose2(3, -2, theta)
print(f"2D Pose (x, y, theta) = {T}corresponding to transformation matrix:\n{T.matrix()}")
2D Pose (x, y, theta) = (3, -2, 0.523599)
corresponding to transformation matrix:
[[ 0.8660254 -0.5        3.       ]
 [ 0.5        0.8660254 -2.       ]
 [ 0.         0.         1.       ]]

2D transforms form a non-commutative group, as demonstrated here in code:

print(isinstance(T * T, gtsam.Pose2)) # closure
I3 = gtsam.Pose2.Identity()
print(T.equals(T * I3, 1e-9)) # identity
print((T * T.inverse()).equals(I3, 1e-9)) # inverse
T1, T2, T3 = gtsam.Pose2(1,2,3), gtsam.Pose2(4,5,6), gtsam.Pose2(7,8,9)
print(((T1 * T2)* T3).equals(T1 * (T2 * T3), 1e-9)) # associative
print((T1 * T2).equals(T2 * T1, 1e-9)) # NOT commutative
True
True
True
True
False

Finally, 2D transforms can act on points, which we can do using matrix multiplication, or using the Pose2.transformFrom method:

T01 =  gtsam.Pose2(1, 2, math.radians(20))
P1 = gtsam.Point2(4,3)
print(f"P0 = {T01.matrix() @ [4, 3, 1]}")  # need to make P0 homogeneous
print(f"P0 = {T01.transformFrom(P1)}")
P0 = [3.73271005 6.18715844 1.        ]
P0 = [3.73271005 6.18715844]

6.1.8. Lie Groups*#

Groups can be either discrete or continuous. For example, the set of integers along with addition as the group operation and zero as the identity forms a discrete group. In contrast, both \(SO(2)\) and \(SE(2)\) are continuous groups. Not only are they continuous, but they are examples of manifolds. Informally, a manifold is a set that is locally like \(\mathbb{R}^k\) for some \(k\). In the present case, \(SO(2)\) is locally like \(\mathbb{R}^1\), and \(SE(2)\) is locally like \(\mathbb{R}^3\).

While the topic of manifolds is beyond the scope of this book, the easiest way to understand this is to note that \(SO(2)\) can be parameterized by the angle \(\theta \in [0,2\pi) \subset \mathbb{R}^1\). This parameterization is local (and not global) because it breaks down in a neighborhood of \(\theta = 2\pi\) (as described above). This is not a problem, however, since in that case, we can simply choose to use \(\theta \in [-\pi, \pi)\) instead of \(\theta \in [0,2\pi)\). We can completely cover \(SO(2)\) using two parameterizations: one for cases when \(\theta\) is near \(2\pi\), and a second for when \(\theta\) is near \(\pi\).

For the case of \(SE(2)\), we merely use the two parameterizations \((x,y,\theta) \in \mathbb{R}^2 \times [0,2\pi) \subset \mathbb{R}^3\) and \((x,y,\theta) \in \mathbb{R}^2 \times [-\pi,\pi) \subset \mathbb{R}^3\), which demonstrates that locally \(SE(2)\) is like \(\mathbb{R}^3\).

In addition to being manifolds, both \(SO(2)\) and \(SE(2)\) posses the additional property of being differentiable. Such groups are known as Lie groups (pronounced “Lee groups”). While again beyond the scope of this book, we can understand this by considering the parameterizations for \(SO(2)\) and \(SE(2)\) described above. For each of these parameterizations, we can differentiate the elements of \(SO(2)\) and \(SE(2)\), which implies the existence of linear and angular velocities. In fact, in this chapter, we will define the actions of the car using these velocities as the inputs to the system.

6.1.9. GTSAM 101#

A deeper dive in the GTSAM concepts used above.

The types Point2 and Rot2 are used for 2D position and rotation. While Point2 is really just an alias for a 3-dimensional numpy array, the Rot2 type is a wrapper around a c++ GTSAM type that represents \(SO(2)\) using just a cosine and sine value, i.e., a point on the unit circle.

Using a 2D position and a 2D rotation we can create a 2D pose, represented by a gtsam.Pose2.As always, you can execute help(gtsam.Pose2) to get the full documentation of a class. Below is an excerpt with some useful methods. We have several constructors:

__init__(...)
    __init__(*args, **kwargs)
    Overloaded function.
    
    1. __init__(self: Pose2) -> None
    
    2. __init__(self: Pose2, other: Pose2) -> None
    
    3. __init__(self: Pose2, x: float, y: float, theta: float) -> None
    
    4. __init__(self: Pose2, theta: float, t: Point2) -> None
    
    5. __init__(self: Pose2, r: Rot2, t: Point2) -> None
    
    6. __init__(self: Pose2, v: numpy.ndarray[numpy.float64[m, 1]]) -> None

where t above is just a Point2, and v is supposed to be a 3-vector with x, y, and \(\theta\), in that order. We also have Pose2.transformFrom and Pose2.transformTo methods, which act on 2D points:

 transformFrom(...)
     transformFrom(*args, **kwargs)
     Overloaded function.
     
     1. transformFrom(self: Pose2, point: Point2) -> Point2
     ...
 |
 transformTo(...)
     transformTo(*args, **kwargs)
     Overloaded function.
     
     1. transformTo(self: Pose2, point: Point2) -> Point2
     ....

Both have them have three overloads, but we show only the most commonly used one above. When using code, it is often a good idea to use indices to keep the different coordinate frames apart. For example, for a pose \(T^a_b\) the method transformFrom implements the following

\[ P^a = T^a_b P^b \]

i.e., it transforms from frame B to frame A. The other method, transformTo transforms to frame B and hence implements

\[ P^b = (T^a_b)^{-1} P^a \]