6.1. Planar Geometry#
Linear algebra can be used to deal efficiently with rotation and translations in the plane.
In the previous chapter, we saw that the configuration space for a differential drive robot can be represented as \({\cal Q} = \mathbb{R}^2 \times [0, 2\pi),\) and we used \(q = (x,y,\theta)\) to parameterize this configuration space. This choice of parameterization has the nice property of being minimal: \({\cal Q}\) is a threedimensional object, and therefore at least three parameters are required to represent arbitrary configurations. Unfortunately, this choice of parameterization doesn’t help much when reasoning about the geometry of the robot and its motion. Recall, for example, the ad hoc planar geometry we used to derive coordinates of points on the DDR as a function of \(x,y,\theta\). In this section, we use tools from linear algebra to represent and manipulate configurations. We begin by focusing our attention on representing orientation. We then extend our methods to include translation, and develop methods to describe relative configurations of a robot, which could be used to describe sample points along a continuous trajectory in \(\cal Q\).
6.1.1. Planar Rotations#
Rotation in the plane can be represented using \(2\times 2\) rotation matrices.
In the previous chapter, we represented the orientation of the robot simply by \(\theta \in [0,2\pi)\). There are two main drawbacks to this choice. First, the mapping from the robot’s orientation to the parameter \(\theta\) is not continuous. If the robot’s orientation is \(\theta = 2\pi  \epsilon\), for example, and we rotate the robot counterclockwise (decreasing \(\epsilon\) to zero), we will encounter a discontinuity when \(\theta\) goes from very nearly \(2\pi\) to zero. This will occur regardless of the initial value we choose for \(\epsilon\): when \(\epsilon\) reaches zero, there will be a jump in the value of \(\theta\). We call this jump a \(singularity\). The second drawback is that parameterizing orientation by a single angle \(\theta\) does not generalize in a straightforward way to representing orientation in 3D. Further, even if we go to the effort to make this generalization work, the problems with singularities are complex for 3dimensional rotations, and can cause serious problems if not handled properly.
Both difficulties above stem from the use of a minimal representation of orientation. To solve these problems, we’ll introduce a somewhat redundant representation that explicity encodes the orientation of the axes of a target coordinate frame relative to a reference coordinate frame. Consider the two coordinate frames shown in the figure above. These two frames share a common origin, and Frame 1 is obtained by rotating by an angle \(\theta\) w.r.t. frame 0. We can express the \(x\)axis of Frame 1 in coordinates relative to Frame 0 by projecting \(x_1\) onto \(x_0\) and \(y_0\). Likewise, we can express the \(y\)axis of Frame 1 w.r.t. Frame 0 by projecting \(y_1\) onto \(x_0\) and \(y_0\). Because all of these vectors are unit vectors, the projections can be achieved using the dot product:
Above, we adopt the notational convention that a subscript indicates the target frame, and the superscript indicates the reference frame. Thus, \(x_1^0\) denotes the \(x\)axis of Frame 1, expressed in coordinates relative to Frame 0.
We can group these two vectors into a rotation matrix
that describes the orientation of Frame 1 w.r.t. Frame 0. Using the notational convention defined above, \(R_{1}^{0}\) indicates the reference frame as the superscript (Frame 0) and the target frame as the subscript (Frame 1).
6.1.2. The Special Orthogonal Group of Order 2, aka SO(2)#
Rotation matrices form a group.
The set of \(2\times2\) rotation matrices, together with matrix multiplication as the composition operator, form a group called the Special Orthogonal group of order 2, denoted by \(SO(2)\). All matrices \(R \in SO(2)\) have the following properties:
Each column \(c_i\) is a unit vector.
The columns are orthogonal (i.e., \(c_i \cdot c_j = 0\) for \(i \neq j\)).
The inverse of \(R\) is equal to its transpose, \(R^{1}=R^{T}\).
The determinant of \(R\) is +1.
The first three properties hold for all orthogonal matrices.
It is easy to understand the first two properties by thinking of the columns as representations of coordinate axes. We typically use unit vectors to represent the directions of coordinate axes, and any two coordinate axes are orthogonal to one another.
The third property provides a simple method for inverting rotation matrices, which is a frequently used operation in robotics applications.
The fourth property holds for rotation matrices, but not for arbitrary orthogonal matrices. The determinant of an arbitrary orthogonal matrix is \(\pm 1\). Those orthogonal matrices with determinant \(+1\) correspond to righthanded coordinate systems (when curling the fingers of your right hand from the \(x\)axis to the \(y\)axis, your thumb points in the direction of the \(z\)axis, which is “out of the page” in the 2D case). Those orthogonal matrices with determinant \(1\) correspond to lefthanded systems, which are useful in computer graphics applications, but not so much for roboticists. Note that it is this fourth property that justifies the designation special for this set of matrices.
As we will see in the next chapter, all of the properties described above generalize to the case of rotations in 3D, and the corresponding group of 3D rotation matrices, \(SO(3)\).
6.1.3. Coordinate Transformations for Pure Rotations#
The rotation group acts on 2D points.
We can use rotation matrices to perform coordinate transformations under pure rotation. Consider a point \(P\) that is rigidly attached to Frame 1 and, as illustrated in the Figures above, whose coordinates w.r.t. Frame 1 are given by
Using the definition of coordinates, we can describe the point \(P\) using the equation
in which \(x_1\) and \(y_1\) are the vectors defining the coordinate axes of Frame 1.
To compute the coordinates \(P^0\) of \(P\) w.r.t. Frame 0, we merely project \(P\) onto the \(x\) and \(y\) axes of Frame 0. Since the vectors \(x_0\) and \(y_0\) are unit vectors, this projection can again be accomplished using a the dot product:
Summarizing the above, we have
which is how a rotation \(R^0_1\) acts on a point \(P^1\).
6.1.4. The Group \(SO(2)\) in GTSAM#
Rot2
is a group, and acts onPoint2
.
\(SO(2)\) is a group, in the mathematical sense. A group consists of a set (in our case, a set of 2D rotations) along with a group operator (in our case, composition). All groups posses certain properties, sometimes called the group axioms. For the group, \(SO(2)\) these properties are as follows:
Closure: For all rotations \(R, R' \in SO(2)\), their product is also in \(SO(2)\), i.e., \(RR' \in SO(2)\).
Identity Element: The \(2\times 2\) identity matrix \(I\) is included in the group, and for all \(R\in SO(2)\) we have \(RI = IR = R\).
Inverse: For every \(R \in SO(2)\) there exists \(R^{1} \in SO(2)\) such that \(R^{1}R = RR^{1} = I\).
Associativity: For all \(R_1, R_2, R_3 \in SO(2)\), \((R_1 R_2) R_3 = R_1 (R_2 R_3)\).
In addition (only in 2D!) rotations in \(SO(2)\) commute:
Commutativity: For all \(R_1, R_2 \in SO(2)\), \(R_1 R_2 = R_2 R_1\).
It will not surprise anyone by now that all of this is built into GTSAM. In particular, the 2D rotations are represented by the type Rot2
:
theta = math.radians(30)
R = gtsam.Rot2(theta)
print(R.matrix())
[[ 0.8660254 0.5 ]
[ 0.5 0.8660254]]
Rotations in 2D form a commutative group, as demonstrated here in code:
print(isinstance(R * R, gtsam.Rot2)) # closure
I2 = gtsam.Rot2.Identity()
print(R.equals(R * I2, 1e9)) # identity
print((R * R.inverse()).equals(I2, 1e9)) # inverse
R1, R2, R3 = gtsam.Rot2(1), gtsam.Rot2(2), gtsam.Rot2(3)
print(((R1 * R2)* R3).equals(R1 * (R2 * R3), 1e9)) # associative
print((R1 * R2).equals(R2 * R1, 1e9)) # commutative
True
True
True
True
True
Finally, rotations can act on points, which we can do using matrix multiplication, or using the Rot2.rotate
method:
R01 = gtsam.Rot2(math.radians(20))
P1 = gtsam.Point2(4,3)
print(f"P0 = {R01.matrix() @ P1}")
print(f"P0 = {R01.rotate(P1)}")
P0 = [2.73271005 4.18715844]
P0 = [2.73271005 4.18715844]
6.1.5. 2D Rigid Transforms aka SE(2)#
From \(SO(2)\) to \(SE(2)\).
2D rigid transforms combine rotation with translation, and are used in two different ways in robotics:
to specify the pose of a robot, which consists of a translation and an orientation
to talk about relative pose or the transformation between two different robot poses
We can use the same mathematical object for both use cases, i.e., 2D rigid transforms \(T\in SE(2)\), which combine translation and orientation in a \(3 \times 3\) matrix, as explained below.
Let us start by assuming as before that the point \(P\) is rigidly attached to a frame that is rotated by angle \(\theta\) w.r.t. Frame 0. If we now translate that frame according to a vector \(d\), as shown in the figure above, the point \(P\) undergoes the same translation as the frame. To calculate the coordinates of \(P\) relative to Frame 0, we merely perform the rotational coordinate transformation as above, and then add the translation from the origin of Frame 0 to the origin of Frame 1:
We can write the above coordinate transformation as a matrix equation:
in which \(0_2\) denotes the row vector \([0~ 0]\). Note that the bottom row of this matrix equation is merely the equation
which holds for any \(p_x, p_y\).
It is convenient to define the 3vector \(\tilde{P}\) as
which we call the homogeneous coordinates of the point \(P\). Using this convention, we can write the homogeneous coordinate transformation as
The set of matrices of the form
with \(R \in SO(2)\) and \(d \in \mathbb{R}^2\) is called the special Euclidean group of order 2, denoted by SE(2).
6.1.6. Composition of Homogeneous Transformations#
We can implement composition by matrix multiplication.
Consider a mobile robot that begins at some initial configuration, moves to a new location, reorients itself and then moves to another location. How can we determine the final configuration of the robot if we are given only the individual motions in the sequence? The answer is suprisingly simple. Suppose the point \(P\) is rigidly attached to the robot. Using the coordinate transformation equation \(\tilde{P}^i = T^i_j \tilde{P}^j\), we have the following:
Combining the first two of these gives
and since this equation holds for every possible \(\tilde{P}^2\) we have
It is easy to show (by induction) that this result extends to an arbitarily long sequence of transformations. Therefore, if we have a sequence of robot configurations \(T^{i1}_i\), for \(i = 1, \dots , n\) we can compute the composite transformation \(T^0_n\) using
6.1.7. The Group \(SE(2)\) in GTSAM#
Pose2
is a noncommutative group, and also acts onPoint2
.
Unsurprisingly, \(SE(2)\) is also a group, as the following properties all hold:
Closure: For all transforms \(T, T' \in SE(2)\), their product is also in \(SE(2)\), i.e., \(T T' \in SE(2)\).
Identity Element: The \(3\times 3\) identity matrix \(I\) is included in the group, and for all \(T \in SE(2)\) we have \(T I = I T = T\).
Inverse: For every \(T \in SE(2)\) there exists \(T^{1} \in SE(2)\) such that \(T^{1}T = T T^{1} = I\).
Associativity: For all \(T_1, T_2, T_3 \in SE(2)\), \((T_1 T_2) T_3 = T_1 (T_2 T_3)\).
However, in contrast to 2D rotations, 2D rigid transforms do not commute. Also, The inverse \(T^{1}\) is not just the transpose; instead, we have:
which you can easily verify by multiplying to obtain \(I\).
Again, all of this is built into GTSAM,where both 2D poses and 2D rigid transforms are represented by the type Pose2
:
theta = math.radians(30)
T = gtsam.Pose2(3, 2, theta)
print(f"2D Pose (x, y, theta) = {T}corresponding to transformation matrix:\n{T.matrix()}")
2D Pose (x, y, theta) = (3, 2, 0.523599)
corresponding to transformation matrix:
[[ 0.8660254 0.5 3. ]
[ 0.5 0.8660254 2. ]
[ 0. 0. 1. ]]
2D transforms form a noncommutative group, as demonstrated here in code:
print(isinstance(T * T, gtsam.Pose2)) # closure
I3 = gtsam.Pose2.Identity()
print(T.equals(T * I3, 1e9)) # identity
print((T * T.inverse()).equals(I3, 1e9)) # inverse
T1, T2, T3 = gtsam.Pose2(1,2,3), gtsam.Pose2(4,5,6), gtsam.Pose2(7,8,9)
print(((T1 * T2)* T3).equals(T1 * (T2 * T3), 1e9)) # associative
print((T1 * T2).equals(T2 * T1, 1e9)) # NOT commutative
True
True
True
True
False
Finally, 2D transforms can act on points, which we can do using matrix multiplication, or using the Pose2.transformFrom
method:
T01 = gtsam.Pose2(1, 2, math.radians(20))
P1 = gtsam.Point2(4,3)
print(f"P0 = {T01.matrix() @ [4, 3, 1]}") # need to make P0 homogeneous
print(f"P0 = {T01.transformFrom(P1)}")
P0 = [3.73271005 6.18715844 1. ]
P0 = [3.73271005 6.18715844]
6.1.8. Lie Groups*#
Groups can be either discrete or continuous. For example, the set of integers along with addition as the group operation and zero as the identity forms a discrete group. In contrast, both \(SO(2)\) and \(SE(2)\) are continuous groups. Not only are they continuous, but they are examples of manifolds. Informally, a manifold is a set that is locally like \(\mathbb{R}^k\) for some \(k\). In the present case, \(SO(2)\) is locally like \(\mathbb{R}^1\), and \(SE(2)\) is locally like \(\mathbb{R}^3\).
While the topic of manifolds is beyond the scope of this book, the easiest way to understand this is to note that \(SO(2)\) can be parameterized by the angle \(\theta \in [0,2\pi) \subset \mathbb{R}^1\). This parameterization is local (and not global) because it breaks down in a neighborhood of \(\theta = 2\pi\) (as described above). This is not a problem, however, since in that case, we can simply choose to use \(\theta \in [\pi, \pi)\) instead of \(\theta \in [0,2\pi)\). We can completely cover \(SO(2)\) using two parameterizations: one for cases when \(\theta\) is near \(2\pi\), and a second for when \(\theta\) is near \(\pi\).
For the case of \(SE(2)\), we merely use the two parameterizations \((x,y,\theta) \in \mathbb{R}^2 \times [0,2\pi) \subset \mathbb{R}^3\) and \((x,y,\theta) \in \mathbb{R}^2 \times [\pi,\pi) \subset \mathbb{R}^3\), which demonstrates that locally \(SE(2)\) is like \(\mathbb{R}^3\).
In addition to being manifolds, both \(SO(2)\) and \(SE(2)\) posses the additional property of being differentiable. Such groups are known as Lie groups (pronounced “Lee groups”). While again beyond the scope of this book, we can understand this by considering the parameterizations for \(SO(2)\) and \(SE(2)\) described above. For each of these parameterizations, we can differentiate the elements of \(SO(2)\) and \(SE(2)\), which implies the existence of linear and angular velocities. In fact, in this chapter, we will define the actions of the car using these velocities as the inputs to the system.
6.1.9. GTSAM 101#
A deeper dive in the GTSAM concepts used above.
The types Point2
and Rot2
are used for 2D position and rotation.
While Point2
is really just an alias for a 3dimensional numpy array, the Rot2
type is a wrapper around a c++ GTSAM type that represents \(SO(2)\) using just a cosine and sine value, i.e., a point on the unit circle.
Using a 2D position and a 2D rotation we can create a 2D pose, represented by a gtsam.Pose2
.As always, you can execute help(gtsam.Pose2)
to get the full documentation of a class. Below is an excerpt with some useful methods. We have several constructors:
__init__(...)
__init__(*args, **kwargs)
Overloaded function.
1. __init__(self: Pose2) > None
2. __init__(self: Pose2, other: Pose2) > None
3. __init__(self: Pose2, x: float, y: float, theta: float) > None
4. __init__(self: Pose2, theta: float, t: Point2) > None
5. __init__(self: Pose2, r: Rot2, t: Point2) > None
6. __init__(self: Pose2, v: numpy.ndarray[numpy.float64[m, 1]]) > None
where t
above is just a Point2
, and v
is supposed to be a 3vector with x, y, and \(\theta\), in that order. We also have Pose2.transformFrom
and Pose2.transformTo
methods, which act on 2D points:
transformFrom(...)
transformFrom(*args, **kwargs)
Overloaded function.
1. transformFrom(self: Pose2, point: Point2) > Point2
...

transformTo(...)
transformTo(*args, **kwargs)
Overloaded function.
1. transformTo(self: Pose2, point: Point2) > Point2
....
Both have them have three overloads, but we show only the most commonly used one above. When using code, it is often a good idea to use indices to keep the different coordinate frames apart. For example, for a pose \(T^a_b\) the method transformFrom
implements the following
i.e., it transforms from frame B to frame A. The other method, transformTo
transforms to frame B and hence implements