Open In Colab

7.1. Moving in Three Dimensions#

Motion in 3D has 6 degrees of freedom.

Splash image with steampunk drones in various orientations

In Section 6.1 we introduced the space of 2D rigid transformations \(SE(2)\), and showed how \(3 \times 3\) matrices can be used to simultaneously represent rotation and translation in two dimensions. At that time, we promised that this could be easily generalized to the case of rotation and translation in three dimensions. In this section, we deliver on that promise, and introduce two new spaces: the 3D rotations \(SO(3)\) and 3D rigid transformations \(SE(3)\).

In addition to representing the pose of a drone in 3D we are also interested in representing velocity. We will discuss both linear and angular velocity, i.e., the rate of change in the orientation of one coordinate frame with respect to another.

The discussion below holds for generic coordinate frames, but because this is the drone chapter, we introduce two frames here that make things a bit more tangible:

  • the navigation frame \(N\), which you can think of as attached to the Earth;

  • the body frame \(B\), which you can think of as rigidly attached to the drone.

In the next section the different conventions for these coordinate frames are discussed in more detail. Below we simply use the subscripts “n” and “b” in the definitions and examples, irrespective of which convention is used.

7.1.1. Rotations in 3D aka SO(3)#

In Section 6.1, we constructed a rotation matrix \(R^0_1 \in SO(2)\) by projecting the axes of Frame 1 onto Frame 0. The extension to 3D is straightforward. Using the navigation and body coordinate frames defined above, let us project the axes \((x_b, y_b, z_b)\) of the body frame \(B\) onto the axes \((x_n,y_n, z_n)\) of the navigation frame \(N\). Of course, in the 3D case each frame is equipped with an additional \(z\)-axis:

\[\begin{split} R^{n}_{b} =\begin{bmatrix} x_n \cdot x_b & x_n \cdot y_b & x_n \cdot z_b \\ y_n \cdot x_b & y_n \cdot y_b & y_n \cdot z_b \\ z_n \cdot x_b & z_n \cdot y_b & z_n \cdot z_b \\ \end{bmatrix} \end{split}\]

The expression for rotational coordinate transformations is now exactly the same as in Chapter 6. Given a point \(p\) with coordinates expressed relative to the body frame \(B\), we compute its coordinates \(p^n\) in the navigation frame \(N\) as follows:

\[ p^n = R^n_b p^b \]

Extending the semantics from 2D rotations in \(SO(2)\) to 3D rotations represented by \(SO(3)\), the columns of \(R_{b}^{n}\) represent the axes of frame \(B\) in the \(N\) coordinate frame:

\[ R_{b}^{n}=\left[\begin{array}{ccc} \hat{x}_{b}^{n} & \hat{y}_{b}^{n} & \hat{z}_{b}^{n}\end{array}\right] \]

The 3D rotations together with composition constitute the special orthogonal group \(SO(3)\), hence the abbreviation “SO”. It is made up of all \(3\times3\) orthonormal matrices with determinant \(+1\). This group has matrix multiplication as its composition operation, but unlike in two dimensions, 3D rotations do not commute, i.e., in general \(R_{2}R_{1}\neq R_{1}R_{2}\). The group thus obtained it is called “special” because of the +1 determinant. Orthonormal matrices either have a -1 or +1 determinant, but those with -1 are reflections, which we specifically exclude here.

It is often useful to refer to rotations about one of the coordinate axes. For example, when a drone rotates around the vertical axis, this will typically be around the Z-axis, i.e., the yaw axis. The basic rotation matrices around X, Y, and Z-axes respectively are given by

\[\begin{split} R_{x,\phi} =\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos \phi & - \sin \phi \\ 0 & \sin \phi & \cos \phi \end{bmatrix} ~~~ R_{y,\theta} =\begin{bmatrix} \cos \theta & 0 & \sin \theta \\ 0 & 1 & 0 \\ -\sin \theta & 0 & \cos \theta \end{bmatrix} ~~~ R_{z,\psi} =\begin{bmatrix} \cos \psi &-\sin \psi & 0 \\ \sin \psi &\cos \psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{split}\]

Note, in particular, the form of \(R_{z,\psi}\), the rotation about the \(z\)-axis. The upper-left \(2\times 2\) block of this matrix is exactly a rotation matrix \(R \in SO(2)\) as introduced in Section 6.1. This can be understood by realizing that rotation in the \(xy\) plane is actually a rotation about an implicit \(z\)- axis that points “out of the page.”

7.1.2. 3D Rigid transforms aka SE(3)#

In a similar way we can extend the group of 2D rigid transforms \(SE(2)\) to the case of rotation and translation in three dimensions. Suppose a point \(p\) is rigidly attached to frame \(B\), and we wish to determine its coordinates with respect to the navigation frame \(N\). If the coordinates of \(p\) with respect to frame \(B\) are given by \(p^b\), then we can express it in frame \(N\) as follows:

\[ p^{n}=R_{b}^{n}p^{b}+t_{b}^{n} \]

Above \(R^n_b \in SO(3)\) is the rotation matrix that specifies the orientation of frame \(B\) w.r.t. frame \(N\), and \(t^n_b \in \mathbb{R}^3\) gives the location of the origin of frame \(B\) specified in the coordinates of frame \(N\).

We denote this transform by \(T_{b}^{n}\doteq\left(R_{b}^{n},\,t_{b}^{n}\right)\). These transforms \(T_{b}^{n}\) are elements of the special Euclidean group of order 3, \(SE(3)\). Similarly to the previous chapter we can construct matrices in \(SE(3)\) by embedding the rotation and translation into a \(4\times4\) invertible matrix defined as

\[\begin{split} T_{b}^{n}=\begin{bmatrix} R_{b}^{n} & t_{b}^{n}\\ 0 & 1 \end{bmatrix}. \end{split}\]

Again, as in the previous chapter, we can implement the group operation as simple matrix multiplication. And, in analogy with the 2D case, if we use the homogeneous coordinates for a point \(p\), we define a 3D homogeneous transformation as

\[\begin{split} \begin{bmatrix} R_{b}^{n} & t_{b}^{n}\\ 0 & 1 \end{bmatrix} \begin{bmatrix} p^{b}\\ 1 \end{bmatrix} =\begin{bmatrix} R_{b}^{n}p^{b}+t_{b}^{n}\\ 1 \end{bmatrix} \end{split}\]

Note that both ways of defining and applying 3D rigid transformations are equivalent: sometimes it is more convenient to use one over the other. It should be apparent from the context as to which flavor we use.

7.1.3. The Group \(SE(3)\)#

\(SE(3)\) is a non-commutative group.

Unsurprisingly, \(SE(3)\) is also a group, as the following properties all hold:

  1. Closure: For all transforms \(T, T' \in SE(3)\), their product is also in \(SE(3)\), i.e., \(T T' \in SE(3)\).

  2. Identity: The \(4\times 4\) identity matrix \(I\) is included in the group, and for all \(T \in SE(3)\) we have \(T I = I T = T\).

  3. Inverse: For every \(T \in SE(3)\) there exists \(T^{-1} \in SE(3)\) such that \(T^{-1}T = T T^{-1} = I\).

  4. Associativity: For all \(T_1, T_2, T_3 \in SE(3)\), \((T_1 T_2) T_3 = T_1 (T_2 T_3)\).

The inverse \(T^{-1}\) is given by:

\[\begin{split} T^{-1} = \begin{bmatrix}R & d\\0_3 & 1\end{bmatrix}^{-1} = \begin{bmatrix}R^T & -R^T d\\0_3 & 1\end{bmatrix} \end{split}\]

However, 3D rigid transforms do not commute. Hence, \(SE(3)\) is a noncommutative group. In fact, so is the group \(SO(3)\): 3D rotations only commute when they represent rotations along the same axis. A special case is the set of 2D rotations from the last chapter, which are all around the imagined z-axis: this explains why rotations in 2D commute, but rotations in 3D do not, in general.

7.1.4. \(SE(3)\) in GTSAM#

Pose3 is a noncommutative group which acts on Point3.

Again, all of this is built into GTSAM, where both 3D poses and 3D rigid transforms are represented by the type Pose3, with rotation matrices represented by Rot3:

R = gtsam.Rot3.Yaw(math.degrees(20)) # rotation around Z-axis by 20 degrees
T = gtsam.Pose3(R, [10,20,30])
print(f"3D Pose {T}\ncorresponding to the transformation matrix:\n\n{T.matrix()}")
3D Pose R: [
	-0.720878, -0.693062, 0;
	0.693062, -0.720878, 0;
	0, 0, 1
t: 10 20 30

corresponding to the transformation matrix:

[[-0.72087779 -0.6930622   0.         10.        ]
 [ 0.6930622  -0.72087779  0.         20.        ]
 [ 0.          0.          1.         30.        ]
 [ 0.          0.          0.          1.        ]]

We can verify the group properties for 3D transforms in code, as we do below, noting that the last assertion for commutativity yields False, as \(SE(3)\) is a noncommutative group:

print("Closure:", isinstance(T * T, gtsam.Pose3)) # closure
I = gtsam.Pose3.Identity()
print("Identity:", T.equals(T * I, 1e-9)) # identity
print("Inverse:", (T * T.inverse()).equals(I, 1e-9)) # inverse
T1, T2, T3 = T, gtsam.Pose3(gtsam.Rot3.Roll(0.1), [1,2,3]), gtsam.Pose3(gtsam.Rot3.Roll(0.2), [1,2,3])
print("Associativity:", ((T1 * T2)* T3).equals(T1 * (T2 * T3), 1e-9)) # associative
print("Commutativity:", (T1 * T2).equals(T2 * T1, 1e-9)) # NOT commutative
Closure: True
Identity: True
Inverse: True
Associativity: True
Commutativity: False

Finally, 3D transforms can act on 3D points, which we can do using matrix multiplication, or using the Pose3.transformFrom method:

P1 = gtsam.Point3(2, 4, 3)
print(f"P0 = {T.matrix() @ [2, 4, 3, 1]}")  # need to make P0 homogeneous
print(f"P0 = {T.transformFrom(P1)}")
P0 = [ 5.78599563 18.50261322 33.          1.        ]
P0 = [ 5.78599563 18.50261322 33.        ]

You can see from the code snippet above that both approaches yield the same numerical values for the first three coordinates.

7.1.5. Velocity#

Velocity comes in two flavors: linear and angular.

The linear velocity \(v\) of the origin of a moving frame is easy to understand. This velocity is obtained merely by computing the derivative of the translation \(t\) of the frame’s origin:

\[ v \doteq \frac{d}{dt}t \]

We omit coordinate frame indices here to not clutter the notation.

The derivative \(\dot{R}\) of orientation is more interesting. We begin by introducing the family of \(3 \times 3\) skew symmetric matrices, denoted by \(\mathfrak{so}(3)\). The relationship between \(\mathfrak{so}(3)\) and \(SO(3)\) will become apparent shortly. A matrix \(S\) is said to be a skew symmetric matrix if

\[ S = - S^T \]

If we examine this equation for each element of \(S\), we obtain nine equations,

\[ s_{i,j} = - s_{j,i}, ~~~~ i,j = 1,2,3 \]

When \(i=j\), we have \(s_{ii} = -s_{ii}\), which is satisfied only if \(s_{ii} = 0\). Thus, the diagonal elements of a skew symmetric matrix are all zeros. The remaining equations constrain the off-diagonal entries, such that every \(S \in so(3)\) can be written as

\[\begin{split} \hat{s} = S = \begin{bmatrix} 0 & -s_z &s_y \\ s_z & 0 & -s_x \\ -s_y & s_x & 0 \end{bmatrix} \end{split}\]

in which the hat operator \(\hat{\cdot}\), maps a vector \(s = (s_x, s_y, s_z)\) to a skew symmetric matrix as above.

Skew symmetric matrices play a key role in representing the angular rate of change of coordinate frames. Let \(R(t)\) be a time-varying rotation matrix. Because \(R(t) \in SO(3)\), at each time \(t\) we have \(R^T(t) R(t) = I\). If we differentiate both sides of this equation, we obtain

\[\begin{split} \begin{aligned} \frac{d}{dt} \left[ R^T(t) R(t) \right] &= \frac{d}{dt} I \\ \dot{R}^T R + R^T \dot{R} &= 0 \end{aligned} \end{split}\]

In other words, \(\dot{R}^T R = -R^T \dot{R}\) and hence the matrix \(\dot{R}^T R\) is a skew symmetric matrix

\[\hat{\omega} = \dot{R}^T R\]

with \(\omega\) some three-vector. Recalling that \(R^{-1} = R^T\), we can solve this equation for \(\dot{R}\) to obtain:

\[\dot{R} = \hat{\omega} R .\]

The above gives an expression for the derivative of a rotation matrix \(\dot{R}\) that will be used below in our formulation of drone dynamics. The vector \(\omega\) is called the angular velocity, and is the same angular velocity that you may have learned in an introductory physics course.

Note that the angular velocity vector \(\omega\) is a free vector, and therefore, we can define its coordinates with respect to the coordinate frame of our choice. To understand this, it is worth reconsidering the above in the context of a specific rotation matrix, say \(R^n_b\) that specifies the orientation of body frame with respect to the navigation frame. In this case, making explicit the relevant coordinate frames, we would write

\[\dot{R}^n_b = \hat{\omega}^n_{n,b} R^n_b .\]

Here, the notation \(\hat{\omega}^n_{n,b}\) can be understood to mean that the angular velocity \(\omega\) determines the relative rate of angular rotation of the body frame with respect to the navigation frame (the “\(n,b\)” in the subscript), and that the coordinates of the angular velocity vector are given with respect to the navigation frame (the “\(n\)” in superscript). Most often, the subscript can be inferred from context, and will be omitted. If, for example, we are concerned with the rotation of the quadrotor with respect to the navigation frame, the two coordinate frames will be clear from the context, and we can reduce clutter by omitting the subscript. In contrast, we will typically keep the superscript to make clear the reference frame in which the angular velocity is defined. This will be particularly important, for example, when we express control inputs relative to the quadrotor’s body frame, with a control goal of stabilizing its orientation relative to the fixed navigation coordinate frame.

Equipped with this notation, it is easy to understand how to transform angular velocity vectors from one frame to another. If we are given \(\omega^n_{n,b}\) (i.e., the angular velocity is expressed with respect to the navigation frame), and we instead wish to specify the angular velocity with respect to the body frame, we merely perform the coordinate transformation

\[ \omega^b_{n,b} = R^b_n \omega^n_{n,b} \]

or, when the context is understood, more compactly, as

\[ \omega^b = R^b_n \omega^n \]

This transformation will be useful when expressing a desired angular velocity with respect to the body frame of a quadrotor.

7.1.6. Summary#

Now that we know how to talk about 3D rotations, 3D rigid transformations, and their rate of change, we are ready to use all that machinery to describe how drones move in 3D. The next section tackles this.

7.1.7. GTSAM 101#

A deeper dive in the GTSAM concepts used above.

The types Point3 and Rot3 are used for 3D position and rotation. While Point3 is really just an alias for a 3-dimensional numpy array, the Rot3 type is a wrapper around a c++ GTSAM type that is not just a 3x3 matrix, which is how it represents the rotation internally, but also has a number of methods to interface with them. For example, the Rot3.Ypr constructor creates a rotation from yaw, pitch, and roll angles:


Ypr(...) method of builtins.PyCapsule instance
    Ypr(y: float, p: float, r: float) -> gtsam.gtsam.Rot3

Using a 3D position and a 3D rotation we can create a 3D pose, represented by a gtsam.Pose3. As always, you can execute help(gtsam.Pose3) to get the full documentation of a class. Below is an excerpt with some useful methods. We have several constructors:

     __init__(*args, **kwargs)
     Overloaded function.
     1. __init__(self: Pose3) -> None
     2. __init__(self: Pose3, other: Pose3) -> None
     3. __init__(self: Pose3, r: Rot3, t: Point3) -> None
     4. __init__(self: Pose3, pose2: Pose2) -> None
     5. __init__(self: Pose3, mat: numpy.ndarray[numpy.float64[m, n]]) -> None

where t above is just a Point3, and mat can be a 3x4 or 4x4 matrix. However, you are responsible for these matrices to satisfy the \(SE(30\) constraints. We also used the Pose3.TransformFrom method above, and there is also a Pose3.TransformTo method:

     transformFrom(*args, **kwargs)
     Overloaded function.
     1. transformFrom(self: Pose3, point: Point3) -> Point3
     transformTo(*args, **kwargs)
     Overloaded function.
     1. transformTo(self: Pose3, point: Point3) -> Point3

Both have them have three overloads, but we show only the most commonly used one above.