# 1.2. Reasoning#

Reasoning provides the

intelligenceof so-called intelligent robotic systems.

Reasoning involves manipulating data and models, ultimately reaching conclusions. This can take the form of perception, in which sensor data and models are used to make inferences about the state of the world or the robot; planning, in which action models and the perceived state are used to infer a plan that can transform the state of the world or the robot from its current state to a desired state; and, learning, in which data and models are used to improve the robot’s performance over time. In this book, reasoning ranges from very simple mathematical manipulations of probabilities to learning millions of parameters for a deep neural network.

## 1.2.1. Perception#

Perception uses sensor data to drive inference about the world.

Sensors are characterized by observation models that map the state of the world to
sensor values. These models are sometimes called *forward models*.
For a given state of the world, one can use geometry and physics to uniquely determine (modulo uncertainty) the sensor output. Perception is concerned with the *inverse* problem:
Given a set of sensor measurements, infer something about the state of the world.
Inverse problems are notoriously difficult, in part because they are often ill-defined.
For example, in the noiseless case, if a range sensor is 10 meters from wall,
we can easily predict that the sensor will return \(h(x) = z = 10\).
The inverse problem is to determine something about the state of the world,
given that the sensor measurement is \(z = 10\). This is an ill-defined problem, since there are many possible world states that could lead to a measurement of \(z=10\).
The robot might be \(10\) meters from a wall, or perhaps another robot has crossed \(10\) meters in front of the sensor, or perhaps an open door \(10\) meters in front of the robot has just been closed.
Perception is concerned with these inverse problems.

In the presence of uncertainty, things become even more difficult. It is easy to illustrate the difficulty with a simple medical example. If you have the flu, there is a very high probability that you will have a fever. But if you have a fever, what is the probability that you have the flu? The latter question is complicated by the fact that there are many possible causes for a fever. Medical doctors use the presence of fever, along with other evidence (sore throat? cough? burning eyes?), combined with contextual knowledge (is it flu season? has there been an outbreak of flu in the past days?) to make a diagnosis. In other words, they perform inference by fusing multiple sources of evidence, and considering the context for the evidence.

In this book we consider two main approaches to perception: approaches based on probability theory, and approaches that apply machine learning tools such as deep neural networks. In each case, tools from optimization are often used.

The first part of the book focuses mainly on probabilistic approaches. For example, the trash sorting robot in Chapter 2 uses a maximum likelihood approach to recognize categories of recycling material. This approach essentially optimizes the value of a conditional probability function that models sensors. In Chapter 4, probabilistic models of actions and sensors are combined to estimate a probability distribution for the robot’s current state. In Chapter 6, optimization over probabilistic models is used to simultaneously construct a map and localize the robot within the map (the SLAM problem). In each of these cases, the reasoning and models are explicit.

We also describe how neural networks can be used to solve perception problems. Neural networks do not use explicit models. Instead, their reasoning process is essentially encoded as a set of weights on the connections in the network. Choosing the values for these weights is typically a large-scale optimization problem that is driven by available data. In a sense, neural networks learn a solution to inverse probles. In Chapter 5, we introduce neural networks for computer vision problems, and in Chapter 7 we extend these ideas to neural radiance fields (NeRF).

## 1.2.2. Planning#

Planning is the process of determining which actions to execute in order to effect desired changes in the world.

Planning can mean very different things, depending on the complexity of the task, the time scale for actions, the uncertainty involved, and the complexity of the robot. For simple tasks that can be characterized at a fairly high level of abstraction, symbolic descriptions can be used, and planning can often be reduced to either a decision-theoretic problem, or to a graph search problem. This is the case for both the trash sorting robot and the vacuum cleaning robot.

For the trash sorting robot, planning reduces to making individual decisions about which bin is appropriate for a piece of trash. For the vacuum cleaning robot, planning takes the form of constructing an optimal policy: Given a specific goal location, determine the optimal action to apply for each possible robot state, such that the robot makes its way to the goal.

As we move down the abstraction hierarchy, from high-level task planning to low-level control, planning requires increasing computation, and begins to take into consideration both the geometry and the physics of the problem at hand. For example, when the dynamics of the task are important (as is the case for drones in Chapter 7), planning must consider the dynamic equations of the system. If uncertainty is involved, it is impossible to provide deterministic performance guarantees, and planning requires dealing with probabilistic outcomes. We will consider all of these cases in this book.

### 1.2.2.1. Task Planning#

Task planning treats problems at a fairly high level of abstraction, ignoring details such
as the geometry of the environment, the specific path traversed by the robot,
and any consideration of dynamics (e.g., for drones that depend on aerodynamic properties to stay aloft).
Actions in a task plan are typically described in high-level, symbolic terms,
such as *move the piece of trash to the paper bin* for our trash sorting robot,
or *go to the kitchen* for our vacuum cleaning robot.
A task planner makes its decisions by considering the effects of actions at this high level.
A task planner might consider costs associated to placing a piece of trash in the wrong bin,
or probabilities associated to arriving to the correct room,
but it will not consider details such as how to grasp the item of trash or what should be the motor torques on the wheels of a robot to move it from the living room to the kitchen.

Task planning is useful when either (a) the robot has a set of basic skills, sometimes called *primitives* or *motion primitives* that can be executed reliably, and therefore need not be planned, or (b) when the task planner is integrated into an overall system that incorporates motion planning (this integrated problem is sometimes called *TAMP*, for Task And Motion Planning). The trash sorting robot and the vacuum cleaning robot are examples of the former.

### 1.2.2.2. Path Planning#

Path planning deals with the problem of moving the robot from one position to another. Avoiding collisions with obstacles in the environment (including humans and other robots) is generally a central concern for path planners.

An easy way to address this problem is to discretize the world into a grid and then label grid cells as either being free or as containing an obstacle. The path planning problem is then to find a sequence of free grid cells such that the initial robot position is contained in the starting cell and the goal position is contained in the final cell. The disadvantage to this approach is that the representation of free space is conservative. If a cell is partially blocked by an obstacle, then the entire cell will be treated as an obstacle and therefore be unavailable to the robot; there may be cases for which a collision-free path exists in the world, but not in the grid. In such cases, the robot will be unable to find a free path, even though it exists.

For many tasks, grid-based approximations are not sufficiently accurate. Furthermore, the size of the required grid grows exponentially with the number of degrees of freedom for the system. A robot that moves in the plane might use a \(100 \times 100\) grid to represent the position of the robot. If we decide to consider also the orientation of the robot, also using \(100\) discretization levels, the grid grows to \(100 \times 100 \times 100\). If we consider a drone and represent its 3D position and 3D orientation, the grid grows to \(100^6\), a very large data structure, for a fairly coarse quantization of the drone’s configuration space.

For the problem of planning a collision-free path for a robot that can move in arbitrary directions, there exist exact algorithms that are guaranteed to find a solution when one exists, or to terminate in finite time when no solution exists (these algorithms are called *complete* algorithms). Unfortunately, even the fastest of these algorithms has computational complexity that increases exponentially with the number of degrees of freedom of the robot (in addition to being extremely difficult to implement). For this reason, much research in path planning has focused on computationally efficient algorithms that perform well in practice, even though deterministic performance guarantees are not available. Randomized, sampling-based algorithms are the most popular of these, and we will see some of these in Chapter 5, to plan paths for our DDR.

### 1.2.2.3. Trajectory Planning#

While path planning considers the “shape” of the path from start to goal (e.g., consider the smoke trail that would be left by a sky-writing drone), trajectory planning also considers the time parameterization of that path (how fast the drone traverses the path).

In this book, dynamic effects become important in Chapter 7, where we consider a drone robot. Aerodynamics can be computationally difficult; to achieve ultimate accuracy requires solving systems of partial differential equations. Therefore, we will use numerical approximations for the aerodynamic effects and solve a trajectory optimization problem that considers performance criteria that include obstacle avoidance, start and goal states (i.e., boundary value specification), and smoothness (e.g., bounding velocity and acceleration). For completeness, we also introduce a control scheme that can execute the final trajectory on an actual drone.

## 1.2.3. Learning#

By learning from data we can make robots smarter.

For many applications, it is impossible to program a robot for all situations that might occur. In some cases, this is due to the fact that arbitrarily many possible situations could occur. In other applications, things may simply be too complex to model exactly. In still other cases, it is simply not possible to know a priori all of the conditions that might confront the robot.

Machine learning approaches can be used in these situations to adapt to unknown or changing circumstances, or to improve the robot’s performance over time. In this book, we will consider three main types of machine learning: simple statistical analysis, reinforcement learning, and neural network-based learning.

For systems that have a few unknown parameters, it is often sufficient to use statistical methods to estimate the values of those parameters. For example, if the trash sorting robot does not know the prior probabilities for the different categories of trash that arrive to the sorter, these probabilities can be estimated by observing the system over time. If a robot manipulates an unknown object, it can estimate the mass of that object by using measurements from it’s sensors. If sensors are noisy, but the parameters of their associated noise models are not known, we can estimate these through calibration, which is also a kind of learning.

For systems that will operate over long periods of time, it is often possible to improve performance using experience.
*Reinforcement learning* is a popular approach. Simply stated, behaviors that lead to successful outcomes are rewarded,
reinforcing successful behavior. In Chapter 3, we introduce reinforcement learning for the vacuum cleaning robot,
including the popular variation known as *Q-learning*.
Q-learning does not explicitly learn which actions are most effective, but instead learns a function that models
the long-term reward for applying actions in specific states.

For problems that involve large data sets, neural network-based methods, particularly *deep learning* can often provide effective solutions. Neural networks can be viewed as function approximators. For the computer vision problem of object recognition, the approximated function takes an image as input and outputs a classification of the object. Many other vision tasks, such as image segmentation, scene depth, and scene motion can be solved using modern neural networks.

Neural networks are constructed from simple processing nodes that are interconnected by weighted edges. The weights effectively determine the computation done by the network, and therefore the key problem in neural networks is that of learning the weights.
We introduce neural networks, and then deep neural networks, in Chapter 5, for problems in image processing and computer vision.
Then, in Chapter 6, we combine Deep Learning with Reinforcement Learning, resulting in an approach called *Deep Reinforcement Learning,*
or *Deep RL.* Finally, in Chapter 7, we use the same differentiable optimization methods for modeling the environment, when we talk about Neural Radiance Fields (NeRFs) and their use in autonomous flight.