gtsam/doc/PythonReference/Pose3.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pose3\n",
    "\n",
    "The `Pose3` class implements two things: a pose manifold and a Lie group of transforms. They look awfully similar, and are often treated as the same thing, but they are really not."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "html"
    }
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/borglab/gtsam/blob/develop/doc/PythonReference/Pose3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "outputs": [],
   "source": [
    "%pip install gtsam"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "outputs": [],
   "source": [
    "import gtsam\n",
    "import numpy as np\n",
    "from gtsam import Pose3, Rot3, Point3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manifold $\\mathcal{SE}(3)$\n",
    "The manifold $\\mathcal{SE}(3)$ is defined by elements $(C, r)$, where:\n",
    "- $C \\in \\mathcal{SO}(3)$: The orientation or **attitude**.\n",
    "- $r \\in \\mathbb{R}^3$: The **position**.\n",
    "\n",
    "This manifold represents poses in 3D space, with $C$ encoding orientation/attitude and $r$ encoding the position of a body in some reference frame. \n",
    "\n",
    "This manifold can be thought of all possible values that can specify the position and orientation of a rigid body in some reference frame. It does not have a notion of composition: we would never think to *compose* two poses of two different rigid bodies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simplest thing you can do in GTSAM is just creating a pose with identity orientation and position:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P=\n",
      "R: [\n",
      "\t1, 0, 0;\n",
      "\t0, 1, 0;\n",
      "\t0, 0, 1\n",
      "]\n",
      "t: 0 0 0\n",
      "\n"
     ]
    }
   ],
   "source": [
    "P = Pose3()\n",
    "print(f\"P=\\n{P}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But of course, it is more useful to specify some orientation and position. To do that, you need to create a `Rot3` instance, see [Rotations in 3D](Rot3.ipynb). For example, here is a pose of a robot at $r=(3,4,0)$ and rotated by 30 degrees around the Z-axis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "R: [\n",
      "\t0.866025, -0.5, 0;\n",
      "\t0.5, 0.866025, 0;\n",
      "\t0, 0, 1\n",
      "]\n",
      " [3. 4. 0.]\n",
      "P=\n",
      "R: [\n",
      "\t0.866025, -0.5, 0;\n",
      "\t0.5, 0.866025, 0;\n",
      "\t0, 0, 1\n",
      "]\n",
      "t: 3 4 0\n",
      "\n"
     ]
    }
   ],
   "source": [
    "C = Rot3.Yaw(np.deg2rad(30))\n",
    "r = Point3(3,4,0)\n",
    "print(C,r)\n",
    "P = Pose3(C,r)\n",
    "print(f\"P=\\n{P}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A great many things can be done using just the manifold $\\mathcal{SE}(3)$, as it provides a flexible and foundational framework for representing and reasoning about poses in 3D space. Below are some key use cases and examples:\n",
    "\n",
    "### Trajectories on $\\mathcal{SE}(3)$\n",
    "A trajectory over time can be represented as a continuous or discrete curve on the manifold:\n",
    "$$\n",
    "P(t) = (C(t), r(t)),\n",
    "$$\n",
    "where:\n",
    "- $C(t)$ is a time-varying orientation.\n",
    "- $r(t)$ is a time-varying position vector.\n",
    "\n",
    "This representation is useful for describing motion paths of rigid bodies, such as robots, drones, or vehicles.\n",
    "### Optimization on $\\mathcal{SE}(3)$\n",
    "Many problems involve optimization over the pose manifold, such as:\n",
    "- **Localization**: Estimating the pose of a robot or sensor in a global or local frame.\n",
    "- **SLAM (Simultaneous Localization and Mapping)**: Optimizing a graph of poses and landmarks to minimize error."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lie Group $SE(3)$\n",
    "The Lie group $SE(3)$ consists of elements $(R, t)$, where:\n",
    "- $R \\in SO(3)$: A **rotation** matrix.\n",
    "- $t \\in \\mathbb{R}^3$: A **translation** vector.\n",
    "\n",
    "The group operation in $SE(3)$ is defined as:\n",
    "$$\n",
    "(R_1, t_1) \\cdot (R_2, t_2) = (R_1 R_2, R_1 t_2 + t_1).\n",
    "$$\n",
    "This operation combines two transformations, where $R_1 R_2$ composes the rotations and $R_1 t_2 + t_1$ composes the translations.\n",
    "\n",
    "> Note that this particular form of composition *defines* $SE(3)$ as the *semi-product* of the group $SO(3)$ and the vector space $\\mathbb{R}^3$. This should be contrasted with a direct product, which would yield $(R_1 R_2, t_2 + t_1)$ as the composition.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Group Action on 3D Points\n",
    "The group $SE(3)$ acts on a 3D point $p \\in \\mathbb{R}^3$ as follows:\n",
    "$$\n",
    "p' = (R, t) \\cdot p = R p + t.\n",
    "$$\n",
    "Here, the rotation $R$ is applied first, followed by the translation $t$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Group Action on Poses\n",
    "The group $SE(3)$ also acts on a pose $(C,r) \\in \\mathcal{SE}(3)$ as follows:\n",
    "$$\n",
    "(R,t) \\cdot (C,r) = (R C, R r +t).\n",
    "$$\n",
    "- $R C$ rotates the orientation into a different frame.\n",
    "- $R r +t$ transforms the position, by first rotating the position in the old frame, then translating to account for the origin $t$ of the new frame."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4x4 Matrix Lie Group View of Transforms\n",
    "\n",
    "A transform in $SE(3)$ can also be represented as a $4 \\times 4$ homogeneous transformation matrix:\n",
    "$$\n",
    "T =\n",
    "\\begin{bmatrix}\n",
    "R & t \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "\n",
    "The composition of two transforms $T_1$ and $T_2$ in $SE(3)$ is then simply matrix multiplication\n",
    "$$\n",
    "T_1 \\cdot T_2 =\n",
    "\\begin{bmatrix}\n",
    "R_1 & t_1 \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix}\n",
    "\\begin{bmatrix}\n",
    "R_2 & t_2 \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix}\n",
    "=\n",
    "\\begin{bmatrix}\n",
    "R_1 R_2 & R_1 t_2 + t_1 \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "and the inverse is given by:\n",
    "$$\n",
    "T^{-1} =\n",
    "\\begin{bmatrix}\n",
    "R^\\top & -R^\\top t \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "where $R^\\top$ is the transpose (and also the inverse) of the rotation matrix $R$.\n",
    "\n",
    "To implement the action of $SE(3)$ on $\\mathbb{R}^3$ we need to embed a point $p$ in homogeneous coordinates as $\\tilde{p} = [p^\\top, 1]^\\top$, where $p \\in \\mathbb{R}^3$. The action of a transform $T$ on a point $\\tilde{p}$ is then given by:\n",
    "$$\n",
    "\\tilde{p}' = T \\tilde{p} =\n",
    "\\begin{bmatrix}\n",
    "R & t \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix}\n",
    "\\begin{bmatrix}\n",
    "p \\\\\n",
    "1\n",
    "\\end{bmatrix}\n",
    "=\n",
    "\\begin{bmatrix}\n",
    "R p + t \\\\\n",
    "1\n",
    "\\end{bmatrix} = \\begin{bmatrix}\n",
    "p' \\\\\n",
    "1\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "The matrix representation provides an elegant way to handle transformations in 3D space. This view also emphasizes that $SE(3)$ is a **matrix Lie group**, i.e., a subgroup of invertible $d\\times d$ matrices, with in this case $d=4$. However, it is by no means required to implement the Lie group $SE(3)$ and in fact GTSAM internally does *not* use $4\\times 4$ matrices."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference Frame Hierarchies\n",
    "\n",
    "Poses can sometimes be used as transforms when hierarchical relationships exist between frames. For example, consider the pose $P^b_i$ of an IMU relative to a rigid body B, which itself has a pose $P^n_b$ in a navigation frame N.\n",
    "\n",
    "If we seek is the pose $P^n_i$ of the IMU relative to the navigation frame, we need to *upgrade* the body pose $P^n_b$ to a rigid transform $T^n_b$. The pose of the IMU relative to the navigation frame is then given by:\n",
    "$$\n",
    "P^n_i = T^n_b \\cdot P^b_i.\n",
    "$$\n",
    "Note how the $b$ indices cancel out."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retractions and the Exponential Map\n",
    "\n",
    "### The Tangent Space of $SE(3)$\n",
    "\n",
    "The tangent space of $SE(3)$ at the identity, known as its **Lie algebra** $\\mathfrak{se}(3)$, can be thought of informally as encoding **angular and linear velocities**. Elements of $\\mathfrak{se}(3)$ are twists $\\xi = (\\omega, v)$, where:\n",
    "- $\\omega \\in \\mathbb{R}^3$: Represents angular velocity.\n",
    "- $v \\in \\mathbb{R}^3$: Represents linear velocity.\n",
    "\n",
    "These quantities describe instantaneous motion in 3D space. A twist $\\xi$ can be represented as a skew-symmetric matrix:\n",
    "$$\n",
    "\\xi^\\wedge =\n",
    "\\begin{bmatrix}\n",
    "\\omega^\\wedge & v \\\\\n",
    "0^\\top & 0\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "where $\\omega^\\wedge$ is the skew-symmetric matrix of $\\omega$, encoding rotational motion.\n",
    "\n",
    "To describe **finite motions**, the twist $\\xi$ is scaled by a time or displacement parameter $\\Delta t$, resulting in a scaled twist $\\xi \\Delta t = (\\phi, \\rho)$:\n",
    "- $\\phi = \\omega \\Delta t$: A finite angular displacement.\n",
    "- $\\rho = v \\Delta t$: A finite linear displacement.\n",
    "\n",
    "We used $\\Delta t$ to indicate a *time* interval: not this has nothing to do with the *translation* $t$. This finite motion $\\xi \\Delta t$ is the input to the **exponential map**, which we explain next."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Exponential Map for $SE(3)$\n",
    "\n",
    "The **exponential map** for $SE(3)$ takes a finite motion $\\xi \\Delta t = (\\phi, \\rho)$, derived from a twist $\\xi = (\\omega, v)$ scaled by a parameter $t$, and maps it to a rigid body transform $(R, t) \\in SE(3)$. \n",
    "The exponential map produces a transformation $(R, t)$ in $SE(3)$:\n",
    "$$\n",
    "\\exp((\\xi \\Delta t)^\\wedge) =\n",
    "\\begin{bmatrix}\n",
    "R & t \\\\\n",
    "0^\\top & 1\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "or $(R,t)\\in SE(3)$, where\n",
    "- $R = \\exp(\\phi^\\wedge)$, where $\\phi^\\wedge$ is the skew-symmetric matrix of $\\phi$.\n",
    "- $t = J_l(\\phi) \\rho$, where $J_l(\\phi)$ is the *left Jacobian* of $SO(3)$, a $3 \\times 3$ matrix. $J_l(\\phi)$ handles the semi-direct product nature of $SE(3)$ by coupling the rotational and translational components appropriately.\n",
    "\n",
    "An often-used shorthand notation is defining $\\text{Exp}: \\mathbb{R}^6 \\rightarrow SE(3)$ or a two-argument variant as:\n",
    "$$\n",
    "\\text{Exp}(\\phi,\\rho) = \\text{Exp}(\\xi \\Delta t) \\doteq \\exp((\\xi \\Delta t)^\\wedge) = (\\text{Exp}(\\phi), J_l(\\phi) \\rho)\n",
    "$$\n",
    "where we used the same notation for $SO(3)$, i.e., $\\text{Exp}(\\phi)\\doteq \\exp(\\phi^\\wedge)$.\n",
    "\n",
    "### Retractions to the Manifold $\\mathcal{SE}(3)$\n",
    "\n",
    "The exponential map can also be used as a **retraction** for optimization on the $SE(3)$ manifold. Retractions are mappings from the tangent space to the manifold, e.g., the exponential map implements the following retraction:\n",
    "$$\n",
    "(C, r) \\times (\\phi, \\rho) \\mapsto \\exp((\\xi \\Delta t)^\\wedge) \\cdot (C,r) = (C \\exp(\\phi^\\wedge), r + C J_l(\\phi) \\rho),\n",
    "$$\n",
    "\n",
    "Alternative retractions exists, however, and the only restriction on them is that they need to have the same behavior as the exponential map near the identity. One alternative retraction simply treats the attitude and position separately:\n",
    "$$\n",
    "(C, r) \\times (\\phi, \\rho) \\mapsto (C\\exp(\\phi^\\wedge), r + \\rho),\n",
    "$$\n",
    "This alternative retraction behaves identically to the exponential map at $t = 0$ and is computationally simpler."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "gtbook",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}