LQR 教程

本文为 MuJoCo 官方 LQR 教程的简单翻译.

原文链接如下:

https://colab.research.google.com/github/deepmind/mujoco/blob/main/python/LQR.ipynb

官方GitHub 链接为 https://github.com/deepmind/mujoco.

安装 MuJoCo

1	!pip install mujoco

检查是否安装成功

#@title Check if installation was successful

from google.colab import files

import distutils.util
import subprocess
if subprocess.run('nvidia-smi').returncode:
  raise RuntimeError(
      'Cannot communicate with GPU. '
      'Make sure you are using a GPU Colab runtime. '
      'Go to the Runtime menu and select Choose runtime type.')
# Configure MuJoCo to use the EGL rendering backend (requires GPU)
print('Setting environment variable to use GPU rendering:')
%env MUJOCO_GL=egl

try:
  print('Checking that the installation succeeded:')
  import mujoco
  mujoco.MjModel.from_xml_string('<mujoco/>')
except Exception as e:
  raise e from RuntimeError(
      'Something went wrong during installation. Check the shell output above '
      'for more information.\n'
      'If using a hosted Colab runtime, make sure you enable GPU acceleration '
      'by going to the Runtime menu and selecting "Choose runtime type".')

print('Installation successful.')

其它导入包与帮助函数

#@title Other imports and helper functions
import numpy as np
from typing import Callable, Optional, Union, List
import scipy.linalg

# Graphics and plotting.
print('Installing mediapy:')
!command -v ffmpeg >/dev/null || (apt update && apt install -y ffmpeg)
!pip install -q mediapy
import mediapy as media
import matplotlib.pyplot as plt

# More legible printing from numpy.
np.set_printoptions(precision=3, suppress=True, linewidth=100)

加载与渲染标准人体模型

print('Getting MuJoCo humanoid XML description from GitHub:')
!git clone https://github.com/deepmind/mujoco
with open('mujoco/model/humanoid/humanoid.xml', 'r') as f:
  xml = f.read()

Getting MuJoCo humanoid XML description from GitHub:
Cloning into 'mujoco'...
remote: Enumerating objects: 7096, done.
remote: Counting objects: 100% (289/289), done.
remote: Compressing objects: 100% (153/153), done.
remote: Total 7096 (delta 145), reused 245 (delta 130), pack-reused 6807
Receiving objects: 100% (7096/7096), 42.89 MiB | 13.65 MiB/s, done.
Resolving deltas: 100% (5123/5123), done.

XML用于实例化 MjModel. 给定模型，我们可以创建一个 MjData 来保存仿真状态, 以及上面定义的 Renderer 类的实例.

1
2
3

model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
renderer = mujoco.Renderer(model)

data 对象的状态为默认配置. 让我们调用正向动力学来填充所有的派生的量(如几何体的世界坐标), 更新场景并渲染它:

1
2
3

mujoco.mj_forward(model, data)
renderer.update_scene(data)
media.show_image(renderer.render())

该模型带有一些内置的“关键帧”, 这些关键帧保存了仿真状态.

可以用 mj_resetDataKeyframe 来加载它们, 让我们看看它们长什么样:

for key in range(model.nkey):
  mujoco.mj_resetDataKeyframe(model, data, key)
  mujoco.mj_forward(model, data)
  renderer.update_scene(data)
  media.show_image(renderer.render())

现在让我们物理仿真并渲染来制作一个视频.

DURATION  = 3   # seconds
FRAMERATE = 60  # Hz

# Initialize to the standing-on-one-leg pose.
mujoco.mj_resetDataKeyframe(model, data, 1)

frames = []
while data.time < DURATION:
  # Step the simulation.
  mujoco.mj_step(model, data)

  # Render and save frames.
  if len(frames) < data.time * FRAMERATE:
    renderer.update_scene(data)
    pixels = renderer.render()
    frames.append(pixels)

# Display video.
media.show_video(frames, fps=FRAMERATE)

该模型定义了内置的力矩执行器, 我们可以通过设置 data.ctrl 向量来驱动人体模型关节. 让我们看看如果引入噪声会发生什么.

这里, 我们使用一个自定义摄像机来跟踪人体模型的质心.

DURATION  = 3   # seconds
FRAMERATE = 60  # Hz

# Make a new camera, move it to a closer distance.
camera = mujoco.MjvCamera()
mujoco.mjv_defaultFreeCamera(model, camera)
camera.distance = 2

mujoco.mj_resetDataKeyframe(model, data, 1)

frames = []
while data.time < DURATION:
  # Set control vector.
  data.ctrl = np.random.randn(model.nu)

  # Step the simulation.
  mujoco.mj_step(model, data)

  # Render and save frames.
  if len(frames) < data.time * FRAMERATE:
    # Set the lookat point to the humanoid's center of mass.
    camera.lookat = data.body('torso').subtree_com

    renderer.update_scene(data, camera)
    pixels = renderer.render()
    frames.append(pixels)

media.show_video(frames, fps=FRAMERATE)

稳定的单腿站立

显然这个初始姿态是不稳定的. 我们将尝试使用 Linear Quadratic Regulator 找到一个稳定的控制策略.

LQR 理论回顾

这个理论是由 Rudolph Kalman 在20世纪60年代提出的, 网上有很多解释这个理论的资源, 但我们只提供一个简单的概述.

给定一个状态和控制为线性的动力系统,

$x_{t+h}=Ax_t+Bu_t$

如果系统满足一个可控性条件, 就有可能以最佳方式达到稳定(驱动到 ), 如下所示. 使用两个对称正定矩阵和定义状态和控制上的二次代价函数 :

$J(x,u)=x^TQx+u^TRu$

Cost-to-go 函数 , 也称为值函数, 是未来成本的总和, 让状态从开始, 并根据动力学推演, 同时使用控制策略 :

$V^\pi(x_0)=\sum_{t=0}^\infty J(x_t,\pi(x_t))$

Kalman 的中心结论现在可以表述了. 在所有可能的控制策略中, 使得值函数最小化的最优控制策略是线性的

$\pi^*(x)=\arg\min_\pi V^\pi(x)=-Kx$

以及最优的值函数为二次的

$V^*(x)=\min_\pi V^\pi(x)=x^TPx$

矩阵满足 Riccati 方程

$P=Q+A^TPA-A^TPB(R+B^TPB)^{-1}B^TPA$

其与控制增益矩阵的关系为

$K=(R+B^TPB)^{-1}B^TPA$

理解线性化设定值

当然我们的人体模型仿真绝不是线性的. 但是, 当 MuJoCo 的 mj_step 函数计算一些非线性动力学方程时, 我们可以在任何状态-控制对附近来线性化这个函数. 简记下一个状态为 , 当前状态为以及当前控制为 , 并使用表示“微小变化”, 我们可以这样表示

$\delta y=\frac{\partial f}{\partial x}\delta x+\frac{\partial f}{\partial u}\delta u$

换句话说, 偏导数矩阵描述了对和的扰动和对的变化之间的线性关系. 与上述理论相比, 当考虑线性化动力系统时, 我们可以用转换矩阵和来表示偏导数(雅可比矩阵)矩阵:

$A=\frac{\partial f}{\partial x} \quad\quad B=\frac{\partial f}{\partial u}$

为了进行线性化, 我们需要选择一些设定值和 , 在它们附近进行线性化. 我们已经知道了 , 这是我们单腿站立的初始姿势. 但是呢? 我们如何找到可以在其附近进行线性化的“最佳”控制值?

答案是反向动力学.

通过反向动力学寻找控制设定值

MuJoCo 的前向动力学函数 mj_forward, 我们在上面使用它来传播派生量, 计算给定状态和系统中的所有力时的加速度, 其中一些是由执行器创建的.

反向动力学函数将加速度作为输入, 并计算产生加速度所需的力. 独特的是, MuJoCo 的快速反向动力学考虑到所有的约束, 包括接触. 让我们看看它是如何工作的.

我们将在目标位置的设定值处调用前向动力学, 在 data.qacc 中设置加速度为 0, 并调用逆动力学:

mujoco.mj_resetDataKeyframe(model, data, 1)
mujoco.mj_forward(model, data)
data.qacc = 0  # Assert that there is no the acceleration.
mujoco.mj_inverse(model, data)
print(data.qfrc_inverse)

[ 0. 0. 275.879 -33.186 4.995 -6.688 -4.305 3.693 -15.451 -10.906 0.412 -1.613 -9.793 -2.312 -0.366 -5.913 -0.417 -1.914 5.759 2.665 -0.202 -5.755 0.994 1.141 -1.987 3.821 1.151]

检查由反向动力学得到的力, 我们看到一些相当令人不安的东西. 有一个非常大的力作用在第三自由度(DoF), 即根关节的垂直运动自由度.

这意味着, 为了解释加速度为零的设定, 反向动力学必须创造一个直接作用于根关节的“神奇”的力. 让我们看看当我们将人体模型上下移动 1mm 时, 这个力是如何变化的, 增量为 1μm:

height_offsets = np.linspace(-0.001, 0.001, 2001)
vertical_forces = []
for offset in height_offsets:
  mujoco.mj_resetDataKeyframe(model, data, 1)
  mujoco.mj_forward(model, data)
  data.qacc = 0
  # Offset the height by `offset`.
  data.qpos[2] += offset
  mujoco.mj_inverse(model, data)
  vertical_forces.append(data.qfrc_inverse[2])

# Find the height-offset at which the vertical force is smallest.
idx = np.argmin(np.abs(vertical_forces))
best_offset = height_offsets[idx]

# Plot the relationship.
plt.figure(figsize=(10, 6))
plt.plot(height_offsets * 1000, vertical_forces, linewidth=3)
# Red vertical line at offset corresponding to smallest vertical force.
plt.axvline(x=best_offset*1000, color='red', linestyle='--')
# Green horizontal line at the humanoid's weight.
weight = model.body_subtreemass[1]*np.linalg.norm(model.opt.gravity)
plt.axhline(y=weight, color='green', linestyle='--')
plt.xlabel('Height offset (mm)')
plt.ylabel('Vertical force (N)')
plt.grid(which='major', color='#DDDDDD', linewidth=0.8)
plt.grid(which='minor', color='#EEEEEE', linestyle=':', linewidth=0.5)
plt.minorticks_on()
plt.title(f'Smallest vertical force '
          f'found at offset {best_offset*1000:.4f}mm.')
plt.show()

在上面的图中, 我们可以看到由于脚的接触而产生的强非线性关系. 在左边, 当我们把人体模型推到地板上时, 唯一能解释它没有跳出地板的原因是一个巨大的外力把它往下推. 在右边, 当我们将人体模型移离地面时, 解释加速度为零的唯一方法是有一个力向上支撑着它, 我们可以清楚地看到脚离开地面的高度, 所需的力正好等于人体模型的重量(绿线), 并且在我们不断向上移动时保持不变.

在 -0.5mm 附近是完美的高度偏移量(红线), 在这里, 零垂直加速度可以完全用关节内力来解释, 而不需要借助“神奇的”外力. 让我们修正初始姿态的高度, 将其保存在 qpos0 中, 并再次通过反向动力学计算受力:

mujoco.mj_resetDataKeyframe(model, data, 1)
mujoco.mj_forward(model, data)
data.qacc = 0
data.qpos[2] += best_offset
qpos0 = data.qpos.copy()  # Save the position setpoint.
mujoco.mj_inverse(model, data)
qfrc0 = data.qfrc_inverse.copy()
print('desired forces:', qfrc0)

desired forces: [ 0. 0. -0.191 -3.447 0.222 -0.817 2.586 14.637 -18.64 -10.906 0.412 -1.613 -9.793 -2.312 -0.366 -23.755 -2.171 12.264 26.101 13.337 -0.113 -5.755 0.994 1.141 -1.987 3.821 1.151]

好多了, 根关节上的力很小. 现在我们有了执行器可以合理地产生的力, 我们如何找到产生这些力的执行器值呢? 对于像人体模型这样的简单 motor 执行器, 我们可以简单地“除以”驱动力矩臂矩阵, 即乘以它的伪逆:

1
2
3

ctrl0 = np.atleast_2d(qfrc0) @ np.linalg.pinv(data.actuator_moment)
ctrl0 = ctrl0.flatten()  # Save the ctrl setpoint.
print('control setpoint:', ctrl0)

control setpoint: [ 0.366 0.065 -0.466 -0.273 0.01 -0.013 -0.122 -0.018 -0.116 -0.594 -0.054 0.102 0.326 -0.006 0.667 -0.288 0.05 0.029 -0.099 0.191 0.029]

更精细的执行器将需要不同的方法来恢复 , 而有限差分总是一个简单的选择.

让我们在前向动力学中应用这些控制, 并将它们产生的力与上面打印的期望力进行比较:

1
2
3

data.ctrl = ctrl0
mujoco.mj_forward(model, data)
print('actuator forces:', data.qfrc_actuator)

actuator forces: [ 0. 0. 0. 0. 0. 0. 2.586 14.637 -18.64 -10.906 0.412 -1.613 -9.793 -2.312 -0.366 -23.755 -2.171 12.264 26.101 13.337 -0.113 -5.755 0.994 1.141 -1.987 3.821 1.151]

由于人体模型是完全驱动的(除了根关节), 所需的力都在执行器的限制范围内, 我们可以看到所有内部关节的所需力完全匹配. 在根关节处还有一些不匹配，但是很小. 让我们看看当我们应用这些控制时的仿真效果:

DURATION  = 3   # seconds
FRAMERATE = 60  # Hz

# Set the state and controls to their setpoints.
mujoco.mj_resetData(model, data)
data.qpos = qpos0
data.ctrl = ctrl0

frames = []
while data.time < DURATION:
  # Step the simulation.
  mujoco.mj_step(model, data)

  # Render and save frames.
  if len(frames) < data.time * FRAMERATE:
    # Set the lookat point to the humanoid's center of mass.
    camera.lookat = data.body('torso').subtree_com
    renderer.update_scene(data, camera)
    pixels = renderer.render()
    frames.append(pixels)

media.show_video(frames, fps=FRAMERATE)

与我们上面制作的完全被动的视频相比, 我们可以看到这是一个更好的控制设定值. 人形机器人仍然摔倒了, 但它试图稳定下来, 并成功了一小会儿.

选择和矩阵

为了得到LQR反馈控制策略, 我们需要设计和矩阵. 由于线性结构, 解对于两个矩阵的缩放是不变的, 所以在不失一般性的情况下, 我们可以选择为单位矩阵:

1 2	nu = model.nu # Alias for the number of actuators. R = np.eye(nu)

的选择更为复杂. 我们将把它构造成两项的和.

首先, 是一个平衡成本, 使重心(CoM)保持在脚上. 为了描述它, 我们将使用运动学雅可比矩阵来映射关节空间和全局笛卡尔位置. MuJoCo 对这些进行了分析计算.

nv = model.nv  # Shortcut for the number of DoFs.

# Get the Jacobian for the root body (torso) CoM.
mujoco.mj_resetData(model, data)
data.qpos = qpos0
mujoco.mj_forward(model, data)
jac_com = np.zeros((3, nv))
mujoco.mj_jacSubtreeCom(model, data, jac_com, model.body('torso').id)

# Get the Jacobian for the left foot.
jac_foot = np.zeros((3, nv))
mujoco.mj_jacBodyCom(model, data, jac_foot, None, model.body('foot_left').id)

jac_diff = jac_com - jac_foot
Qbalance = jac_diff.T @ jac_diff

第二, 关节偏离其初始配置的成本, 对于不同的关节集合, 我们需要不同的系数:

自由关节的系数将为 0, 因为 CoM 成本项已经考虑到了这一点.
左腿平衡所需要的关节, 即左腿关节和水平腹部关节, 应该保持非常接近他们的初始值.
所有其他关节的系数都应该小一些, 这样人体模型就能通过摆动手臂来保持平衡.

我们求出所有这些关节集的指标.

# Get all joint names.
joint_names = [model.joint(i).name for i in range(model.njnt)]

# Get indices into relevant sets of joints.
root_dofs = range(6)
body_dofs = range(6, nv)
abdomen_dofs = [
    model.joint(name).dofadr[0]
    for name in joint_names
    if 'abdomen' in name
    and not 'z' in name
]
left_leg_dofs = [
    model.joint(name).dofadr[0]
    for name in joint_names
    if 'left' in name
    and ('hip' in name or 'knee' in name or 'ankle' in name)
    and not 'z' in name
]
balance_dofs = abdomen_dofs + left_leg_dofs
other_dofs = np.setdiff1d(body_dofs, balance_dofs)

现在我们准备构造矩阵. 注意, 平衡项的系数相当高. 这是由于3个不同的原因:

这是我们最关心的事情. 平衡意味着保持重心在脚上.
我们对 CoM (相对于身体关节)的控制权限较少.
在平衡环境中, 长度单位“更大”. 如果膝盖弯曲 0.1 弧度(≈6°), 我们可能仍然可以恢复. 如果 CoM 的位置离脚的位置有 10cm 的距离, 我们很可能正在倒向地板.

# Cost coefficients.
BALANCE_COST        = 1000  # Balancing.
BALANCE_JOINT_COST  = 3     # Joints required for balancing.
OTHER_JOINT_COST    = .3    # Other joints.

# Construct the Qjoint matrix.
Qjoint = np.eye(nv)
Qjoint[root_dofs, root_dofs] *= 0  # Don't penalize free joint directly.
Qjoint[balance_dofs, balance_dofs] *= BALANCE_JOINT_COST
Qjoint[other_dofs, other_dofs] *= OTHER_JOINT_COST

# Construct the Q matrix for position DoFs.
Qpos = BALANCE_COST * Qbalance + Qjoint

# No explicit penalty for velocities.
Q = np.block([[Qpos, np.zeros((nv, nv))],
              [np.zeros((nv, 2*nv))]])

计算 LQR 增益矩阵

在我们求解 LQR 控制器之前, 我们需要矩阵和 . 这些是由 MuJoCo 的 mjd_transitionFD 函数计算的, 该函数使用有效的有限差分导数来计算它们, 利用可配置的计算管道来避免重新计算没有变化的量.

# Set the initial state and control.
mujoco.mj_resetData(model, data)
data.ctrl = ctrl0
data.qpos = qpos0

# Allocate the A and B matrices, compute them.
A = np.zeros((2*nv, 2*nv))
B = np.zeros((2*nv, nu))
epsilon = 1e-6
centered = True
mujoco.mjd_transitionFD(model, data, epsilon, centered, A, B, None, None)

现在我们可以求解稳定控制器了. 我们将使用 scipy 的 solve_discrete_are 来求解 Riccati 方程, 并使用概述中描述的公式得到反馈增益矩阵.

# Solve discrete Riccati equation.
P = scipy.linalg.solve_discrete_are(A, B, Q, R)

# Compute the feedback gain matrix K.
K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A

稳定站立

现在我们可以试试稳定控制器.

注意, 为了应用我们的增益矩阵 , 我们需要使用 mj_differentiatePos 来计算两个位置的差值. 这很重要, 因为根关节朝向是由长度为 4 的四元数给出的, 而两个四元数的差值(在切线空间中)是长度为 3 的. 在 MuJoCo 符号中, 位置(qpos)的大小为 nq, 而位置差(和速度)的大小为 nv

# Parameters.
DURATION = 5          # seconds
FRAMERATE = 60        # Hz

# Reset data, set initial pose.
mujoco.mj_resetData(model, data)
data.qpos = qpos0

# Allocate position difference dq.
dq = np.zeros(model.nv)

frames = []
while data.time < DURATION:
  # Get state difference dx.
  mujoco.mj_differentiatePos(model, dq, 1, qpos0, data.qpos)
  dx = np.hstack((dq, data.qvel)).T

  # LQR control law.
  data.ctrl = ctrl0 - K @ dx

  # Step the simulation.
  mujoco.mj_step(model, data)

  # Render and save frames.
  if len(frames) < data.time * FRAMERATE:
    renderer.update_scene(data)
    pixels = renderer.render()
    frames.append(pixels)

media.show_video(frames, fps=FRAMERATE)

最终视频

上面的视频有点令人失望, 因为这个人体模型基本上是一动不动的. 让我们来解决这个问题, 并为我们的最终产品添加一些华丽的装饰:

在 LQR 控制器上引入平滑的噪声, 使平衡动作更明显, 但不抖动.
为场景添加接触力可视化.
让摄像机平稳地绕着人体模型旋转.
实例化一个具有更高分辨率的新渲染器.

# Parameters.
DURATION = 12         # seconds
FRAMERATE = 60        # Hz
TOTAL_ROTATION = 15   # degrees
CTRL_STD = 0.05       # actuator units
CTRL_RATE = 0.8       # seconds

# Make new camera, set distance.
camera = mujoco.MjvCamera()
mujoco.mjv_defaultFreeCamera(model, camera)
camera.distance = 2.3

# Enable contact force visualisation.
scene_option = mujoco.MjvOption()
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTFORCE] = True

# Set the scale of visualized contact forces to 1cm/N.
model.vis.map.force = 0.01

# Define smooth orbiting function.
def unit_smooth(normalised_time: float) -> float:
  return 1 - np.cos(normalised_time*2*np.pi)
def azimuth(time: float) -> float:
  return 100 + unit_smooth(data.time/DURATION) * TOTAL_ROTATION

# Precompute some noise.
np.random.seed(1)
nsteps = int(np.ceil(DURATION/model.opt.timestep))
perturb = np.random.randn(nsteps, nu)

# Smooth the noise.
width = int(nsteps * CTRL_RATE/DURATION)
kernel = np.exp(-0.5*np.linspace(-3, 3, width)**2)
kernel /= np.linalg.norm(kernel)
for i in range(nu):
  perturb[:, i] = np.convolve(perturb[:, i], kernel, mode='same')

# Reset data, set initial pose.
mujoco.mj_resetData(model, data)
data.qpos = qpos0

# New renderer instance with higher resolution.
renderer = mujoco.Renderer(model, width=1280, height=720)

frames = []
step = 0
while data.time < DURATION:
  # Get state difference dx.
  mujoco.mj_differentiatePos(model, dq, 1, qpos0, data.qpos)
  dx = np.hstack((dq, data.qvel)).T

  # LQR control law.
  data.ctrl = ctrl0 - K @ dx

  # Add perturbation, increment step.
  data.ctrl += CTRL_STD*perturb[step]
  step += 1

  # Step the simulation.
  mujoco.mj_step(model, data)

  # Render and save frames.
  if len(frames) < data.time * FRAMERATE:
    camera.azimuth = azimuth(data.time)
    renderer.update_scene(data, camera, scene_option)
    pixels = renderer.render()
    frames.append(pixels)

media.show_video(frames, fps=FRAMERATE)