<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://yifeichen2024.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yifeichen2024.github.io/" rel="alternate" type="text/html" /><updated>2025-10-29T21:01:31+00:00</updated><id>https://yifeichen2024.github.io/feed.xml</id><title type="html">Yifei Chen’s Website</title><subtitle>personal description</subtitle><author><name>Yifei Chen(陈逸飞)</name><email>yifeichen2026@u.northwestern.edu</email><uri>https://yifeichen2024.github.io/</uri></author><entry><title type="html">Thoughts on Dexterous Manipulation Tasks</title><link href="https://yifeichen2024.github.io/posts/2025/05/Dexterous%20Tasks/" rel="alternate" type="text/html" title="Thoughts on Dexterous Manipulation Tasks" /><published>2025-05-04T00:00:00+00:00</published><updated>2025-05-04T00:00:00+00:00</updated><id>https://yifeichen2024.github.io/posts/2025/05/blog-post-1</id><content type="html" xml:base="https://yifeichen2024.github.io/posts/2025/05/Dexterous%20Tasks/"><![CDATA[<h2 id="️-two-types-of-dexterous-manipulation-tasks-functional-grasping-vs-in-hand-manipulation">🖐️ Two Types of Dexterous Manipulation Tasks: Functional Grasping vs. In-Hand Manipulation</h2>

<p>In my recent research and experimentation on dexterous manipulation, I’ve found that tasks in this domain can generally be divided into two major categories. These differ in objectives, strategy requirements, and especially in physical modeling. In this post, I attempt a preliminary classification and analysis of the two, along with some personal reflections and questions.</p>

<h3 id="-1-task-categorization">📌 1. Task Categorization</h3>

<p>Based on existing literature and practical trials, dexterous manipulation tasks can be roughly divided into:</p>

<h4 id="1-functional-grasping">1. <strong>Functional Grasping</strong></h4>

<p>The goal is for the robot to <strong>grasp an object stably for use in downstream interaction tasks</strong>, such as using tools, transporting items, or lifting containers. Typical examples include:</p>

<ul>
  <li>Grasping a cup and moving it to a target location</li>
  <li>Holding a screwdriver in preparation to tighten a screw</li>
  <li>Firmly holding a pen to write</li>
</ul>

<p>The core of functional grasping is <strong>stability</strong>, often evaluated via force-closure constraints or other grasp quality metrics. The main concern is <strong>whether the grasp can resist external perturbations</strong>—for instance, whether the object remains stable when external forces are applied after grasping.</p>

<p>An interesting example is using a pen to write. This task inherently involves both grasping and manipulation, but few systems today can perform this kind of stable interaction.</p>
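<p>As a concrete illustration, with linearized friction cones a grasp is force closure exactly when its primitive contact wrenches positively span the wrench space. Below is a minimal planar (2-D force plus torque) sketch of that test; it assumes SciPy is available, and the unit-disk object and μ = 0.5 are made-up numbers, not values from any system in this post.</p>

```python
import numpy as np
from scipy.optimize import linprog

def contact_wrenches_2d(p, n, mu):
    """Friction-cone edge wrenches (fx, fy, tau) for one planar point contact."""
    ang = np.arctan2(n[1], n[0])         # direction of the inward normal
    half = np.arctan(mu)                 # friction-cone half angle
    edges = []
    for a in (ang - half, ang + half):
        f = np.array([np.cos(a), np.sin(a)])
        tau = p[0] * f[1] - p[1] * f[0]  # planar cross product p x f
        edges.append([f[0], f[1], tau])
    return edges

def is_force_closure(wrenches):
    """Force closure iff every +/- basis wrench direction is reachable as a
    nonnegative combination of the cone edges (an LP feasibility check)."""
    W = np.array(wrenches).T             # 3 x m matrix of primitive wrenches
    m = W.shape[1]
    for i in range(W.shape[0]):
        for s in (1.0, -1.0):
            d = np.zeros(W.shape[0]); d[i] = s
            res = linprog(np.zeros(m), A_eq=W, b_eq=d, bounds=(0, None))
            if res.status != 0:          # 0 = feasible optimum found
                return False
    return True

# Two antipodal contacts on a unit disk, mu = 0.5: planar force closure.
grasp = (contact_wrenches_2d([1, 0], [-1, 0], 0.5)
         + contact_wrenches_2d([-1, 0], [1, 0], 0.5))
print(is_force_closure(grasp))      # True
print(is_force_closure(grasp[:2]))  # a single contact alone: False
```

<p>The same positive-span test extends to 6-D wrenches for spatial grasps, at the cost of more cone edges per contact.</p>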

<h4 id="2-in-hand-manipulation">2. <strong>In-Hand Manipulation</strong></h4>

<p>This task type involves <strong>dynamic adjustment</strong>. The goal is to perform fine pose tuning of an object within the hand through coordinated control of high-DOF fingers, often as a prerequisite for later tasks. Examples include:</p>

<ul>
  <li>Rotating a screw cap in-hand to align with a threaded hole</li>
  <li>Adjusting a pen’s orientation to a writing-ready pose after picking it up</li>
  <li>Rotating a cube in-palm to a target orientation</li>
</ul>

<p>Compared to functional grasping, this task is <strong>more dynamic and complex</strong>, often involving finger gaiting, sliding contacts, or even control of inertial responses within the hand. A simplified version often used in RL research is rotating a cube to a target orientation — a popular benchmark today.</p>
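<p>In that benchmark, the reward typically penalizes the quaternion angle between the object’s current and goal orientations. A sketch of one common shaping pattern follows; the constants (the eps smoothing term, the 0.1 rad success threshold, the bonus) are illustrative assumptions rather than values from any particular system.</p>

```python
import numpy as np

def quat_angle(q1, q2):
    """Smallest rotation angle (rad) between two unit quaternions (w, x, y, z)."""
    dot = abs(np.dot(q1, q2))            # abs() handles the q ~ -q double cover
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))

def reorientation_reward(q_obj, q_goal, eps=0.1, bonus=5.0):
    """Dense shaping 1/(angle + eps), plus a sparse bonus on success."""
    angle = quat_angle(q_obj, q_goal)
    reward = 1.0 / (angle + eps)
    if angle < 0.1:                      # within ~5.7 degrees of the goal
        reward += bonus
    return reward

identity = np.array([1.0, 0.0, 0.0, 0.0])
quarter = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])  # 90 deg about z
print(reorientation_reward(quarter, identity))   # far from goal: small reward
print(reorientation_reward(identity, identity))  # at goal: dense term + bonus
```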

<h3 id="️-2-reflections-on-physical-modeling">⚙️ 2. Reflections on Physical Modeling</h3>

<p>These two task types differ significantly in terms of <strong>physical constraints</strong> and <strong>policy design</strong>:</p>

<h4 id="for-functional-grasping">For Functional Grasping:</h4>

<ul>
  <li>Introducing force-closure or wrench-based metrics into optimization or reward design is natural and effective;</li>
  <li>The main focus is grasp stability, especially under external contact;</li>
  <li>Physical modeling is largely quasi-static;</li>
  <li>Some interesting downstream tasks (e.g., writing) are highly sensitive to grasp stability, and incorporating physical modeling can significantly improve policy performance.</li>
</ul>

<h4 id="for-in-hand-manipulation">For In-Hand Manipulation:</h4>

<ul>
  <li>This is a <strong>more dynamic</strong> process;</li>
  <li>A major challenge is modeling and controlling object responses such as inertia and contact forces (which is also a tough problem for model-based control);</li>
  <li>While physical constraints are necessary, they’re not as <strong>directly effective</strong> as in grasping. Instead, we need integration of contact dynamics modeling and complex control strategies;</li>
  <li>There’s still no effective method for <strong>real-time pose adjustment based on tactile feedback</strong>.</li>
</ul>
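<p>One concrete form such tactile feedback could take is a per-fingertip slip check against the friction cone, giving a policy an online signal for when to adjust its grasp. This is a hypothetical sketch, not an established method; the friction coefficient is an assumed value.</p>

```python
import numpy as np

def slip_risk(f_contact, normal, mu=0.8):
    """Ratio of tangential force to the maximum static friction force at one
    fingertip. Values near 1 mean the contact is about to slip, a natural
    trigger for an online regrasp or pose adjustment."""
    n = np.asarray(normal, dtype=float)
    n /= np.linalg.norm(n)
    f = np.asarray(f_contact, dtype=float)
    f_n = float(np.dot(f, n))                  # normal (pressing) component
    f_t = float(np.linalg.norm(f - f_n * n))   # tangential component
    if f_n <= 0.0:
        return float("inf")                    # contact broken or pulling away
    return f_t / (mu * f_n)

print(slip_risk([0.1, 0.0, 1.0], [0, 0, 1]))   # firm grip: well inside the cone
print(slip_risk([0.9, 0.0, 1.0], [0, 0, 1]))   # > 1: outside the cone, slipping
```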

<hr />

<h3 id="-summary">🧩 Summary</h3>

<table>
  <thead>
    <tr>
      <th><strong>Category</strong></th>
      <th><strong>Functional Grasping</strong></th>
      <th><strong>In-Hand Manipulation</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Goal</strong></td>
      <td>Stable grasp for downstream tasks</td>
      <td>Fine in-hand pose adjustment</td>
    </tr>
    <tr>
      <td><strong>Characteristics</strong></td>
      <td>Quasi-static, force closure, stability</td>
      <td>Dynamic, finger gaiting, dynamic modeling</td>
    </tr>
    <tr>
      <td><strong>Challenge</strong></td>
      <td>Evaluating and optimizing grasp quality</td>
      <td>Modeling contact dynamics and regrasping</td>
    </tr>
    <tr>
      <td><strong>Examples</strong></td>
      <td>Writing, transporting, tool usage</td>
      <td>Rotating cap, pen adjustment, cube reorientation</td>
    </tr>
  </tbody>
</table>

<p>The second type of task demands richer <strong>dynamic contact information</strong>. If the robotic hand is equipped with quality <strong>tactile sensors</strong> that can provide contact-rich feedback, then the policy could leverage this to more effectively control object dynamics.</p>
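<p>At the interface level, leveraging contact-rich feedback could be as simple as appending per-fingertip force-torque readings to the policy’s observation. The layout below is a hypothetical sketch; the 16-DOF hand, four fingertips, and position-plus-quaternion object pose are assumed dimensions.</p>

```python
import numpy as np

def build_observation(joint_pos, joint_vel, fingertip_wrenches, obj_pose):
    """Flat observation: proprioception, then one 6-D force-torque reading
    per fingertip, then the object pose (position + quaternion)."""
    return np.concatenate([
        np.asarray(joint_pos, dtype=np.float32),
        np.asarray(joint_vel, dtype=np.float32),
        np.asarray(fingertip_wrenches, dtype=np.float32).ravel(),  # F x 6
        np.asarray(obj_pose, dtype=np.float32),
    ])

# 16-DOF hand, 4 fingertips, 7-D object pose -> 16 + 16 + 24 + 7 = 63 dims
obs = build_observation(np.zeros(16), np.zeros(16), np.zeros((4, 6)), np.zeros(7))
print(obs.shape)  # (63,)
```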

<hr />

<h2 id="️-关于灵巧操作的两类任务思考functional-grasping-vs-in-hand-manipulation">🖐️ Two Types of Dexterous Manipulation Tasks: Functional Grasping vs. In-Hand Manipulation</h2>

<p><strong>In my recent survey and experimentation on dexterous manipulation, I have found that these tasks fall into two broad categories, which differ in objectives and strategy requirements, and especially in physical modeling. Here I attempt a preliminary classification and analysis, and record some of my current understanding and open questions.</strong></p>

<h3 id="-一任务类型划分">📌 1. Task Categorization</h3>

<p>Judging from the existing literature and practical trials, dexterous manipulation tasks can be roughly divided into the following two categories:</p>

<h4 id="1-functional-grasping功能性持握--稳定抓取">1. <strong>Functional Grasping</strong> (functional holding / stable grasping)</h4>

<p>The goal is for the robot to <strong>grasp an object stably and use it in subsequent interaction tasks</strong>, such as using tools, transporting objects, or lifting containers. Typical tasks include:</p>

<ul>
  <li>Grasping a cup and moving it to a specified location</li>
  <li>Picking up a screwdriver in preparation to tighten a screw</li>
  <li>Stably holding a pen to write</li>
</ul>

<p>The core of Functional Grasping is <strong>stability</strong>; force-closure constraints or other grasp evaluation metrics can be used to assess the “quality” of a grasp. This task concerns <strong>whether the object can be grasped while resisting external perturbations</strong>, e.g., whether the robot can keep the object stable when external forces are applied after grasping.</p>

<p>An interesting example is a robot writing with a pen: this is in essence both grasp and manipulation. Few systems today achieve this kind of stable interaction.</p>

<h4 id="2-in-hand-manipulation手中操作--灵巧操作">2. <strong>In-Hand Manipulation</strong> (in-hand / dexterous manipulation)</h4>

<p>This type of task is better described as a <strong>process of dynamic adjustment</strong>. The goal is to fine-tune the object’s pose through coordination among high-DOF fingers, in service of subsequent task execution. Typical examples include:</p>

<ul>
  <li>Rotating a screw cap in-hand to align it with a threaded hole</li>
  <li>Adjusting a pen between the fingers to a writing-ready pose after picking it up</li>
  <li>Rotating a small cube in the palm to a target orientation</li>
</ul>

<p>Compared with Functional Grasping, these tasks are <strong>more complex and more dynamic</strong>, typically involving finger gaiting, sliding contacts, and even control of the object’s inertial response within the hand. The policy itself therefore needs a deeper model and understanding of object dynamics. A commonly used simplified version is rotating a cube to a target orientation, a popular benchmark in current RL manipulation research.</p>

<h3 id="️-二关于物理建模的思考">⚙️ 2. Reflections on Physical Modeling</h3>

<p>The two task types differ significantly in <strong>physical modeling (physical constraints)</strong> and <strong>policy design (policy learning)</strong>:</p>

<h4 id="对于-functional-grasping">For Functional Grasping:</h4>

<ul>
  <li>Introducing force-closure or wrench-based metrics into the optimization/reward is a natural and direct approach;</li>
  <li>The focus is on improving grasp stability, especially on maintaining the hold under external contact;</li>
  <li>The physical modeling is largely quasi-static;</li>
  <li>Some cool downstream tasks (such as writing) are extremely sensitive to grasp stability, which physical modeling can exploit to improve policy performance.</li>
</ul>

<h4 id="对于-in-hand-manipulation">For In-Hand Manipulation:</h4>

<ul>
  <li>This is a more dynamic process;</li>
  <li>The difficulty lies in modeling and controlling the object’s dynamic response (e.g., rotational inertia, sliding under contact forces), which is also a hard point for model-based control; learned policies face a similar difficulty;</li>
  <li>Introducing physical constraints is necessary, but not as “directly effective” as in grasping; it requires integrating contact dynamics models with more sophisticated control strategies;</li>
  <li>There is still no good method for online pose adjustment based on tactile feedback.</li>
</ul>

<hr />

<h3 id="-总结">🧩 Summary</h3>

<table>
  <thead>
    <tr>
      <th><strong>Category</strong></th>
      <th><strong>Functional Grasping</strong></th>
      <th><strong>In-Hand Manipulation</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Goal</td>
      <td>Stable hold for subsequent interaction</td>
      <td>Fine in-hand pose adjustment</td>
    </tr>
    <tr>
      <td>Characteristics</td>
      <td>Quasi-static, force closure, stability-focused</td>
      <td>Dynamic, finger gaiting, dynamic modeling</td>
    </tr>
    <tr>
      <td>Challenges</td>
      <td>Evaluating grasp quality and optimizing holding stability</td>
      <td>Modeling contact dynamics and controlling repositioning</td>
    </tr>
    <tr>
      <td>Examples</td>
      <td>Writing, transporting, tool use</td>
      <td>Rotating a screw cap, pen pose adjustment, cube reorientation</td>
    </tr>
  </tbody>
</table>

<p>The second type of task places higher demands on acquiring dynamic contact information. If the hand has good sensors that provide contact-rich information, the policy should be able to use that information to control the object’s dynamics.</p>]]></content><author><name>Yifei Chen(陈逸飞)</name><email>yifeichen2026@u.northwestern.edu</email><uri>https://yifeichen2024.github.io/</uri></author><category term="Dexterous Manipulation" /><category term="Reinforcement Learning" /><category term="Grasp Stability" /><category term="Contact-Rich Tasks" /><summary type="html"><![CDATA[🖐️ Two Types of Dexterous Manipulation Tasks: Functional Grasping vs. In-Hand Manipulation]]></summary></entry><entry><title type="html">Learning Stability: Reflections on Physical Constraints in Dexterous Manipulation</title><link href="https://yifeichen2024.github.io/posts/2025/05/learning-stability/" rel="alternate" type="text/html" title="Learning Stability: Reflections on Physical Constraints in Dexterous Manipulation" /><published>2025-05-02T00:00:00+00:00</published><updated>2025-05-02T00:00:00+00:00</updated><id>https://yifeichen2024.github.io/posts/2025/05/blog-post-1</id><content type="html" xml:base="https://yifeichen2024.github.io/posts/2025/05/learning-stability/"><![CDATA[<p>Over the past few months, I’ve been exploring how physical constraints—like force closure and contact stability—can be introduced directly into the learning process for in-hand manipulation. While most recent progress in dexterous robotic skills has focused on end-to-end reinforcement learning using vision or tactile images, I found myself wondering: what if we teach policies to <em>care</em> about physical grasp stability?</p>

<h3 id="stability-isnt-just-a-side-effect">Stability isn’t just a side effect</h3>

<p>Most in-hand policies today can spin a cube, but they don’t <em>really</em> hold it. Often, the object is precariously balanced on a fingertip or nudged by the palm, barely resisting even small external disturbances. In contrast, when humans manipulate objects, we instinctively adjust our grasp in response to micro-contact cues—especially force and torque at our fingertips.</p>

<p>That idea became the starting point of this workshop paper. Instead of treating grasp stability as a side effect of task success, we made it part of the objective: using force-torque sensing and force-closure-based rewards, we trained a policy that explicitly seeks to maintain a physically stable grasp during in-hand manipulation.</p>
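<p>To make the idea concrete, a composite objective of this kind might add a stability term to the task reward. The sketch below is a stand-in, not the paper’s actual reward: the contact-count proxy, the force floor, and the weight are all illustrative assumptions.</p>

```python
import numpy as np

def grasp_stability_reward(contact_forces, min_contacts=3, f_min=0.5):
    """Assumed stability proxy: approaches 1 when at least `min_contacts`
    fingertips each press with at least f_min newtons, 0 otherwise."""
    mags = sorted((float(np.linalg.norm(f)) for f in contact_forces), reverse=True)
    if len(mags) < min_contacts:
        return 0.0
    weakest = mags[min_contacts - 1]       # weakest of the required contacts
    return min(weakest / f_min, 1.0)

def total_reward(task_progress, contact_forces, w_stab=0.3):
    # The trade-off knob: larger w_stab favors a secure grip over speed.
    return task_progress + w_stab * grasp_stability_reward(contact_forces)

forces = [[0, 0, 1.2], [0.1, 0, 0.9], [0, 0.2, 0.8]]
print(total_reward(0.5, forces))   # task term plus the full stability bonus
```

<p>A weight like w_stab is where the stability-versus-dexterity trade-off surfaces: raising it pushes the policy to sacrifice speed to keep the grip secure.</p>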

<h3 id="what-we-learned">What we learned</h3>

<p>It turns out, adding physical constraints had some interesting effects:</p>

<ul>
  <li><strong>Faster convergence</strong>: The policy trained with grasp stability rewards reached functional behavior in far fewer steps.</li>
  <li><strong>Smoother behavior</strong>: The hand didn’t just spin the cube—it cradled it, adjusted to shifting forces, and moved more like a hand, not a claw.</li>
  <li><strong>Trade-offs emerged</strong>: Stability and dexterity aren’t always aligned. By rewarding force closure, the policy occasionally sacrificed rotational speed to keep its grip secure.</li>
</ul>

<h3 id="looking-ahead">Looking ahead</h3>

<p>This is still early-stage work—just a workshop paper—but the idea of integrating physically meaningful objectives into RL for manipulation feels promising. In future work, I’m curious to see how these strategies scale with more complex objects, external perturbations, and real-world sim-to-real deployment.</p>

<p>Maybe one day, we’ll build not just policies that solve tasks, but hands that <em>understand</em> what it means to hold on.</p>]]></content><author><name>Yifei Chen(陈逸飞)</name><email>yifeichen2026@u.northwestern.edu</email><uri>https://yifeichen2024.github.io/</uri></author><category term="Dexterous Manipulation" /><category term="Reinforcement Learning" /><category term="Grasp Stability" /><category term="Contact-Rich Tasks" /><summary type="html"><![CDATA[Over the past few months, I’ve been exploring how physical constraints—like force closure and contact stability—can be introduced directly into the learning process for in-hand manipulation. While most recent progress in dexterous robotic skills has focused on end-to-end reinforcement learning using vision or tactile images, I found myself wondering: what if we teach policies to care about physical grasp stability?]]></summary></entry></feed>