Learning Stability: Reflections on Physical Constraints in Dexterous Manipulation

Over the past few months, I’ve been exploring how physical constraints, such as force closure and contact stability, can be built directly into the learning process for in-hand manipulation. Most recent progress in dexterous robotic skills has come from end-to-end reinforcement learning on vision or tactile images, and I found myself wondering: what if we taught policies to care explicitly about the physical stability of the grasp?

Stability isn’t just a side effect

Most in-hand policies today can spin a cube, but they don’t really hold it. Often, the object is precariously balanced on a fingertip or nudged by the palm, barely resisting even small external disturbances. In contrast, when humans manipulate objects, we instinctively adjust our grasp in response to micro-contact cues—especially force and torque at our fingertips.

That idea became the starting point of this workshop paper. Instead of treating grasp stability as a side effect of task success, we made it part of the objective: using force-torque sensing and force-closure-based rewards, we trained a policy that explicitly seeks to maintain a physically stable grasp during in-hand manipulation.
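To make "force-closure-based reward" concrete, here is a minimal sketch of one way such a term could look. It is not the reward from the paper: it is a planar (2D) simplification with frictional point contacts, and the names `contact_wrenches`, `is_force_closure`, and `stability_bonus`, along with the binary bonus, are my own assumptions for illustration. The test checks whether the friction-cone edge wrenches positively span the 3D planar wrench space, which is the standard condition for force closure.

```python
from itertools import combinations
from math import hypot


def contact_wrenches(position, inward_normal, mu):
    """Two friction-cone edge wrenches (fx, fy, torque) of a planar point contact."""
    px, py = position
    nx, ny = inward_normal
    norm = hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    tx, ty = -ny, nx  # unit tangent, perpendicular to the normal
    wrenches = []
    for sign in (1.0, -1.0):
        fx = nx + sign * mu * tx
        fy = ny + sign * mu * ty
        wrenches.append((fx, fy, px * fy - py * fx))  # torque about the origin
    return wrenches


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_force_closure(contacts, mu, eps=1e-9):
    """True if the contact wrenches positively span planar wrench space.

    contacts: list of (position, inward_normal) pairs, each a 2D tuple.
    The grasp fails the test if every wrench lies on one side of some plane
    through the origin; it is enough to try planes spanned by pairs of wrenches.
    """
    wrenches = [w for p, n in contacts for w in contact_wrenches(p, n, mu)]
    found_plane = False
    for a, b in combinations(wrenches, 2):
        normal = cross(a, b)
        if dot(normal, normal) < eps:
            continue  # parallel wrenches do not define a plane
        found_plane = True
        sides = [dot(normal, w) for w in wrenches]
        if not (any(s > eps for s in sides) and any(s < -eps for s in sides)):
            return False
    return found_plane


def stability_bonus(contacts, mu, weight=1.0):
    """Hypothetical reward term: a fixed bonus whenever the grasp is in force closure."""
    return weight if is_force_closure(contacts, mu) else 0.0


# Two opposing fingertips pinching an object centered at the origin.
antipodal = [((-1.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (-1.0, 0.0))]
print(is_force_closure(antipodal, mu=0.5))  # frictional pinch
print(is_force_closure(antipodal, mu=0.0))  # same pinch without friction
```

A real 3D hand needs a six-dimensional wrench space and a smoother signal than a binary bonus, for example a margin measuring how far the grasp is from losing closure, but the structure of the check stays the same.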

What we learned

It turns out, adding physical constraints had some interesting effects:

  • Faster convergence: The policy trained with grasp stability rewards reached functional behavior in far fewer training steps than training without them.
  • Smoother behavior: The hand didn’t just spin the cube—it cradled it, adjusted to shifting forces, and moved more like a hand, not a claw.
  • Trade-offs emerged: Stability and dexterity aren’t always aligned. By rewarding force closure, the policy occasionally sacrificed rotational speed to keep its grip secure.
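The trade-off in the last point is easy to see in a toy version of the objective. The sketch below assumes the reward is simply a task term plus a weighted stability term; the function name `shaped_reward`, the weights, and the numbers are all illustrative and are not taken from the paper.

```python
def shaped_reward(rotation_speed, grasp_is_stable, stability_weight):
    """Toy objective: task progress plus a bonus for a stable grasp.

    rotation_speed: task term (normalized rotation speed, illustrative units).
    grasp_is_stable: whether a stability check such as force closure passed.
    stability_weight: how much the stability bonus counts relative to the task.
    """
    stability_term = stability_weight if grasp_is_stable else 0.0
    return rotation_speed + stability_term


# A fast but precarious spin versus a slower spin with a secure grip.
fast_unstable = dict(rotation_speed=1.0, grasp_is_stable=False)
slow_stable = dict(rotation_speed=0.6, grasp_is_stable=True)

for weight in (0.2, 0.5):
    fast = shaped_reward(stability_weight=weight, **fast_unstable)
    slow = shaped_reward(stability_weight=weight, **slow_stable)
    winner = "slow and stable" if slow > fast else "fast and unstable"
    print(f"weight={weight}: preferred behavior is {winner}")
```

With a small stability weight the faster, unstable spin still scores higher; once the weight is large enough, giving up some rotation speed to stay in a stable grasp becomes the better choice, which matches the behavior we observed.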

Looking ahead

This is still early-stage work (just a workshop paper), but the idea of integrating physically meaningful objectives into RL for manipulation feels promising. In future work, I’m curious to see how these strategies hold up with more complex objects, under external perturbations, and in sim-to-real transfer to physical hardware.

Maybe one day, we’ll build not just policies that solve tasks, but hands that understand what it means to hold on.