Active Vision was proposed as early as 1988 by Aloimonos et al. In Active Vision, an observer can change its vantage point to obtain the information most relevant to a specific task. This contrasts with the usual perception problem of analysing a given sensor stream without having any say in how it is generated.
In Active Perception, Bajcsy defines active perception as "a study of modeling and control strategies for perception ... The control strategies are formulated as a search of such sequences of steps that would minimize a loss function while still seeking the most information".
More recently, in Revisiting Active Perception, Bajcsy gave a broader definition of active perception: "An agent is an active perceiver if it knows why it wishes to sense, and then chooses what to perceive, and determines how, when and where to achieve that perception".
Active perception methods give the observer control over what is perceived and how. It is intuitive to move around an obstacle if we want to see what's behind it. But what do we do if the obstacle is a closed door?
Interactive Perception, coined by Dov Katz and Oliver Brock in "Manipulating Articulated Objects with Interactive Perception", goes one step beyond active perception: "it allows the observer to manipulate the environment to obtain task-relevant information. Due to this coupling of perception and physical interaction, we refer to the general approach as interactive perception".
Changing the state of the world in service of perception is a powerful concept, yet only two examples of applying it predate Katz's work. Christiansen et al. determined a model of an object's dynamics by observing its motion in response to deliberate interactions. Specifically, they placed an object on a tray that could be tilted and observed how the object moved in response to different tilts. This is a great example of how a simple action (tilting) makes perception dramatically easier.
Similarly, in "Grounding vision through experimental manipulation", Fitzpatrick and Metta extend the concept of active segmentation into interactive segmentation. It is well known that segmenting a moving object from its background is a relatively simple task compared to object segmentation in a static scene: it requires almost no prior knowledge and very little computation. Interactive segmentation is simply the process of deliberately generating that object motion through poking.
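The core of this idea can be illustrated with plain frame differencing: once the robot pokes the object, the pixels that change between the before and after frames belong to the object. This is only a minimal sketch of motion-based segmentation, not Fitzpatrick and Metta's actual pipeline; the threshold and the toy images are assumptions.

```python
import numpy as np

def motion_segment(before, after, threshold=10):
    """Segment a moving object by differencing two grayscale frames.

    before, after: 2-D uint8 images of the same shape.
    Returns a boolean mask that is True where the scene changed.
    Illustrative sketch only; the threshold value is an assumption.
    """
    diff = np.abs(after.astype(np.int16) - before.astype(np.int16))
    return diff > threshold

# Toy scene: a static background with one bright "object" that was
# poked, moving it three pixels to the right between frames.
before = np.zeros((8, 8), dtype=np.uint8)
after = before.copy()
before[2:4, 2:4] = 200   # object at its original position
after[2:4, 5:7] = 200    # object after the poke

mask = motion_segment(before, after)
print(mask.sum())  # → 8 (old position + new position, 4 pixels each)
```

In a real system the differencing would be preceded by background stabilization and followed by connected-component analysis, but the point stands: the poke turns a hard static-scene segmentation problem into a trivial change-detection one.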
In the above examples, interactive perception changes the extrinsics of an object to enable motion modeling or object segmentation. In "Manipulating Articulated Objects with Interactive Perception", Dov Katz and Oliver Brock proposed interactive perception to model the function of an object --- a critical step in realizing robots that can interact intelligently with the world and perform useful work. Katz and Brock use interactive perception to model the kinematic structure of unknown objects with no prior knowledge.
Robots that know how to manipulate articulated objects could perform meaningful tasks. The degrees of freedom of an articulated object are often relevant to the object’s intended function. Examples of everyday articulated objects include scissors, pliers, doors, door handles, books, and drawers. Modeling the degrees of freedom of a novel object is a daunting task for traditional vision approaches. It is often difficult to tell where one rigid body ends and the other begins. And, it is practically impossible to know whether what looks like a joint can actually be actuated.
In "Manipulating Articulated Objects with Interactive Perception", Dov Katz and Oliver Brock introduced Interactive Perception to this problem. They used robot manipulation to generate the signal necessary to model the object's structure. Specifically, through a sequence of interactions with an object, the robot builds a model of its degrees of freedom. Then, the robot uses the acquired model to manipulate the object into a desired configuration.
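One way to see how interaction yields a kinematic model is to track visual features while the robot moves the object: features on the same rigid link keep constant mutual distances, while distances across a joint vary. The greedy clustering below is an illustrative sketch of that principle, not Katz and Brock's actual algorithm; the tolerance and the toy drawer trajectories are assumptions.

```python
import numpy as np

def rigid_clusters(tracks, tol=1e-3):
    """Group tracked feature points into rigid bodies.

    tracks: array of shape (n_features, n_frames, 2) of 2-D positions.
    Two features are put on the same rigid body if their mutual
    distance stays (nearly) constant across all frames. Greedy,
    illustrative clustering; not Katz & Brock's method.
    """
    n = tracks.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, n):
            if labels[j] != -1:
                continue
            d = np.linalg.norm(tracks[i] - tracks[j], axis=1)
            if d.max() - d.min() < tol:   # distance constant => same link
                labels[j] = next_label
        next_label += 1
    return labels

# Hypothetical observation: two features sit on a static cabinet frame,
# two on a drawer front that the robot pulls out over three frames.
tracks = np.array([
    [[0, 0], [0, 0], [0, 0]],   # cabinet frame
    [[1, 0], [1, 0], [1, 0]],   # cabinet frame
    [[5, 0], [6, 0], [7, 0]],   # drawer front, sliding right
    [[5, 1], [6, 1], [7, 1]],   # drawer front, sliding right
], dtype=float)

labels = rigid_clusters(tracks)
print(labels)  # → [0, 0, 1, 1]: two rigid bodies recovered
```

Given the recovered bodies, the relative motion between clusters (here, a pure translation) then suggests the joint type: prismatic for translation, revolute for rotation about a fixed point.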
Katz et al. continued to develop this idea and extended it from planar objects to general 3D objects.
Finally, Katz et al. began to tackle the problem of efficiently generating the exploratory sequence of actions. While allowing interaction as part of the perception process is a powerful idea, knowing where and how to interact is almost as difficult as the original problem. For example, a robot could tap a doorknob for hours without ever trying an action that moves the handle with respect to the door. Katz et al. used Relational Reinforcement Learning to guide exploration. They show that robots can learn to explore articulated objects and become better with experience. They also show that knowledge can be transferred across objects, creating a compelling approach for developing life-long learning of the kinematic structure of objects.
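The explore/exploit trade-off underneath this idea can be sketched with a simple epsilon-greedy learner: the robot tries exploratory actions, is rewarded when an action reveals kinematic information, and gradually prefers informative actions. This toy bandit is only a stand-in for the relational reinforcement learning formulation of Katz et al.; the action names and reward probabilities are invented for illustration.

```python
import random

def explore(actions, reward_fn, episodes=200, eps=0.2, seed=0):
    """Epsilon-greedy selection over exploratory actions.

    With probability eps, try a random action; otherwise pick the
    action with the highest estimated value. Values are incremental
    means of observed rewards. Illustrative sketch only.
    """
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.choice(actions)           # explore
        else:
            a = max(actions, key=lambda x: value[x])  # exploit
        r = reward_fn(a, rng)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]  # incremental mean
    return value

# Hypothetical environment: pulling the handle reveals the joint far
# more often than tapping or pushing (probabilities are assumptions).
def reward_fn(action, rng):
    p = {"tap": 0.05, "push": 0.3, "pull": 0.8}[action]
    return 1.0 if rng.random() < p else 0.0

values = explore(["tap", "push", "pull"], reward_fn)
```

After a couple hundred trials the learner's value estimates favor "pull", which is the point: instead of tapping the doorknob for hours, an experienced agent spends its interactions where they are informative.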
In the decade since Katz and Brock's paper, many researchers have leveraged the ideas of interactive perception. Some notable examples: Hausman, Pangercic et al. used interactive perception to segment textured and textureless objects; Agrawal et al. used it to learn intuitive physics; in Koval et al., interactive perception is used for grasp planning; and Romano and Kuchenbecker leverage it to estimate the haptic properties of objects.
There are many more exciting applications for interactive perception. For an excellent survey of the field check out Bohg et al.
Copyright Dov Katz / Dubi Katz /
Dov Katz' reading list on Interactive Perception
All rights reserved.