Life is a game, take it seriously

How are objects represented in the human brain? Structural description models versus image-based models

In Computer Vision, Neural Science, Paper Talk on October 30, 2014 at 9:06 pm

by Li Yang Ku (Gooly)

A few years ago, while I was still at UCLA, Tomaso Poggio came to give a talk about the object recognition work he did with 2D templates. After the talk a student asked whether he had thought about using a 3D model to help recognize objects from different viewpoints. “The field seems to agree that models are stored as 2D images instead of 3D models in human brain” was Tomaso's short answer. Since then I took it as a fact and never gave it a second thought until a few months ago, when I actually needed to argue against storing a 3D model with people in robotics.

To get the full story we first have to go back to the late 70s. The study of visual object recognition is often motivated by the problem of recognizing 3D objects while only receiving 2D patterns of light on our retina. The question was whether our object representations are more similar to abstract three-dimensional descriptions or are tied more closely to the two-dimensional image of an object. A commonly held view at that time, popularized by Marr, was that the goal of vision is to reconstruct 3D. In the paper “Representation and recognition of the spatial organization of three-dimensional shapes”, published in 1978, Marr and Nishihara assume that at the end of the reconstruction process, viewer-centered descriptions are mapped into object-centered representations. This is based on the hypothesis that an object representation should be invariant over changes in the retinal image. Based on this object-centered theory, Biederman introduced the recognition-by-components (RBC) model in 1987, which proposes that objects are represented as a collection of volumes or parts. This quite influential model explains how object recognition can be viewpoint invariant and is often referred to as a structural description model.

The structural description model, or object-centered theory, was the dominant theory of visual object understanding around that time, and it correctly predicts view-independent recognition of familiar objects. Viewer-centered models, on the other hand, which store a set of 2D images instead of a single 3D model, were usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects.

However, between the late 1980s and the early 1990s, a wide variety of psychophysical and neurophysiological experiments surprisingly showed that human object recognition performance is strongly viewpoint dependent across rotation in depth. Before jumping into the late 80s, I want to first introduce some work done by Palmer, Rosch, and Chase in 1981. They discovered that commonplace objects such as houses or cars can be hard or easy to recognize, depending on the attitude (orientation) of the object with respect to the viewer. Subjects tended to respond more quickly when the stimulus was shown from a good, or canonical, perspective. These observations were important in forming the viewer-centered theory.

Paper-clip-like objects used in Bülthoff's experiments

In 1991 Bülthoff conducted an experiment to test these two theories. Subjects were shown sequences of animations in which a paper-clip-like object rotates. Given these sequences, the subjects had enough information to reconstruct a 3D model of the object. The subjects were then given a single image of a paper-clip-like object and asked to identify whether it was the same object. Different viewing angles of the object were tested. The assumption is that if a single complete 3D model of the object exists in our brain, then recognizing it from all angles should be equally easy. However, according to Bülthoff, even when given every opportunity to form a 3D model, the subjects performed as if they had not done so.

Bulthoff 1991

In 1992 Edelman further showed that canonical perspectives arise even when all the views in question are shown equally often and the objects possess no intrinsic orientation that might give some views an advantage.

Error rates from different viewpoints shown in Edelman’s experiment

In 1995 Tarr confirmed these discoveries using block-like objects. Instead of being shown a sequence of views of the object rotating, subjects were trained to build these block structures by manually placing the blocks through an interface with a fixed viewing angle. The results showed that response times increased proportionally to the angular distance from the training viewpoint. With extensive practice, performance became nearly equivalent at all familiar viewpoints; however, practice at familiar viewpoints did not transfer to unfamiliar viewpoints.

Tarr 1995

Based on these past observations, Logothetis, Pauls, and Poggio raised the question “If monkeys are extensively trained to identify novel 3D objects, would one find neurons in the brain that respond selectively to particular views of such object?” The results, published in 1995, were clear. By conducting the same paper clip object recognition task on monkeys, they found that 11.6% of the isolated neurons sampled in the inferotemporal (IT) cortex, a region known to represent objects, responded selectively to a subset of views of one of the known target objects. The response of these individual neurons decreases as the shown object is rotated, about any of the 4 axes, away from the canonical view that the neuron represents. The experiment also showed that these view-specific neurons are scale and position invariant up to a certain degree.
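
The view-tuned behavior described above can be pictured with a toy model. The sketch below is only an illustration and not the analysis used by Logothetis, Pauls, and Poggio: it assumes a Gaussian tuning curve over rotation about a single axis, and the names `view_tuned_response`, `pooled_response`, `preferred_deg`, and `width_deg`, as well as the 30 degree width, are hypothetical choices made for this example.

```python
import math


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def view_tuned_response(view_deg, preferred_deg, width_deg=30.0):
    """Hypothetical Gaussian tuning curve for one view-specific unit.

    The response is 1.0 at the preferred (canonical) view and falls off
    as the object is rotated away from it. width_deg is an assumed value.
    """
    d = angular_distance(view_deg, preferred_deg)
    return math.exp(-(d ** 2) / (2.0 * width_deg ** 2))


def pooled_response(view_deg, preferred_views_deg, width_deg=30.0):
    """Maximum response over several view-tuned units.

    Pooling over units tuned to different views is one simple way to get
    a response that varies less with viewpoint than any single unit does.
    """
    return max(
        view_tuned_response(view_deg, p, width_deg) for p in preferred_views_deg
    )


if __name__ == "__main__":
    for view in (0, 30, 60, 90):
        single = view_tuned_response(view, preferred_deg=0)
        pooled = pooled_response(view, [0, 90, 180, 270])
        print(f"view={view:3d}  single unit={single:.3f}  pooled={pooled:.3f}")
```

A single unit here responds strongly only near its preferred view, while the pooled response recovers whenever the view is close to any stored one, which is the intuition behind representing an object as a collection of views.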

Viewpoint specific neurons

This series of findings from human psychophysics and neurophysiology research provided converging evidence for ‘image-based’ models, in which objects are represented as collections of viewpoint-specific local features. A series of works in computer vision also showed that by allowing each canonical view to represent a range of images, the model is no longer infeasible. However, despite a large amount of research, most of the detailed mechanisms are still unknown and require further research.
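
To see why letting each stored view cover a range of images eases the memory objection, here is a back-of-the-envelope sketch. It is my own simplification and not taken from any of the papers above: it considers rotation about a single axis only, and `tolerance_deg` (how far a view can rotate and still match a stored view) is a hypothetical parameter with illustrative values.

```python
import math


def views_needed(tolerance_deg, total_deg=360.0):
    """Number of evenly spaced stored views needed so that every angle in
    total_deg lies within tolerance_deg of some stored view (one axis)."""
    if tolerance_deg <= 0:
        raise ValueError("tolerance_deg must be positive")
    return math.ceil(total_deg / (2.0 * tolerance_deg))


def evenly_spaced_views(n):
    """n stored view angles spread evenly over a full rotation, in degrees."""
    return [i * 360.0 / n for i in range(n)]


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def nearest_stored_view(query_deg, stored_views_deg):
    """Stored view closest to the query angle."""
    return min(stored_views_deg, key=lambda v: angular_distance(query_deg, v))


if __name__ == "__main__":
    for tol in (0.5, 10.0, 30.0):
        print(f"tolerance={tol:5.1f} deg -> {views_needed(tol)} stored views")
    stored = evenly_spaced_views(views_needed(30.0))
    print("query 100 deg matches stored view at", nearest_stored_view(100.0, stored))
```

Under these assumptions, a tolerance of 30 degrees needs only 6 stored views per rotation axis, while demanding a match within half a degree needs 360. The real numbers depend on the object and on how many rotation axes matter, so this only shows the direction of the effect.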

Check out these papers visually on my other website, EatPaper.org

References not linked in post:

Tarr, Michael J., and Heinrich H. Bülthoff. “Image-based object recognition in man, monkey and machine.” Cognition 67.1 (1998): 1-20.

Palmeri, Thomas J., and Isabel Gauthier. “Visual object understanding.” Nature Reviews Neuroscience 5.4 (2004): 291-303.

  1. Hi,
    Thank you for the post!

    Also, here is a paper in which a CV guy (Prof. Peter Meer, co-author of mean-shift) complains about why CV scientists don’t give up 2D :D
    http://coewww.rutgers.edu/riul/research/papers/pdf/pmopin.pdf

    I am a PhD student and some parts of my work are going to deal with 3D object representation. The approach I am pursuing is more or less similar to “1995 Tarr”.

    Looking forward to see next posts! ;)

    • That’s an interesting paper, although “storing objects in 3D” can have many interpretations and I personally believe 3D relationships are stored implicitly in the form of view specific information and inferred when needed.

      I am looking forward to your work.

  2. On the other hand Robots are not humans and humans are not robots.

