Life is a game, take it seriously

Posts Tagged ‘3D models’

How are objects represented in the human brain? Structural description models versus image-based models

In Computer Vision, Neural Science, Paper Talk on October 30, 2014 at 9:06 pm

by Li Yang Ku (Gooly)


A few years ago, while I was still at UCLA, Tomaso Poggio came to give a talk about the object recognition work he did with 2D templates. After the talk a student asked whether he had thought about using a 3D model to help recognize objects from different viewpoints. “The field seems to agree that models are stored as 2D images instead of 3D models in the human brain” was Tomaso’s short answer. Since then I took it as a fact and never gave it a second thought until a few months ago, when I actually needed to argue against storing a 3D model to people in robotics.

70s fashion

To get the full story we have to first go back to the late 70s. The study of visual object recognition is often motivated by the problem of recognizing 3D objects while only receiving 2D patterns of light on our retina. The question was whether our object representations are more similar to abstract three-dimensional descriptions, or whether they are tied more closely to the two-dimensional image of an object. A commonly held view at that time, popularized by Marr, was that the goal of vision is to reconstruct 3D. In the paper “Representation and recognition of the spatial organization of three-dimensional shapes,” published in 1978, Marr and Nishihara assume that at the end of the reconstruction process, viewer-centered descriptions are mapped into object-centered representations. This is based on the hypothesis that object representations should be invariant over changes in the retinal image. Building on this object-centered theory, Biederman introduced the recognition-by-components (RBC) model in 1987, which proposes that objects are represented as a collection of volumes or parts. This quite influential model explains how object recognition can be viewpoint invariant and is often referred to as a structural description model.

The structural description model, or object-centered theory, was the dominant theory of visual object understanding at that time, and it correctly predicts the view-independent recognition of familiar objects. On the other hand, viewer-centered models, which store a set of 2D images instead of a single 3D model, were usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects.


However, between the late 1980s and early 1990s, a wide variety of psychophysical and neurophysiological experiments surprisingly showed that human object recognition performance is strongly viewpoint dependent across rotations in depth. Before jumping into the late 80s, I first want to introduce some work done by Palmer, Rosch, and Chase in 1981. They discovered that commonplace objects such as houses or cars can be hard or easy to recognize, depending on the attitude of the object with respect to the viewer. Subjects tended to respond more quickly when the stimulus was shown from a good or canonical perspective. These observations were important in forming the viewer-centered theory.

Paper clip like objects used in Bulthoff's experiments

In 1991 Bulthoff conducted an experiment to test these two theories. Subjects were shown animation sequences in which a paper clip like object rotates. Given these sequences, the subjects had enough information to reconstruct a 3D model of the object. The subjects were then shown a single image of a paper clip like object and asked to identify whether it was the same object, with different viewing angles of the object being tested. The assumption is that if a single complete 3D model of the object existed in our brain, then recognizing it from all angles should be equally easy. However, according to Bulthoff, when given every opportunity to form a 3D model, the subjects performed as if they had not done so.

Bulthoff 1991

In 1992 Edelman further showed that canonical perspectives arise even when all the views in question are shown equally often and the objects possess no intrinsic orientation that might give some views an advantage.

Edelman 1992

Error rates at different viewpoints shown in Edelman’s experiment

In 1995 Tarr confirmed these findings using block-like objects. Instead of being shown a sequence of views of the rotating object, subjects were trained to build these block structures by manually placing blocks through an interface with a fixed viewing angle. The results show that response times increased proportionally to the angular distance from the training viewpoint. With extensive practice, performance became nearly equivalent at all familiar viewpoints; however, practice at familiar viewpoints did not transfer to unfamiliar viewpoints.

Tarr 1995

Based on these past observations, Logothetis, Pauls, and Poggio raised the question: if monkeys are extensively trained to identify novel 3D objects, would one find neurons in the brain that respond selectively to particular views of such objects? The results, published in 1995, were clear. Running the same paper clip object recognition task on monkeys, they found that 11.6% of the isolated neurons sampled in the IT region, a region known to represent objects, responded selectively to a subset of views of one of the known target objects. The responses of these individual neurons decreased as the shown object rotated, along any of the four tested axes, away from the canonical view the neuron represents. The experiment also showed that these view-specific neurons are scale and position invariant up to a certain degree.

Logothetis 1995

Viewpoint specific neurons

This series of findings from human psychophysics and neurophysiology research provided converging evidence for ‘image-based’ models in which objects are represented as collections of viewpoint-specific local features. A series of works in computer vision also showed that by allowing each canonical view to represent a range of images, the model is no longer infeasible. However, despite a large amount of research, most of the detailed mechanisms are still unknown and require further study.
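As a toy illustration of the image-based account (my own sketch, not any of the cited models; all names and numbers here are made up), one can store a few 2D views of a random wire object and recognize a query view by matching it against the nearest stored template. The match error grows as the query viewpoint rotates away from the stored canonical views:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "paper clip" object: a handful of random 3D vertices.
clip = rng.normal(size=(7, 3))

def view(obj, angle):
    """Orthographic 2D view of the object rotated about the vertical axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (obj @ rot.T)[:, :2]  # project away the depth coordinate

# Image-based model: a small set of viewpoint-specific 2D templates.
stored_angles = [0.0, np.pi / 2]
templates = [view(clip, a) for a in stored_angles]

def match_error(query_angle):
    """Distance from a query view to the nearest stored template."""
    q = view(clip, query_angle)
    return min(np.linalg.norm(q - t) for t in templates)

print(match_error(0.0))        # zero at a stored (canonical) view
print(match_error(np.pi / 4))  # positive away from the stored views
```

In the spirit of the computer vision work mentioned above, letting each stored template stand in for a range of nearby views (for example, accepting any query whose match error falls below a threshold) keeps the number of templates a system must store small.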

Check out these papers visually on my other website

References not linked in post:

Tarr, Michael J., and Heinrich H. Bülthoff. “Image-based object recognition in man, monkey and machine.” Cognition 67.1 (1998): 1-20.

Palmeri, Thomas J., and Isabel Gauthier. “Visual object understanding.” Nature Reviews Neuroscience 5.4 (2004): 291-303.

Creating 3D mesh models using Asus xtion with RGBDemo and Meshlab on Ubuntu 12.04

In Computer Vision, Kinect on March 12, 2014 at 5:15 pm

by Li Yang Ku (Gooly)


Creating 3D models simply by scanning an object with low-cost sensors is something that sounds futuristic but isn’t. Although models scanned with a Kinect or Asus Xtion aren’t as pretty as CAD models or laser-scanned models, they might actually be helpful in robotics research: a not-so-perfect model scanned by the same sensor the robot carries is closer to what the robot perceives. In this post I’ll go through the steps of creating a polygon mesh model by scanning a coke can with the Xtion sensor. The process consists of three parts: compiling RGBDemo, scanning the object, and converting the scanned vertices to a polygon mesh in MeshLab.


RGBDemo is a great piece of open-source software that can help you scan objects into a single ply file with the help of some AR tags. If you are using a Windows machine, running the compiled binary should be the easiest way to get started. However, if you are on an Ubuntu machine, the following are the steps I took. (I had compile errors following the official instructions, but it might still be worth a try.)

  1. Make sure you have OpenNI installed. I use the older OpenNI rather than OpenNI2. See my previous post about installing OpenNI on Ubuntu if you haven’t.
  2. Make sure you have PCL and OpenCV installed. For PCL I use the one that comes with ROS (ros-fuerte-pcl) and for OpenCV I have libcv2.3 installed.
  3. Download RGBDemo from Github
    git clone --recursive
  4. Modify the file under the rgbdemo folder. Add the following line among the other options so that it won’t use OpenNI2.
        -DNESTK_USE_OPENNI2=0 \
  5. Modify rgbdemo/scan-markers/ModelAcquisitionWindow.cpp. Comment out lines 57 to 61. (This fixes the compile error: ‘const class ntk::RGBDImage’ has no member named ‘withDepthDataAndCalibrated’.)
        void ModelAcquisitionWindow::on_saveMeshButton_clicked()
            //if (!m_controller.modelAcquisitionController()->currentImage().withDepthDataAndCalibrated())
                //ntk_dbg(1) << "No image already processed.";
            QString filename = QFileDialog::getSaveFileName
  6. cmake and build
  7. The binary files should be built under build/bin/.

turtle mesh

To create a 3D mesh model, we first capture a model (PLY file) that consists only of vertices using RGBDemo.

  1. Print out the AR tags located in the folder ./scan-markers/data/ and stick them on a flat board such that the numbers are close to each other. Put your target object at the center of the board.
  2. Run the binary ./build/bin/rgbd-scan-markers
  3. Two windows should pop up, RGB-D Capture and 3D View. Point the camera toward the object on the board and click “Add current frame” in the 3D View window. Move the camera around the object to fill in the missing pieces of the model.
  4. Click on the RGB-D Capture window and select Capture -> Pause from the menu at the top of the screen. Click “Remove floor plane” in the 3D View window to remove most of the board.
  5. Click “Save current mesh” to save the vertices into a ply file.


The following steps convert the model captured with RGBDemo into a 3D mesh model in MeshLab (MeshLab can be installed through the Ubuntu Software Center).

  1. Import the ply file created in the last section.
  2. Remove unwanted vertices in the model. (Select and delete; let me know if you can’t figure out how to do this.)
  3. Click on “Filters -> Point Set -> Surface Reconstruction: Poisson”. This will pop up a dialog; applying the default settings will generate a mesh with an estimated surface. If you check “View -> Show Layer Dialog” you should see two layers, the original point set and the newly constructed mesh.
  4. To transfer color to the new mesh, click “Filters -> Sampling -> Vertex Attribute Transfer”. Select mesh.ply as the source and the Poisson mesh as the target. This should transfer the colors on the vertices to the mesh.
  5. Note that MeshLab has some problems when saving to the Collada (dae) format.
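Before importing the scan into MeshLab, it can be handy to sanity-check the saved ply file. Here is a minimal sketch (`ply_vertex_count` is my own hypothetical helper, not part of RGBDemo or MeshLab) that reads the vertex count declared in the file header, assuming an ASCII-format ply file:

```python
def ply_vertex_count(path):
    """Return the vertex count declared in an ASCII ply file's header."""
    with open(path) as f:
        # Every ply file starts with the magic line "ply".
        if f.readline().strip() != "ply":
            raise ValueError("not a ply file")
        for line in f:
            line = line.strip()
            # Header line of the form: "element vertex <count>"
            if line.startswith("element vertex"):
                return int(line.split()[-1])
            if line == "end_header":
                break
    raise ValueError("no vertex element declared")
```

If the count looks reasonable (a scan of a small object typically yields tens of thousands of vertices), proceed with the MeshLab steps above.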