In Computer Vision, Neural Science, Paper Talk on February 9, 2014 at 6:42 pm
by Li Yang Ku (Gooly)
How our brain handles visual input is a myth. When Hubel and Wiesel discovered the Gabor filter like neuron in cat’s V1 area, several feed forward model theories appear. These models view our brain as a hierarchical classifier that extracts features layer by layer. Poggio’s papers “A feedforward architecture accounts for rapid categorization” and “Hierarchical models of object recognition in cortex” are good examples. These kind of structure are called discriminative models. Although this new type of model helped the community leap forward one step, it doesn’t solve the problem. Part of the reason is that there are ambiguities if you are only viewing part of the image locally and a feed-forward only structure can’t achieve global consistency.
Therefore the idea that some kind of feedback model has to exist gradually emerged. Some of the early works in the computer science community had first came up with models that rely on feedback, such as Gefforey Hinton’s Boltzman Machine invented back in the 80′s which developed into the so called deep learning around late 2000. However it was only around early 2000 had David Mumford clearly addressed the importance of feedback in the paper “Hierarchical Bayesian inference in the visual cortex“. Around the same time Wu and others had also combined feedback and feedforward models successfully on textures in the paper “Visual learning by integrating descriptive and generative methods“. Since then the computer vision community have partly embraced the idea that the brain is more like a generative model which in addition to categorizing inputs is capable of generating images. An example of human having generative skills will be drawing images out of imagination.
Slightly before David Mumford addresses the importance of the generative model. Lamme in the neuroscience community also started a series of research on the recurrent process in the vision system. His paper “The distinct modes of vision offered by feedforward and recurrent processing” published in 2000 addressed why recurrent (feedback) processing might be associated with conscious vision (recognizing object). While in the same year the paper “Competition for consciousness among visual events: the psychophysics of reentrant visual processes.” published in the field of psychology also addressed the reentrant (feedback) visual process and proposed a model where conscious vision is associated with the reentrant visual process.
While both the neuroscience and psychology field have research results that suggests a brain model that is composed of feedforward and feedback processing where the feedback mechanism is associated with conscious vision, a recent paper “Detecting meaning in RSVP at 13 ms per picture” shows that human is able to recognize high level concept of an image within 13 ms, a very short gap that won’t allow the brain to do a complete reentrant (feedback) visual process. This conflicting result could suggest that conscious vision is not the result of feedback processing or there are still missing pieces that we haven’t discover. This kind of reminds me one of Jeff Hawkins’ brain theory, which he said that solving the mystery of consciousness is like figuring out the world is round not flat, it’s easy to understand but hard to accept, and he believes that consciousness does not reside in one part of the brain but is simply the combination of all firing neuron from top to bottom.
In Computer Vision, Machine Learning on November 27, 2013 at 5:18 pm
by Li Yang Ku (Gooly)
I had the chance to chat with Abhinav Gupta, a research professor at CMU, in person when he visited UMass Amherst about a month ago. Abhinav presented NEIL, the never ending image learner in his talk at Amherst. To give a short intro, the following is from Abhinav
“NEIL (Never Ending Image Learner) is a computer program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from Internet data. It is an effort to build the world’s largest visual knowledge base with minimum human labeling effort – one that would be useful to many computer vision and AI efforts.”
One of the characteristic that distinguishes NEIL from other object recognition algorithms that are trained and tested on large web image data set such as the ImageNet or LFW is that NEIL is trying to recognize images that are in a set that has unlimited data and unlimited category. At first glance this might look like a problem too hard to solve. But NEIL approaches this problem in a smart way. Instead of trying to label images one by one on the internet, NEIL start from labeling just the easy ones. Since given a keyword the number of images returned are so large using Google Image Search, NEIL simply picks the ones it feels most certain, which are the ones that share the most common HOG like features. This step also helps refining the query result. Say we searched for cars on Google Image, it is very likely that out of every 100 images there is one image that has nothing to do with cars (very likely some sexy photo of girls with file name girl_love_cars.jpg ). These outliers won’t share the same visual features as the other car clusters and will not be labeled. By doing so NEIL can gradually build up a very large labeled data set from one word to another.
NEIL also learns the relationships between images and is connected with NELL, never ending language learning. More details should be released in future papers. During the talk Abhinav said he plan to set up a system where you can submit the category you wanna train on and with just $1, NEIL will give you a set of HOG classifiers in that category in 1 day.
In Book It, Computer Vision, Kinect, Point Cloud Library on November 20, 2013 at 7:53 pm
by Li Yang Ku (Gooly)
I was asked to help review a technical book “OpenNI Cookbook” about the OpenNI library for Kinect like sensors recently. This is the kind of book that would be helpful if you just started developing OpenNI applications in Windows. Although I did all my OpenNI researches in Linux, it was mostly because I need it to work with robots that use ROS (Robotic Operating System), which was only supported in Ubuntu. OpenNI was always more stable and more supported in Windows than in Linux. However, if you plan to use PCL (Point Cloud Library) with OpenNI, you might still want to consider Linux.
The book contains topics from basic to advance applications such as getting the raw sensor data, hand tracking and skeleton tracking. It also contains sections that people don’t usually talk about but crucial for actual software development such as listening to connect and disconnect events. The code in this book uses the OpenNI2 library, which is the latest version of OpenNI. Note that although OpenNI is opensource, the NITE library for hand tracking and human tracking used in the book isn’t. (But free under certain license)
You can also buy the book on Amazon.