In Computer Vision, Machine Learning on November 27, 2013 at 5:18 pm
by Li Yang Ku (Gooly)
I had the chance to chat with Abhinav Gupta, a research professor at CMU, in person when he visited UMass Amherst about a month ago. Abhinav presented NEIL, the Never Ending Image Learner, in his talk at Amherst. To give a short intro, the following is from Abhinav:
“NEIL (Never Ending Image Learner) is a computer program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from Internet data. It is an effort to build the world’s largest visual knowledge base with minimum human labeling effort – one that would be useful to many computer vision and AI efforts.”
One of the characteristics that distinguishes NEIL from other object recognition algorithms trained and tested on large web image datasets such as ImageNet or LFW is that NEIL tries to recognize images drawn from a set with unlimited data and unlimited categories. At first glance this might look like a problem too hard to solve, but NEIL approaches it in a smart way. Instead of trying to label every image on the internet one by one, NEIL starts by labeling just the easy ones. Since Google Image Search returns an enormous number of images for any given keyword, NEIL simply picks the ones it is most certain about, namely the ones that share the most common HOG-like features. This step also helps refine the query result. Say we search for cars on Google Image Search; it is very likely that out of every 100 images there is one that has nothing to do with cars (quite possibly some sexy photo of girls with the file name girl_love_cars.jpg). These outliers won't share the same visual features as the car clusters and will not be labeled. By doing so NEIL can gradually build up a very large labeled dataset, one keyword at a time.
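The "keep only the easy ones" step can be sketched roughly like this. This is just my own toy illustration of the idea, not NEIL's actual pipeline: the `keep_confident` function and the random Gaussian "HOG-like" feature vectors are made up for this example.

```python
import numpy as np

def keep_confident(features, keep_ratio=0.9):
    """Keep the images whose feature vectors sit closest to the rest of
    the set; the remaining ones are treated as outliers and left unlabeled."""
    # pairwise squared Euclidean distances between feature vectors
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    # average distance from each image to all the others
    avg = d2.sum(axis=1) / (len(features) - 1)
    n_keep = int(len(features) * keep_ratio)
    # indices of the most "typical" images, i.e. the easy ones to label
    return np.argsort(avg)[:n_keep]

# toy data: 99 car-like feature vectors plus one unrelated outlier
rng = np.random.default_rng(0)
cars = rng.normal(0.0, 0.1, size=(99, 16))
outlier = np.full((1, 16), 5.0)     # the girl_love_cars.jpg of this example
feats = np.vstack([cars, outlier])

kept = keep_confident(feats, keep_ratio=0.9)
print(99 in kept)  # → False: the outlier is dropped, the cars are kept
```

The real system of course works on clusters of detections rather than a single average-distance score, but the effect is the same: only the images that agree with the dominant visual pattern get labeled.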
NEIL also learns relationships between images and is connected with NELL, the Never-Ending Language Learner; more details should be released in future papers. During the talk Abhinav said he plans to set up a system where you can submit a category you want to train on, and for just $1 NEIL will give you a set of HOG classifiers for that category within a day.
In Book It, Computer Vision, Kinect, Point Cloud Library on November 20, 2013 at 7:53 pm
by Li Yang Ku (Gooly)
I was recently asked to help review a technical book, “OpenNI Cookbook”, about the OpenNI library for Kinect-like sensors. This is the kind of book that would be helpful if you are just starting to develop OpenNI applications on Windows. Although I did all my OpenNI research on Linux, that was mostly because I needed it to work with robots that use ROS (Robot Operating System), which was only supported on Ubuntu; OpenNI has always been more stable and better supported on Windows than on Linux. However, if you plan to use PCL (Point Cloud Library) with OpenNI, you might still want to consider Linux.
The book covers topics from basic to advanced applications, such as getting the raw sensor data, hand tracking, and skeleton tracking. It also contains sections on things people don't usually talk about but that are crucial for actual software development, such as listening for device connect and disconnect events. The code in this book uses OpenNI2, the latest version of the library. Note that although OpenNI is open source, the NITE library used in the book for hand and human tracking isn't (though it is free under certain licenses).
You can also buy the book on Amazon.
In Computer Vision, Neural Science, Paper Talk on September 29, 2013 at 7:31 pm
by Gooly (Li Yang Ku)
In one of my previous posts I talked about the big picture of object recognition, which can be divided into two parts: 1) transforming the image space and 2) classifying and grouping. In this post I'm going to talk about a paper that clarifies object recognition, along with some of its pretty cool graphs explaining how our brains might transform the input image space. The paper also discusses why idealized classification might not be what we want.
Let's start by explaining what a manifold is.
An object manifold is the set of images projected by one object in the image space. Since each image is a point in the image space and an object can project similar images with infinitesimally small differences, the points form a continuous surface in the image space. This continuous surface is the object's manifold. Figure (a) above is an idealized manifold generated by a specific face. When the face is viewed from different angles, the projected point moves around on the continuous manifold. Although the graph is drawn in 3D, one should keep in mind that it actually lives in a much higher-dimensional space: a 576,000,000-dimensional space if you consider the human eye to be 576 megapixels. Figure (b) shows another manifold from another face; in this space the two individuals can be separated easily by a plane. Figure (c) shows another space in which the two faces would be hard to separate. Note that these are ideal spaces, possibly transformed from the original image space by our cortex. If the shapes were that simple, object recognition would be easy. However, what we actually get is Figure (d): the manifolds of two objects are usually tangled and intersect in multiple spots. Still, the two manifolds are not the same, so it is possible that through some nonlinear operation we can transform Figure (d) into something more like Figure (c).
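Here is a tiny example of what such a nonlinear untangling could look like. This is my own sketch, not from the paper: two concentric circles stand in for the tangled manifolds of Figure (d), and one handcrafted nonlinear feature plays the role of the cortical transform.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "object manifolds" that no plane can separate in the raw 2D space:
# points on an inner circle (object A) and on an outer circle (object B).
theta_a = rng.uniform(0.0, 2.0 * np.pi, 200)
theta_b = rng.uniform(0.0, 2.0 * np.pi, 200)
a = 1.0 * np.c_[np.cos(theta_a), np.sin(theta_a)]   # radius 1
b = 2.0 * np.c_[np.cos(theta_b), np.sin(theta_b)]   # radius 2

# Nonlinear transform: lift each point (x, y) to (x, y, r^2)
# where r^2 = x^2 + y^2 is a new, nonlinear feature.
ra = (a ** 2).sum(axis=1)   # exactly 1.0 for every point of A
rb = (b ** 2).sum(axis=1)   # exactly 4.0 for every point of B

# In the lifted space the flat plane r^2 = 2.5 separates the two
# manifolds perfectly, like going from Figure (d) to Figure (c).
print(ra.max() < 2.5 < rb.min())  # → True
```

The transforms our visual cortex might apply are of course far more elaborate, but the principle is the same: a good change of space can turn a hopeless tangle into something a simple plane can split.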
One interesting point this paper makes is that the traditional view that a single neuron represents an object is probably wrong. Instead of having a grandmother cell (yes, that's what they call it) that represents your grandma, our brain might actually represent her with a manifold. Neurologically speaking, a manifold could be a set of neurons with a certain firing pattern. This is related to the sparse encoding I talked about before and is consistent with Jeff Hawkins' brain theory. (See his talk about sparse distributed representations around 17:15.)
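To make the "set of neurons with a certain firing pattern" idea concrete, here is a toy sparse-code sketch. The neuron counts are arbitrary and the code is purely illustrative; it is not from the paper or from Hawkins' actual model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_active = 2048, 40   # arbitrary sizes: ~2% of neurons fire

def random_pattern():
    """A 'firing pattern': a small random subset of active neurons."""
    code = np.zeros(n_neurons, dtype=bool)
    code[rng.choice(n_neurons, n_active, replace=False)] = True
    return code

grandma = random_pattern()
stranger = random_pattern()

# Two unrelated sparse patterns share almost no active neurons, so a
# large overlap between patterns is a reliable signal that the same
# representation (the same "manifold") is active — no single grandmother
# cell needed.
overlap = int(np.sum(grandma & stranger))
print(overlap, "shared neurons out of", n_active)
```

With 40 active neurons out of 2048, two random patterns overlap by only about one neuron on average, which is what makes this kind of distributed code so robust to noise.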
Figures (b) and (c) above compare a manifold representation with a single-cell representation. What is being emphasized is that object recognition is more a task of transforming the space than of classification.