In Computer Vision, Neural Science, Paper Talk on September 29, 2013 at 7:31 pm
by Gooly (Li Yang Ku)
In one of my previous posts I talked about the big picture of object recognition, which can be divided into two parts: 1) transforming the image space and 2) classifying and grouping. In this post I am gonna talk about a paper that clarifies object recognition, along with some of its pretty cool graphs explaining how our brains might transform the input image space. The paper also talks about why the idealized classification might not be what we want.
Let’s start by explaining what a manifold is.
An object manifold is the set of images projected by one object in the image space. Since each image is a point in the image space and an object can project similar images with infinitely small differences, the points form a continuous surface in the image space. This continuous surface is the object’s manifold. Figure (a) above is an idealized manifold generated by a specific face. When the face is viewed from different angles, the projected point moves around on the continuous manifold. Although the graph is drawn in 3D, one should keep in mind that it actually lives in a space of much higher dimension. If we consider the human eye to be 576 megapixels, that is a 576,000,000-dimensional space. Figure (b) shows another manifold from another face; in this space the two individuals can be separated easily by a plane. Figure (c) shows another space in which the two faces would be hard to separate. Note that these are idealized spaces that are possibly transformed from the original image space by our cortex. If the shapes were that simple, object recognition would be easy. However, the actual stuff we get is in Figure (d). The object manifolds from two objects are usually tangled and intersect in multiple spots. However, the two manifolds are not identical, so it is possible that through some non-linear operation we can transform figure (d) into something more like figure (c).
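To make the idea of untangling a bit more concrete, here is a minimal toy sketch. It is not from the paper: the two "manifolds" are just concentric circles in a 2D space (standing in for two faces), and the `lift` transform that adds the squared radius as a third axis is an assumption I picked for illustration. In the 2D space no straight line separates the two circles, but after the non-linear transform a simple plane does.

```python
import math

def ring(radius, n=60):
    """Sample n points on a circle: a toy stand-in for one object manifold."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def separates(points_a, points_b, normal, offset):
    """True if points_a lie strictly below the (hyper)plane and points_b strictly above."""
    def side(p):
        return sum(w * c for w, c in zip(normal, p)) - offset
    return all(side(p) < 0 for p in points_a) and all(side(p) > 0 for p in points_b)

def any_line_separates(points_a, points_b, angles=72, offsets=61):
    """Brute-force search over a grid of candidate lines in the 2D space."""
    for i in range(angles):
        theta = 2 * math.pi * i / angles
        normal = (math.cos(theta), math.sin(theta))
        for j in range(offsets):
            offset = -3.0 + 6.0 * j / (offsets - 1)
            if separates(points_a, points_b, normal, offset):
                return True
    return False

def lift(p):
    """Assumed non-linear transform: add squared distance from the origin as a third axis."""
    x, y = p
    return (x, y, x * x + y * y)

face_a = ring(1.0)  # hypothetical manifold of the first face
face_b = ring(2.0)  # hypothetical manifold of the second face

# Tangled: no candidate line in the original space splits the two rings.
tangled_separable = any_line_separates(face_a, face_b)

# Untangled: in the lifted space the plane z = 2.5 splits them.
untangled_separable = separates([lift(p) for p in face_a],
                                [lift(p) for p in face_b],
                                normal=(0.0, 0.0, 1.0), offset=2.5)

print(tangled_separable, untangled_separable)  # prints: False True
```

The real image space is of course vastly larger and the cortex's transform is unknown; the only point of the sketch is that the same two point sets can be inseparable by a plane in one space and trivially separable in another.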
One interesting point this paper makes is that the traditional view that there is a single neuron that represents an object is probably wrong. Instead of having a grandmother cell (yes, that’s what they call it) that represents your grandma, our brain might actually represent her with a manifold. Neurologically speaking, a manifold could be a set of neurons that have a certain firing pattern. This is related to the sparse encoding I talked about before and is consistent with Jeff Hawkins’ brain theory. (See his talk about sparse distributed representation around 17:15.)
Figures (b) and (c) above compare a manifold representation with a single cell representation. What is being emphasized is that object recognition is more a task of transforming the space than a task of classification.
In Computer Vision, Paper Talk, Visual Illusion on September 16, 2013 at 6:33 pm
by Gooly (Li Yang Ku)
I talked about some visual illusions in my previous post but didn’t mention why they are important to computer vision or what the benefits of seeing visual illusions are. In this post I am gonna talk about the advantages of having two of the most commonly known visual illusions: illusory contours and the checkerboard illusion.
The Kanizsa triangle, invented by Gaetano Kanizsa, is a very good example of illusory contours. Even though the upside-down triangle in the center doesn’t exist, you are forced to see it because of the clues given by the other parts. If you gradually cover up some of the circles and corners, at some point you will see the Pac-Man shapes and the angles as individual objects and the illusory contours will disappear. This illusion is a side effect of how we perceive objects and shows that we see edges using various clues instead of just light differences. Because our eyes receive noisy real-world inputs, illusory contours actually help us fill in the contours that go missing because of lighting, shading, or occlusion. It also explains why a purely bottom-up vision system won’t work in many situations. In the paper “Hierarchical Bayesian inference in the visual cortex” written by Lee and Mumford, a Kanizsa square is used to test whether monkeys perceive illusory contours in V1. The result is positive, but the V1 response is delayed compared to V2. This suggests that information about illusory contours is possibly generated in V2 and propagated back to V1.
The checkerboard illusion above was created by Edward H. Adelson. In the book “Perception as Bayesian Inference” Adelson wrote a chapter discussing how we perceive objects under different lighting conditions, in other words, how we achieve “lightness constancy”. The illusion above should be easy to understand. At first sight, in the left image square A on the checkerboard seems to be darker than square B, although they actually have the same brightness. By breaking the 3D structure, the right image shows that the two squares indeed have the same brightness. We perceive A and B differently in the left image because our vision system is trying to achieve lightness invariance. In fact, if the cylinder were removed, square A would be darker than square B; lightness constancy therefore gives us the brightness we would see under constant lighting. This allows us to recognize the same object even under large lighting changes, which I would argue is an important ability for survival. The paper “Recovering reflectance and illumination in a world of painted polyhedra” by Sinha and Adelson further discusses how we construct 3D structure from 2D drawings and shading. Understanding an object’s 3D structure is crucial for achieving lightness constancy, as in the checkerboard illusion above. As in the image below, by removing certain types of junction clues, a 3D drawing can easily be seen as flat. However, as mentioned in the paper, more complex global strategies are needed to cover all cases.
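The checkerboard argument can be written down as a small calculation. This is only a sketch under assumptions of my own: a simple multiplicative model where the light reaching the eye is reflectance times illumination, and made-up numbers for the two squares (they are not measured from Adelson's image). The helper names are hypothetical.

```python
def luminance(reflectance, illumination):
    """Light reaching the eye under an assumed multiplicative model."""
    return reflectance * illumination

def perceived_lightness(lum, estimated_illumination):
    """Discount the estimated illumination to recover the surface reflectance."""
    return lum / estimated_illumination

# Assumed values: A is a dark square in full light, B is a light square in shadow.
square_a = {"reflectance": 0.3, "illumination": 1.0}
square_b = {"reflectance": 0.6, "illumination": 0.5}

# With the cylinder casting a shadow, both squares send the same light to the eye.
lum_a = luminance(**square_a)
lum_b = luminance(**square_b)

# A visual system that discounts the shadow recovers the different reflectances,
# so A is judged darker than B even though the raw luminances match.
seen_a = perceived_lightness(lum_a, square_a["illumination"])
seen_b = perceived_lightness(lum_b, square_b["illumination"])

# Without the cylinder both squares get full light, and A really is darker.
no_shadow_a = luminance(square_a["reflectance"], 1.0)
no_shadow_b = luminance(square_b["reflectance"], 1.0)

print(lum_a, lum_b)              # equal luminance in the image
print(seen_a, seen_b)            # A perceived darker than B
print(no_shadow_a, no_shadow_b)  # A darker than B under constant lighting
```

In this toy model the "illusion" is just the vision system reporting reflectance instead of raw luminance, which matches what you would see if the shadow were taken away.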
I was gonna post this a few months ago but was delayed by my Los Angeles to Boston road trip (and numerous goodbye parties). I am now officially back in school at UMass Amherst for a PhD program. Not totally settled down yet, but enough to make a quick post.