Life is a game, take it seriously

Posts Tagged ‘vision’

Human vision, top down or bottom up?

In Computer Vision, Neural Science, Paper Talk on February 9, 2014 at 6:42 pm

by Li Yang Ku (Gooly)

top-down bottom-up

How our brain handles visual input is still a mystery. After Hubel and Wiesel discovered Gabor-filter-like neurons in the cat’s V1 area, several feedforward model theories appeared. These models view our brain as a hierarchical classifier that extracts features layer by layer. Poggio’s papers “A feedforward architecture accounts for rapid categorization” and “Hierarchical models of object recognition in cortex” are good examples. This kind of structure is called a discriminative model. Although this new type of model helped the community leap one step forward, it doesn’t solve the problem. Part of the reason is that a local view of part of an image is ambiguous, and a feedforward-only structure can’t achieve global consistency.
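To make the feedforward idea a bit more concrete, here is a minimal sketch in plain Python of a single such layer: a small bank of Gabor filters at different orientations, where the filter with the strongest response wins. The function names (`gabor_kernel`, `response`, `preferred_orientation`, `grating`), the 9x9 kernel size and the parameter values are my own assumptions for illustration and are not taken from the papers above.

```python
import math

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5):
    """Build a size x size Gabor kernel (size should be odd) as a list of rows."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(patch, kernel):
    """Magnitude of the inner product between a patch and a same-sized kernel."""
    total = sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
    return abs(total)

def preferred_orientation(patch, thetas):
    """Index of the orientation whose Gabor kernel responds most strongly."""
    scores = [response(patch, gabor_kernel(size=len(patch), theta=t)) for t in thetas]
    return scores.index(max(scores))

def grating(size=9, wavelength=4.0, vertical=True):
    """Toy input: cosine stripes that are either vertical or horizontal."""
    half = size // 2
    coords = range(-half, half + 1)
    return [[math.cos(2 * math.pi * (x if vertical else y) / wavelength) for x in coords]
            for y in coords]

# A tiny "layer": four orientation-tuned filters looking at the same patch.
thetas = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
print(preferred_orientation(grating(vertical=True), thetas))
print(preferred_orientation(grating(vertical=False), thetas))
```

With these settings the vertical stripes select the filter at theta = 0 and the horizontal stripes select the one at theta = pi/2. Note that each filter only ever sees its own 9x9 patch, which is exactly the local-view limitation mentioned above.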

Feedforward Vision

Therefore the idea that some kind of feedback model has to exist gradually emerged. Some early work in the computer science community had already come up with models that rely on feedback, such as Geoffrey Hinton’s Boltzmann machine, invented back in the 80’s, which developed into so-called deep learning in the late 2000s. However, it was only in the early 2000s that David Mumford clearly addressed the importance of feedback in the paper “Hierarchical Bayesian inference in the visual cortex”. Around the same time Wu and others also successfully combined feedback and feedforward models on textures in the paper “Visual learning by integrating descriptive and generative methods”. Since then the computer vision community has partly embraced the idea that the brain is more like a generative model, which in addition to categorizing inputs is capable of generating images. An example of humans having generative skills is drawing images from imagination.


Slightly before David Mumford addressed the importance of the generative model, Lamme in the neuroscience community also started a series of studies on recurrent processing in the vision system. His paper “The distinct modes of vision offered by feedforward and recurrent processing”, published in 2000, addressed why recurrent (feedback) processing might be associated with conscious vision (recognizing objects). In the same year the paper “Competition for consciousness among visual events: the psychophysics of reentrant visual processes”, published in the field of psychology, also addressed the reentrant (feedback) visual process and proposed a model where conscious vision is associated with it.


Both the neuroscience and psychology fields have research results that suggest a brain model composed of feedforward and feedback processing, where the feedback mechanism is associated with conscious vision. However, a recent paper, “Detecting meaning in RSVP at 13 ms per picture”, shows that humans are able to recognize the high-level concept of an image shown for only 13 ms, a very short window that won’t allow the brain to complete a reentrant (feedback) visual process. This conflicting result could suggest that conscious vision is not the result of feedback processing, or that there are still missing pieces we haven’t discovered. This kind of reminds me of one of Jeff Hawkins’ brain theories, in which he said that solving the mystery of consciousness is like figuring out that the world is round, not flat: it’s easy to understand but hard to accept. He believes that consciousness does not reside in one part of the brain but is simply the combination of all firing neurons from top to bottom.

Visual Illusion: Chronostasis and Saccadic Masking

In Computer Vision, Neural Science, Visual Illusion on June 26, 2013 at 9:54 pm

by Gooly (Li Yang Ku)

some visual art to attract your attention, has little to do with the post

I have always been intrigued by visual illusions and am often surprised by how easily we are fooled by our eyes. Some visual illusions are as good as a good joke. One of my favorite illusions is the spinning dancer, whose direction of rotation I can’t easily flip in my head despite knowing it could be seen either way. Understanding visual illusions is also crucial in Computer Vision because they are just side effects produced by the underlying algorithm that helps us see. A great vision algorithm should probably have the same visual illusions as humans do.

Spinning Dancer illusion

Chronostasis is a kind of visual illusion that happens to you every moment without you noticing. To test it out, you need to find a clock that has a second hand. First focus your gaze somewhere near the clock, close enough that you can still see the hand ticking in your peripheral vision but not too close, then shift your gaze to the second hand right after it has moved. You’ll notice that the first tick seems to last longer than the ticks after it.


This illusion is caused by Saccadic Masking, a mechanism that our brain uses to help us see the world without getting dizzy. Our eyes are constantly moving and our head also turns a lot. Saccadic masking shuts down the input when the scene shown to your eyes is blurry. So when you move your eyes, the brain has two choices: it can either keep the last image or show you the next stable image in the future. So now you might be yelling “HOW COULD THE BRAIN POSSIBLY SHOW YOU AN IMAGE IT HASN’T SEEN!” Yeah, that’s not possible. But remember that there is no clock ticking in your brain and time is just what you feel, so your brain can simply freeze your internal clock, wait for the next image, and then fast forward your internal clock so it syncs back with the real world. And that’s what happened to you when you did that first gaze shift to the second hand.


To test out Saccadic Masking you can also find a mirror and stare at your pretty (or nerdy) eyes. First focus on your left eye, then shift your gaze to your right eye. You won’t be able to see your own eyes saccade because of Saccadic Masking, but if you record yourself doing the same experiment with a smartphone’s front-facing camera, you will be able to see your eyes saccade clearly. (Note that smartphone cameras have a time delay, so don’t use one as a mirror for this test. It is highly recommended as a mirror outside of the experiment though; it always shows a slightly younger you.)

RVIZ: a good reason to implement a vision system in ROS

In Computer Vision, Point Cloud Library, Robotics on November 18, 2012 at 2:33 pm

by Gooly (Li Yang Ku)

It might seem illogical to implement a vision system in ROS (Robot Operating System) if you are working on pure vision; however, after messing with ROS and PCL for a year I can see the advantages of doing this. To clarify, we started to use ROS only because we needed it to communicate with Robonaut 2, but the RVIZ package in ROS is so helpful that I would recommend it even if no robots are involved.

(Keynote speech about Robonaut 2 and ROS from the brilliant guy I work for)


RVIZ is a ROS package that visualizes robots, point clouds, etc. Although PCL does provide a visualizer for point clouds, it only offers the most basic visualization functions. It is really not comparable with what RVIZ can give you.

  1. RVIZ is perfect for figuring out what went wrong in a vision system. The list on the left has a check box for each item. You can show or hide any visual information instantly.
  2. RVIZ provides 3D visualization which you can navigate with just your mouse. At first I preferred the kind of navigation used in Microsoft Robotics Studio or Counter Strike, but once you get used to the RVIZ style, it is pretty handy. Since I already have 2 keyboards and 2 mice, it’s quite convenient to move around with my left mouse without taking my right hand off my right mouse.
  3. The best part of RVIZ is the interactive marker. This is the part where you can be really creative. It makes selecting a certain area in 3D relatively easy. You can therefore adjust your vision system manually while it is still running, for example by selecting a certain area as your work space and ignoring other regions.
  4. You can have multiple vision processes showing vision data in the same RVIZ. You simply have to publish the point cloud or shape you want to show using the ROS publishing method. Visualizing is relatively painless once you get used to it.
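
Points 3 and 4 can be sketched roughly as follows. This is a minimal sketch, assuming ROS 1 with `rospy` and the standard `sensor_msgs` Python helpers; the function names, the topic name `vision/workspace_cloud`, the node name and the frame id `base_link` are hypothetical. In a real system the box bounds would come from an interactive marker; here they are simply passed in as arguments.

```python
def crop_to_workspace(points, lower, upper):
    """Keep only the (x, y, z) points inside an axis-aligned work space box."""
    return [p for p in points
            if all(lo <= v <= hi for v, lo, hi in zip(p, lower, upper))]

def publish_cloud(points, topic="vision/workspace_cloud", frame_id="base_link"):
    """Publish the points once per second as a PointCloud2 that RVIZ can display."""
    # ROS 1 imports are kept inside the function so that the cropping helper
    # above can be used and tested without a ROS installation.
    import rospy
    from std_msgs.msg import Header
    from sensor_msgs.msg import PointCloud2
    from sensor_msgs import point_cloud2

    rospy.init_node("workspace_cloud_publisher", anonymous=True)
    pub = rospy.Publisher(topic, PointCloud2, queue_size=1)
    rate = rospy.Rate(1)  # hypothetical rate: 1 Hz
    while not rospy.is_shutdown():
        header = Header(stamp=rospy.Time.now(), frame_id=frame_id)
        pub.publish(point_cloud2.create_cloud_xyz32(header, points))
        rate.sleep()

# Toy data: three points, one of them outside a 1 m box.
cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (2.0, 0.0, 0.0)]
inside = crop_to_workspace(cloud, lower=(0.0, 0.0, 0.0), upper=(1.0, 1.0, 1.0))
print(inside)
```

With a ROS master running, calling `publish_cloud(inside)` and adding a PointCloud2 display in RVIZ that listens to the same topic should show only the points inside the box.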

Try not to view ROS as an operating system like Windows or Linux. It is more like the internet, where RVIZ is just one service like Google Maps, and you can write your own app that queries the map as long as you use the communication protocol provided by ROS.