Life is a game, take it seriously

Archive for May, 2013|Monthly archive page

How things work: CamFind

In Computer Vision on May 11, 2013 at 3:45 pm

by Gooly (Li Yang Ku)

camfind algorithm

If you remember, I talked about how the image search app Amazon Flow might work last year, and promised I would talk about Google Goggles later, which I didn’t (and I blame my unpredictable life and laziness). To make up for that I am gonna talk about a similar app, but before downloading the app please read the whole post for your own sake.

The app I am gonna discuss today is CamFind. Someone recommended the app on Facebook so I decided to give it a try. And then I went through the whole sequence of emotions: 1. Curious, 2. Amazed by the accuracy, 3. Skeptical, 4. Super amazed, 5. Embarrassed about how stupid I am compared to the vision team behind the app, 6. Did some more tests, skeptical++, 7. Searched and relieved.

So now the story. I downloaded the app and took a picture of my Adventure Time Spanish poster (hell yeah I love Adventure Time). It took a few seconds but returned an accurate result. That is good, but Amazon Flow can also do that. Then I started to test the limits: magazine, check; rice vinegar bottle, check; mug, check; weird-looking vacuum, check. By this time I was pretty shocked.

adventure time

At first I suspected someone must be looking at the pictures and typing in the results. To tell the truth, this is a common practice among start-ups. You don’t have the technology or the business yet but want to test whether a concept works, so you fake the technology and the business. If it works you get money from VCs; if it doesn’t, they have a strategy for that too: “fail fast”. One of the famous examples is the first car sold on the internet. The guy that started the website literally went to buy a car and delivered it himself on the first order. Tons of businesses started this way.

However, the response was far faster than I thought it would be if someone were behind it, and I was pretty sure nobody would use this strategy for this kind of business. So I did a few more tests and it got everything reasonably well, except for recognizing my plastic chair as a plastic table and my chipmunk doll as a bear doll. It even got my Dilbert sticker right; yeah, it says it’s a “Dilbert sticker”. I was so shocked that I felt embarrassed that I didn’t know such technology existed. After reading so many papers in the past few years, I couldn’t believe I had missed some of the most amazing ones.

So I desperately searched on the web, at the same time dreaming of buying the company just to know how they did it. And I finally found the missing piece. Seth Geib posted the following comment in one of the reviews.

I have entered a few images that there is pretty much no way any algorithm could detect and it returned a spot on result. Also I get different, very specific results if I submit an image multiple times. I am positive they have a team of people screening these images, which is a definite privacy concern.

UPDATE: Upon asking this question to the CamFind team on their FB page they responded with the following:

“Hi, to answer your question, CamFind uses a combination of computer vision and human crowdsourcing to identify the object photographed.”

So I guess just be aware that any images you take will be screened by individuals. They should really note this in their app first and foremost upon opening it.

What a relief.

regret

I am not sure what their plan is, and I am not sure this is a concept that needs to be tested with this strategy. It’s like asking if you want an artificial secretary that would understand all your needs and give you information in a few seconds at no cost. Faking a robot with humans to test whether humans need a robot is kind of a weird concept. Even if you proved that this is a good business model, how do you build the vision technology? You can’t just throw money at people and get vision algorithms that work like humans. Some business guys just don’t get it.

It’s possible that with more people using this app, they would obtain a large labeled image database, and could train their algorithms on that and improve the automatic part to be the best image search on the market. But I can tell you, nope, there is no way their algorithms are gonna be half as good as having a human behind them. We humans didn’t learn to recognize things by scanning through billions of photos, and the fastest path to building a human-like vision system would be to have it learn the way we learned.

And by the way, I am glad I didn’t take any pictures that I shouldn’t be taking with the app. I hope you didn’t either.

Back to Basics: Sparse Coding?

In Computer Vision, Neural Science, Paper Talk on May 4, 2013 at 9:04 pm

by Gooly (Li Yang Ku)

Gabor like filters

It’s always good to go back once in a while to the reason that lured you into computer vision. Mine was to understand the brain, after I realized to my astonishment that computers have no intelligence while I was studying EE in undergrad. In fact, if they had used the translation “computer” instead of “electrical brain” in my mother language, I would probably have been better off.

Anyway, I am currently revisiting some of the first few computer vision papers I read, and to tell the truth I still learn a lot from reading stuff I have read several times before, which you can also interpret as me never having actually understood a paper.

So back to the papers,

Simoncelli, Eero P., and Bruno A. Olshausen. “Natural image statistics and neural representation.” Annual review of neuroscience 24.1 (2001): 1193-1216.

Olshausen, Bruno A., and David J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?.” Vision research 37.23 (1997): 3311-3326.

Olshausen, Bruno A., and David J. Field. “Emergence of simple-cell receptive field properties by learning a sparse code for natural images.” Nature 381.6583 (1996): 607-609.

These three papers cover essentially the same ground; the first two are spin-offs of the third, which was published in Nature. I personally prefer the second paper for reading.

Brain Sparse Coding

In the second paper, Bruno explains in statistical terms why overcomplete sparse coding is essential for human vision. The goal is to obtain a set of basis functions (the basis functions are filters) that can be used to regenerate an image. This can be viewed as an image encoding problem, but instead of having an encoder that compresses the image to the minimum size, the goal is also to maintain sparsity, which means only a small number of basis functions are used compared to the whole basis pool. Sparsity has obvious biological advantages, such as saving energy, but Bruno conjectured that sparsity is also essential to vision and originates from the sparse structure of natural images.
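Written out, the setup looks roughly like this. The symbols I, φ_i and a_i follow the notation I recall from the Olshausen and Field papers, while M and N are just my own labels for the number of basis functions and the number of pixels, so treat the exact notation as an assumption:

```latex
% An image patch is modeled as a weighted sum of basis functions.
I(x, y) \approx \sum_{i=1}^{M} a_i \, \phi_i(x, y)

% Overcomplete: more basis functions than pixels in the patch.
M > N, \qquad N = \text{number of pixels in } I

% Sparse: for any given patch, most coefficients are zero or close to it.
\#\{\, i : a_i \neq 0 \,\} \ll M
```

A compression-style encoder would try to make M small; here M is deliberately large and it is the number of active coefficients per patch that is kept small.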

In order to obtain this set of sparse basis functions, a sparsity constraint is added to the energy function being optimized. The final result is a set of basis functions (the image at the top of this post) that interestingly look very similar to Gabor filters, which are found in the visual cortex. This somehow suggests that sparseness is essential in the evolution of human vision.
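To make "adding a sparsity constraint to the energy function" a bit more concrete, here is a minimal numpy sketch. It is not the procedure from the papers: it assumes an L1 penalty (sum of absolute coefficient values) as the sparsity term, uses ISTA (iterative soft thresholding) to find the coefficients, takes a plain gradient step on the basis, and runs on synthetic patches instead of natural images. All function names and parameters are mine and purely illustrative.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink every entry of x toward zero by t (the L1 proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def energy(patches, basis, coeffs, lam):
    """Reconstruction error plus lam times the L1 sparsity penalty."""
    residual = patches - coeffs @ basis
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(coeffs))


def infer_coeffs(patches, basis, lam, n_steps=100):
    """Sparse coefficients for a fixed basis via ISTA, starting from zero."""
    # Lipschitz constant of the reconstruction gradient: largest
    # eigenvalue of basis @ basis.T, i.e. the squared spectral norm.
    lipschitz = np.linalg.norm(basis, 2) ** 2
    coeffs = np.zeros((patches.shape[0], basis.shape[0]))
    for _ in range(n_steps):
        grad = (coeffs @ basis - patches) @ basis.T
        coeffs = soft_threshold(coeffs - grad / lipschitz, lam / lipschitz)
    return coeffs


def learn_basis(patches, n_basis, lam=0.1, n_iters=20, seed=0):
    """Alternate between sparse inference and a gradient step on the basis."""
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((n_basis, patches.shape[1]))
    basis /= np.linalg.norm(basis, axis=1, keepdims=True)
    for _ in range(n_iters):
        coeffs = infer_coeffs(patches, basis, lam)
        residual = patches - coeffs @ basis
        # Step size bounded by the curvature of the reconstruction term.
        step = 1.0 / (np.linalg.norm(coeffs, 2) ** 2 + 1e-8)
        basis += step * coeffs.T @ residual
        # Keep each basis function at unit norm so coefficients cannot
        # shrink the penalty by inflating the basis instead.
        norms = np.linalg.norm(basis, axis=1, keepdims=True)
        basis /= np.maximum(norms, 1e-12)
    return basis


def make_toy_patches(n_patches=200, patch_dim=16, n_sources=32,
                     n_active=3, seed=1):
    """Synthetic stand-in for image patches: sparse mixes of hidden sources."""
    rng = np.random.default_rng(seed)
    sources = rng.standard_normal((n_sources, patch_dim))
    coeffs = np.zeros((n_patches, n_sources))
    for row in coeffs:
        active = rng.choice(n_sources, size=n_active, replace=False)
        row[active] = rng.standard_normal(n_active)
    return coeffs @ sources
```

Calling `learn_basis(make_toy_patches(), n_basis=32)` returns a 32 x 16 array with one unit-norm basis function per row, which is overcomplete since there are twice as many basis functions as pixels. Don't expect Gabor-like filters from this toy data; that part needs patches from real natural images.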