by Gooly (Li Yang Ku)
If you remember, I talked about how the image search app Amazon Flow might work last year, and promised I would talk about Google Goggles later, which I didn’t (and I blame my unpredictable life and laziness). To compensate for that I am gonna talk about a similar app, but before downloading the app please read the whole post for your own sake.
The app I am gonna discuss today is CamFind. Someone recommended the app on Facebook so I decided to give it a try. And then I went through the whole sequence of emotions: 1. Curious, 2. Amazed by the accuracy, 3. Skeptical, 4. Super amazed, 5. Embarrassed about how stupid I am compared to the vision team behind the app, 6. Did some more tests, skeptical++, 7. Searched and relieved.
So now the story. I downloaded the app and took a picture of my Adventure Time Spanish poster (hell yeah I love Adventure Time). It took a few seconds but returned an accurate result. That is good, but Amazon Flow can also do that. Then I started to test the limits: magazine, check; rice vinegar bottle, check; mug, check; weird looking vacuum, check. By this time I was pretty shocked.
At first I suspected someone must be looking at the pictures and typing in the results. To tell the truth, this is a common practice among start-ups. You don’t have the technology or the business yet but want to test whether a concept works, so you fake the technology and the business. If it works you get money from VCs; if it doesn’t, they also have a strategy for that: “fail fast”. One of the famous examples is the first car sold on the internet. The guy that started the website literally went to buy a car and delivered it himself on the first order. Tons of businesses started this way.
However, the response was far faster than I thought it would be if a person were behind it, and I was pretty sure nobody would use this strategy on this kind of business. So I did a few more tests and it got everything reasonably well, except for recognizing my plastic chair as a plastic table and my chipmunk doll as a bear doll. It even got my Dilbert sticker right; yeah, it said it’s a “Dilbert sticker”. I was so shocked that I felt embarrassed that I didn’t know such technology existed. After reading so many papers in the past few years, I couldn’t believe I had missed some of the most amazing ones.
So I desperately searched on the web, at the same time dreaming of buying the company just to know how they did it. And I finally found the missing piece. Seth Geib posted the following comment in one of the reviews.
I have entered a few images that there is pretty much no way any algorithm could detect and it returned a spot on result. Also I get different, very specific results if I submit an image multiple times. I am positive they have a team of people screening these images, which is a definite privacy concern.
UPDATE: Upon asking this question to the CamFind team on their FB page they responded with the following:
“Hi, to answer your question, CamFind uses a combination of computer vision and human crowdsourcing to identify the object photographed.”
So I guess just be aware that any images you take will be screened by individuals. They should really note this in their app first and foremost upon opening it.
What a relief.
I am not sure what their plan is, but I doubt this is a concept that needs to be tested with this strategy. It’s like asking if you want an artificial secretary that would understand all your needs and give you information in a few seconds at no cost. Of course you do. Faking a robot with a human to test whether humans need a robot is kind of a weird concept. Even if you proved that this is a good business model, how do you build the vision technology? You can’t just throw money at people and get vision algorithms that work like a human. Some business guys just don’t get it.
It’s possible that with more people using this app, they would obtain a large labeled image database, and could then train their algorithms on it and improve the automatic part until it is the best image search on the market. But I can tell you, nope, there is no way their algorithms are gonna be half as good as having a human behind them. We humans didn’t learn to recognize things by scanning through billions of photos, and the fastest path to building a human-like vision system would be to have it learn the way we learned.
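For the curious, here is a minimal sketch of what a "computer vision plus human crowdsourcing" hybrid might look like. It is a guess for illustration only, not CamFind's actual design: the class `HybridLabeler`, the `classifier` and `ask_human` callables, and the 0.9 confidence threshold are all made-up assumptions. The idea is simply that the machine answers when it is confident, a person answers otherwise, and every human answer is kept as a labeled example for later training.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HybridLabeler:
    # Hypothetical vision model: returns (label, confidence in [0, 1]).
    classifier: Callable[[bytes], Tuple[str, float]]
    # Hypothetical crowd worker: returns a label typed in by a person.
    ask_human: Callable[[bytes], str]
    # Assumed cut-off; below it the image goes to a human.
    threshold: float = 0.9
    # Human answers accumulate here as (image, label) training pairs.
    training_set: List[Tuple[bytes, str]] = field(default_factory=list)

    def label(self, image: bytes) -> Tuple[str, str]:
        """Return (label, source), where source is 'machine' or 'human'."""
        guess, confidence = self.classifier(image)
        if confidence >= self.threshold:
            return guess, "machine"
        answer = self.ask_human(image)
        self.training_set.append((image, answer))
        return answer, "human"


# Toy stand-ins so the sketch runs without any real model or crowd.
def toy_classifier(image: bytes) -> Tuple[str, float]:
    if image == b"poster":
        return "poster", 0.95
    return "bear doll", 0.4


def toy_human(image: bytes) -> str:
    return "chipmunk doll"


labeler = HybridLabeler(toy_classifier, toy_human)
print(labeler.label(b"poster"))
print(labeler.label(b"chipmunk"))
print(len(labeler.training_set))
```

Note that in this sketch every low-confidence photo is seen by a person, which is exactly the privacy concern raised in the comment above.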
And by the way, I am glad I didn’t take any pictures that I shouldn’t be taking with the app. I hope you didn’t either.