by Gooly (Li Yang Ku)
Amazon and Google are now the top players in the area of image query. Amazon’s laboratory A9 acquired the image query company SnapTell in 2009 and released a smartphone app, Flow, in mid 2011. Google has long been in the image search business since it acquired Neven Vision in 2006, and also released Google Goggles in early 2011.
Amazon’s Flow is an app that lets users obtain product information by pointing their phone camera at a product. The idea is to let consumers buy products on Amazon while standing in a rival’s physical store, and also to report the local price of the product back to Amazon. This controversial business idea is considered immoral by many shop owners, but it might be unavoidable.
Google Goggles is a more general smartphone app that does image recognition and image query. I’ll leave Google Goggles to the next post and focus on Flow now.
The Flow app is made by A9’s visual search group, which I believe is basically the SnapTell group. Before Amazon acquired SnapTell, SnapTell already had a visual search app. It even used a similar logo (see below).
On A9’s official website very little information about the technology is revealed; we can only guess from the following flowchart. Apparently they are waiting for patents to be approved and don’t want to reveal any details.
To get some sense of what this is all about we have to dig deeper and see who the people behind this app are. From SnapTell.com we know that the people who probably influenced this app most are Gautam Bhargava (CEO), Rajeev Motwani (Stanford professor), and G.D. Ramkumar (CTO). If you google their publications, none of them worked in the computer vision area: Gautam Bhargava worked on databases, Rajeev Motwani on theoretical computer science, and G.D. Ramkumar seems to have worked on geometric algorithms. Therefore the innovative part of this app very likely lies in the database part.
From the first part of the chart, the points look very much like SIFT-like features. In his blog post, vision wang also believes that the ASG (Accumulated Signed Gradient) algorithm is a SIFT-like feature. Judging from the name, I guess it might take several signed gradient values around a point and accumulate them into a vector descriptor. Or it might simply mean the algorithm accumulates several descriptors into one vector. The app works in real time, so I doubt anything too complicated was implemented.
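Since the actual ASG algorithm is unpublished, the following is only a sketch of my guess above: accumulate signed gradient values around a keypoint into an orientation histogram, roughly the way SIFT does. The function name, bin count, and patch size are all my own assumptions.

```python
import numpy as np

def asg_descriptor(patch, num_bins=8):
    """Hypothetical 'accumulated signed gradient' descriptor: accumulate
    gradient magnitudes around a keypoint into signed-orientation bins.
    This is only a guess; the real ASG algorithm is unpublished."""
    # Signed image gradients along y and x.
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # signed angle in [-pi, pi]
    # Map each pixel's orientation to a histogram bin.
    bins = ((orientation + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    descriptor = np.zeros(num_bins)
    np.add.at(descriptor, bins.ravel(), magnitude.ravel())
    # Normalize so the descriptor is robust to contrast changes.
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Example: describe a 16x16 patch around a detected keypoint.
patch = np.random.rand(16, 16)
desc = asg_descriptor(patch)
print(desc.shape)  # (8,)
```

A real implementation would of course add scale and rotation handling, but even this simple accumulation is cheap enough to run in real time on a phone.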
However, according to a 2008 white paper I found on the web (see flowchart below), text is used in addition to the image. Text inside the polygon surrounded by the feature points could serve as additional information for visual search.
I would say there might not be anything amazing in the parts I mentioned above; the core technology should be in how they query the image database. While I am not an expert on databases, I guess the database is a tree-like structure, something derived from the patent "Method and apparatus for classification of high dimensional data," written by G.D. Ramkumar before he co-founded SnapTell.
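To make the guess concrete: a standard tree-based index such as a k-d tree answers nearest-neighbor queries over descriptors without scanning the whole database. The sketch below (using SciPy's `cKDTree`; the data, labels, and dimensions are invented for illustration) shows the kind of lookup I suspect is happening, not SnapTell's actual structure.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical product database: one descriptor per stored product image.
# 8-D descriptors here just to keep the example small.
rng = np.random.default_rng(0)
database = rng.random((10000, 8))
labels = [f"product_{i}" for i in range(len(database))]

# Build a tree-based index once, offline.
tree = cKDTree(database)

# At query time, a noisy descriptor from the phone camera is matched
# against the index in roughly logarithmic time instead of a full scan.
query = database[42] + rng.normal(scale=0.01, size=8)
dist, idx = tree.query(query, k=1)
print(labels[idx])  # most likely "product_42"
```

For descriptors as high-dimensional as SIFT's 128 dimensions, exact k-d trees degrade toward brute force, which is why approximate tree or hashing schemes are usually used; whatever Ramkumar's patent describes presumably tackles exactly that problem.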
The problem Flow is trying to solve is actually very hard and complicated, and apparently they haven’t solved it yet. So far I have only succeeded in matching books and a Coke can. For the app to work well on other non-planar objects, more effort needs to go into the first step.