by Gooly (Li Yang Ku)
If you can afford 3D, why settle for 2D? This is probably true for both movies and object recognition. 3D sensors give you a lot of advantages in object recognition, and thanks to the Xbox 360, 3D sensors are now affordable.
Object recognition in 3D avoids many of the segmentation problems of 2D: objects that overlap in the image are usually separated in depth, and occlusions only happen from certain viewing angles. It also provides 3D shape information, which lets you tell a cup from a picture of a cup. And since you can easily combine 3D with 2D, 3D object recognition should always be at least as good as 2D recognition alone.
The Point Cloud Library (PCL) is the place to start. A naive method would be to first cluster the points using Euclidean Cluster Extraction
http://www.pointclouds.org/documentation/tutorials/cluster_extraction.php . This gives you a clean 3D segmentation. Note that the points in an organized point cloud (one that has not been downsampled or filtered) have the same order as the pixels of the RGB image, so you can segment your RGB image accordingly.
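To make the clustering idea concrete, here is a minimal sketch of Euclidean clustering in plain Python. It is not PCL's code or API: the function name `euclidean_clusters` and the parameters `tolerance` and `min_size` are made up for illustration, and the neighbor search is brute force where PCL uses a k-d tree.

```python
from collections import deque
from math import dist


def euclidean_clusters(points, tolerance, min_size=1):
    """Group point indices so that every point in a cluster lies within
    `tolerance` of at least one other point in the same cluster.

    `points` is a list of (x, y, z) tuples. Names and parameters are
    illustrative assumptions, not PCL's interface.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            current = queue.popleft()
            # Brute-force neighbor search, O(n) per point; fine for a sketch.
            neighbors = [i for i in unvisited
                         if dist(points[current], points[i]) <= tolerance]
            for i in neighbors:
                unvisited.remove(i)
                queue.append(i)
                cluster.append(i)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


# Two well-separated blobs should come out as two clusters.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
         (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
clusters = euclidean_clusters(cloud, tolerance=0.15)
```

Each cluster is a list of indices into the original cloud, which is what you need to map the segmentation back to the image.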
Then you can collect any 2D features, such as SIFT, SURF, HOG or whatever, from the segmented region of the image.
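Getting from a 3D cluster to an image region could look roughly like the sketch below. It assumes the cloud is stored row by row with the same width as the RGB image, so point index i corresponds to pixel (i // width, i % width). The helper names `cluster_to_pixels` and `bounding_box` are hypothetical, not part of PCL.

```python
def cluster_to_pixels(indices, width):
    """Convert flat point indices of a row-major cloud into
    (row, col) pixel coordinates. Assumes the cloud has the same
    width as the RGB image."""
    return [divmod(i, width) for i in indices]


def bounding_box(pixels):
    """Return (min_row, min_col, max_row, max_col) of a pixel list."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


# A cluster covering a 2x2 patch in a hypothetical 4-pixel-wide image.
pixels = cluster_to_pixels([5, 6, 9, 10], width=4)
box = bounding_box(pixels)
```

The pixel list can be used as a mask, or the bounding box as a crop, before running whichever 2D feature extractor you prefer.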
Furthermore, you can collect 3D features such as
PFH http://pointclouds.org/documentation/tutorials/pfh_estimation.php
or 3D SIFT from each segmented point cloud.
By combining all the features in an invariant way (invariant to the density of the point cloud and to the number of SURF or SIFT points; a histogram could probably achieve that), you get a high-dimensional vector representing each object.
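One way the histogram idea might work is sketched below. Each feature type is reduced to scalar values, binned, and normalized so the bins sum to 1, which makes the result independent of how many points or keypoints were found. Everything here is an assumption for illustration: the function names, the bin ranges, and the simplification to scalar values (real SIFT or PFH descriptors are vectors and would first need to be quantized somehow).

```python
def normalized_histogram(values, bins, lo, hi):
    """Histogram of `values` over [lo, hi] with `bins` equal bins,
    scaled to sum to 1 so it does not depend on the number of samples."""
    counts = [0] * bins
    bin_width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / bin_width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def object_descriptor(feature_sets):
    """Concatenate one normalized histogram per feature type.
    `feature_sets` is a list of (values, bins, lo, hi) tuples."""
    descriptor = []
    for values, bins, lo, hi in feature_sets:
        descriptor.extend(normalized_histogram(values, bins, lo, hi))
    return descriptor


# The same surface sampled sparsely and densely gives the same histogram.
sparse = [0.1, 0.6, 0.7]
dense = sparse * 4
vector = object_descriptor([(sparse, 2, 0.0, 1.0), ([0.2, 0.3], 2, 0.0, 1.0)])
```

Because every histogram has a fixed number of bins, the concatenated vector has the same length for every object, which is what the nearest neighbor step needs.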
At test time, a nearest neighbor search over these vectors should then indicate the most likely object.
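The lookup itself can be as simple as the brute-force sketch below. The function name `nearest_object`, the use of Euclidean distance and the toy database are all assumptions for illustration; a larger database would want a proper search structure.

```python
from math import dist


def nearest_object(query, database):
    """Return (label, distance) of the stored vector closest to `query`.

    `database` maps an object label to a list of descriptor vectors
    collected for that object. Euclidean distance is an assumption;
    other metrics may suit histograms better.
    """
    best_label, best_distance = None, float("inf")
    for label, vectors in database.items():
        for vector in vectors:
            d = dist(query, vector)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label, best_distance


# Toy database with made-up two-dimensional descriptors.
database = {
    "cup": [[0.9, 0.1], [0.8, 0.2]],
    "book": [[0.1, 0.9]],
}
label, distance = nearest_object([0.85, 0.15], database)
```

Returning the distance along with the label lets you reject matches that are too far away to be trusted.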
One more good thing about PCL is that it’s under the BSD license, so you can use it in commercial products for free. Ooh la la.