Wednesday, April 26, 2017

Solving the Wrong Problem: My Take on 3DOR 2017

This week I attended the 10th 3D Object Recognition workshop (3DOR), which was held as a preliminary part of Eurographics. To sum up my impression of the vast majority of the work presented: I felt that academia is working on the wrong problems.

Specifically, much of the research presented (though not all) dealt with CAD models: indexing them, analyzing them, and finding similarities, descriptors, and transformations for them. This is, in my opinion, an approach that has lost touch with the actual state of data in the real world, and is driven mostly by cultural reasons, such as the existence of previous work to rely on and CAD-based benchmarks to run.

Scanned and CAD chair (not the same chair, of course)

In the past two years, we have seen an increasing number of 3D scanners and depth cameras, ranging from low-quality devices costing just a few hundred dollars to expensive, high-resolution industrial devices that produce impressive results. The tables have turned, and scanned data, which used to be a small minority, can now be easily generated in large quantities. This data ranges from lab-scanned toys to huge urban scenes scanned by drones. Nevertheless, the majority of the algorithmic works presented at 3DOR dealt with CAD models. I believe that there are two main reasons for that:

  1. There are almost no tagged scanned data sets, and none of substantial size (SceneNN, perhaps the best work so far, has only 100 scenes).
  2. Most existing methods are tailored to CAD models, and are thus sensitive to one or more of the following traits, common in scanned data:
    1. High-frequency noise due to scanner accuracy issues.
    2. Incomplete models due to occlusion or simply unscanned sides of the object.
    3. Holes.
    4. Open boundaries.
Reason (1) becomes even more of an issue for anyone wishing to follow the deep learning trend, which, like most ML approaches, requires tagged data sets of considerable size.
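
To make traits (3) and (4) under reason (2) a bit more concrete, here is a minimal sketch of how one might check a triangle mesh for holes or open boundaries. It relies on a standard observation: in a closed mesh every edge is shared by two triangles, so an edge that belongs to exactly one triangle lies on a boundary. The function name `boundary_edges` and the toy tetrahedron are my own illustration, not taken from any work presented at 3DOR.

```python
from collections import Counter


def boundary_edges(triangles):
    """Return the edges that belong to exactly one triangle.

    `triangles` is a list of (a, b, c) vertex-index triples.
    A closed mesh yields an empty list; a mesh with holes or
    open boundaries yields the edges along them.
    """
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with sorted endpoints so that (u, v)
            # and (v, u) are counted as the same edge.
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n == 1)


# A closed tetrahedron, the kind of clean model a CAD tool produces.
closed_tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]

# The same shape with one face missing, standing in for an unscanned side.
scanned_tetrahedron = closed_tetrahedron[:-1]

print(boundary_edges(closed_tetrahedron))   # []
print(boundary_edges(scanned_tetrahedron))  # [(0, 2), (0, 3), (2, 3)]
```

Any method that silently assumes the first result (no boundary edges at all) is likely to struggle with the second, which is the normal case for scanned data.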

A scanned scene from the SceneNN data set

I believe that there are two types of data academia should address seriously in the upcoming year or two: scanned small-scale data, such as single models and indoor scenes, and scanned large-scale data, such as entire buildings and streets. With the upcoming depth camera in the iPhone 8, we can expect a proliferation of the former, while the latter will be the result of the increasing use of industrial scanners in security, construction, and drones.

In my experience, algorithms that worked well with CAD models more often than not prove to be useless for scanned data. This is bad news for reusing much of the previous work as-is, but it is also good news, as it brings hope that in the upcoming years we'll see exciting new approaches to 3D retrieval and recognition that will truly have an effect on the technology being developed, and thus on people's lives.