189 — Image2Mesh: A Learning Framework for Single Image 3D Reconstruction

Pontes et al. (arXiv:1711.10669)

Read on 25 February 2018
#mesh  #image  #3D  #reconstruction  #CV  #scene-reconstruction  #neural-network  #computer-vision 

One issue with machine learning in the context of 3D graphics is that 3D data are notoriously complex: a single model simultaneously encodes a large amount of semantic knowledge and a very deliberate geometric embedding. Because the data are so complex and structured, feeding them into a neural network is onerous for both the developer and the network, which results in more brittle software and slower training.

The authors of this paper propose a new strategy: instead of capturing the point cloud in its native form, infer a “compact mesh representation” that converts the data into an embeddable 3D graph representation, one a neural network can easily ingest. Perhaps even more importantly, this technique works on a single image.
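As far as I can tell, the “compact” part comes from fixing a template mesh’s topology and describing each object by a handful of deformation parameters, via free-form deformation (FFD) of a lattice of control points around the model. Below is a minimal NumPy sketch of Bezier FFD; the 4×4×4 lattice size and the assumption that vertices are normalized to the unit cube are my illustrative choices, not necessarily the paper’s exact setup:

```python
import numpy as np
from scipy.special import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t), evaluated elementwise on t."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def ffd(vertices, control_points, l=3, m=3, n=3):
    """Deform mesh vertices with a Bezier free-form deformation lattice.

    vertices:       (V, 3) float array, assumed normalized to [0, 1]^3
    control_points: ((l+1)*(m+1)*(n+1), 3) array of lattice points;
                    displacing them from the regular grid deforms the mesh
    """
    P = control_points.reshape(l + 1, m + 1, n + 1, 3)
    s, t, u = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    out = np.zeros_like(vertices)
    for i in range(l + 1):
        for j in range(m + 1):
            for k in range(n + 1):
                w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                out += w[:, None] * P[i, j, k]
    return out

# Sanity check: with control points on the regular grid, FFD is the identity.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 4),
                            np.linspace(0, 1, 4),
                            np.linspace(0, 1, 4), indexing="ij"), -1).reshape(-1, 3)
verts = np.random.rand(100, 3)
assert np.allclose(ffd(verts, grid), verts)
```

The payoff of a parameterization like this is that a network only has to regress 3·(l+1)(m+1)(n+1) control-point coordinates (192 numbers for a 4×4×4 lattice) instead of thousands of raw vertex positions.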

Single-image 3D reconstruction is the Holy Grail of 3D scene interpretation, but it is very, very hard.

Perhaps the only currently feasible way to extract objects from a scene in 3D is to first classify which objects are visible and then, drawing on an enormous collection of 3D models, approximate the scene as closely as possible with minimally modified versions of those models.
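Concretely, the pipeline reads something like the toy sketch below. Everything in it is made up for illustration (the embeddings, the two-entry database, the explicit nearest-neighbor search); in the paper, a CNN instead predicts an index into a graph of real CAD models along with the deformation parameters:

```python
import numpy as np

# Toy "classify, retrieve, minimally deform" pipeline. All data here are
# placeholders, not anything from the paper.
database = {
    "chair": [
        (np.array([1.0, 0.0]), "chair_mesh_A"),  # (appearance embedding, mesh)
        (np.array([0.0, 1.0]), "chair_mesh_B"),
    ],
}

def reconstruct(image_embedding, predicted_class, deform):
    """Pick the stored model of the predicted class that best matches the
    image, then minimally deform it to fit the observed object."""
    candidates = database[predicted_class]
    dists = [np.linalg.norm(image_embedding - emb) for emb, _ in candidates]
    template = candidates[int(np.argmin(dists))][1]
    return deform(template)

# E.g. an image whose embedding is close to chair A, with no deformation:
print(reconstruct(np.array([0.9, 0.1]), "chair", deform=lambda mesh: mesh))
# -> chair_mesh_A
```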

The major shortcoming I see with this strategy is that it prevents the model from understanding scenes that contain previously unseen objects, or objects for which a corpus of 3D models does not already exist.

But I don’t see any way around this for now, and having this capability certainly improves scene-reconstruction ability.