302 — Neural scene representation and rendering

Eslami & Rezende et al. (10.1126/science.aar6170)

Read on 18 June 2018
#neural-network  #machine-learning  #scene-representation  #rendering  #visual-system  #vision  #group:deepmind 

DeepMind published this exciting work, which learns a scene representation from simple static imagery. Much like a young animal (or human!), the network begins with no understanding of how the world works or is organized; it has no notion of photon trajectories or raytracing. Instead, it's given a handful of 2D images of a scene, each paired with the camera viewpoint it was captured from, and it learns to predict what the scene looks like from viewpoints it has never observed.

This system is dubbed a Generative Query Network (GQN). A GQN consists of two components: a representation network and a generation network.

The representation network converts input images into a scene representation, roughly analogous to turning a photograph into a 3D scene, and the generation network then re-renders that scene from a different perspective. In other words, given enough training data, the network can render a 2D view of a scene from a different angle than the original photograph.
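To make the two-part structure concrete, here's a minimal PyTorch-style sketch of the idea: per-view encodings are summed into a single scene code, and a generator renders that code from a query viewpoint. This is only a toy illustration under assumed shapes (32×32 images, a 7-dimensional camera pose) and is not the paper's actual architecture, which uses a recurrent, DRAW-like latent generator.

```python
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Encodes one (image, camera viewpoint) pair into a scene code."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Camera pose (here assumed to be a 7-dim vector) is folded in
        # alongside the pooled image features.
        self.fc = nn.Linear(64 + 7, repr_dim)

    def forward(self, image, viewpoint):
        feats = self.conv(image).flatten(1)                 # (B, 64)
        return self.fc(torch.cat([feats, viewpoint], dim=-1))

class GenerationNet(nn.Module):
    """Renders an image for a query viewpoint, conditioned on the scene code."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.fc = nn.Linear(repr_dim + 7, 64 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, scene_code, query_viewpoint):
        h = self.fc(torch.cat([scene_code, query_viewpoint], dim=-1))
        return self.deconv(h.view(-1, 64, 8, 8))            # (B, 3, 32, 32)

# The scene representation is the sum of per-view encodings, so any number
# of observed views can be combined; the generator then renders the scene
# from an unseen query viewpoint.
rep_net, gen_net = RepresentationNet(), GenerationNet()
images = torch.rand(3, 1, 3, 32, 32)   # 3 context views of one scene
views = torch.rand(3, 1, 7)            # camera pose for each view
scene_code = sum(rep_net(img, vp) for img, vp in zip(images, views))
predicted = gen_net(scene_code, torch.rand(1, 7))  # render from a new viewpoint
```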

The animations (link below) of a camera moving through this space are really interesting: the system consistently learns the shapes of simple geometric objects and projects them accurately into the new rendering, even though object identity, position, rotation, and color are never explicitly labeled in the network's training data.

You can read more about this work on DeepMind’s blog.