229 — Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs

Esteban, Hyland, & Rätsch (1706.02633)

Read on 06 April 2018
#medical-records  #EHR  #synthetic-data  #patients  #neural-network  #GAN  #time-series  #rcGAN 

GANs are conventionally used to generate realistic-looking images — and they’re particularly useful when a classifier can be easily made but a generator is more nuanced.

This research looks into the use of GANs to generate synthetic time-series medical data that track a set of patients over time. This work is notoriously hard because finding available training data is already a huge challenge.

This system works by taking a random-noise signal at every point in a time series and producing a synthetic signal, which it passes to the discriminator which makes a classification for each timestep.

In order to simplify the logistics of data availability, the researchers propose a “Train on Synthetic / Test on Real” (TSTR) method that uses false data for training but then tests on a real dataset.

To confirm that their GAN wasn’t simply “memorizing” training data from the ICU dataset, the researchers looked at the “distance” between a few samples of the generated population and the nearest neighbor in the original data, both in the final output vectorspace as well as in the “back-projected” latent space.

This work looks very interesting to me, because even if the synthetic data itself is not useful, it means that it is possible to use the output of the GAN for algorithm generation, and then test (much like the TSTR approach) on true data once a technique is vetted.