176 — Crowdsourcing biomedical research: leveraging communities as innovation engines

Saez-Rodriguez et al (10.1038/nrg.2016.69)

Read on 12 February 2018
#bioinformatics  #genetics  #genomics  #big-data  #challenges  #crowd-sourcing 

As biomedical datasets continue to grow in size, the number of opportunities for collaboration between groups grows exponentially. That’s not really an informed opinion, I’m just making that up. I have no idea what the relationship is between data growth and opportunity growth, and actually now that I think about it, it almost certainly isn’t exponential. That’s a really serious claim.

I’m a bit offended that this paper, in its long list of ‘omics, failed to mention connectomics. But I digress. We can’t have everything.

This paper explores the growing ecosystem of data science challenges: programs in which a public or semi-public dataset is made available for anyone to access and analyze. These challenges often enable a group with limited resources or time to pose a complex question and receive an answer for “free” (though prizes are often awarded to well-performing contestants). One particularly well-known platform, Kaggle, simplifies the process of running these challenges and makes it easier to offer sponsored prizes.

The fields of biomedical research are ripe for public data science challenges because there are so many datasets with so much data, and clinical or small research-lab settings rarely have the resources to actually leverage these datasets.

This review treats Kaggle as a platform separate from the biomedical data science community as it stood in 2016, but this has largely changed in recent years (and other sites have sprung up to accommodate the specific needs of biological data).