Discussion: Google's Scalable Deep Learning + 1000 Genome Project?

Should Google be applying their recently devised scalable deep learning approach to freely available human genome data?

Some thoughts/links/info...

The 1000 Genomes Project has already released data for 1,700 genomes, weighing in at 200TB. We know the 3 billion base pairs of a human genome can be represented directly (uncompressed) in 750MB, and 750MB * 1,700 sequences = 1.275TB. One suggestion I've encountered is that the 200TB is raw sequencing output that still needs to be assembled; if you do that, and are happy with a single interpretation of each genome, we're perhaps looking at 1.275TB instead of 200TB - a good start at making this data more manageable.
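
To make the storage arithmetic explicit, here's a quick sketch (the 2-bit-per-base encoding is my reading of "represented directly (uncompressed)", not any 1000 Genomes file format):

```python
# Rough storage arithmetic for the figures quoted above.
BASES_PER_GENOME = 3_000_000_000   # ~3 billion base pairs
BITS_PER_BASE = 2                  # 4 symbols (A/C/G/T) -> 2 bits each
GENOMES = 1700

bytes_per_genome = BASES_PER_GENOME * BITS_PER_BASE / 8
print(f"one genome:  {bytes_per_genome / 1e6:.0f} MB")             # ~750 MB
print(f"all genomes: {GENOMES * bytes_per_genome / 1e12:.3f} TB")  # ~1.275 TB
```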

1.275TB will fit on a single hard drive. Furthermore, it's not unreasonable to conceive of a single PC box with, say, 2TB of RAM. However...

Google already perform deep learning on large data sets - see Scaling Deep Learning (Jeff Dean, Google) and Peter Norvig: Channeling the Flood of Data. tl;dr - they partition the data between machines within a data centre and communicate connection weight deltas between machines.
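
For anyone unfamiliar, here's a toy sketch of that data-parallel idea - plain NumPy, a made-up linear model and a synchronous update loop, nothing like Google's actual DistBelief system - just to show what "communicate weight deltas between machines" amounts to:

```python
import numpy as np

# Toy data-parallel training: each "machine" holds a shard of the data,
# computes a local step against its copy of the weights, and sends back
# only the weight delta, which a central copy accumulates.
rng = np.random.default_rng(0)

def local_delta(weights, X, y, lr=0.1):
    """One gradient step of linear regression on this shard; return the weight delta."""
    grad = X.T @ (X @ weights - y) / len(y)
    return -lr * grad

# Fake data split across 4 "machines"
true_w = np.array([1.5, -2.0, 0.5])
shards = []
for _ in range(4):
    X = rng.normal(size=(256, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=256)
    shards.append((X, y))

weights = np.zeros(3)              # central parameter copy
for step in range(100):
    # In a real system the deltas arrive asynchronously over the network;
    # here we just average them each round.
    deltas = [local_delta(weights.copy(), X, y) for X, y in shards]
    weights += np.mean(deltas, axis=0)

print(weights)  # approaches true_w
```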

It seems to me that deep learning may be able to discover deeper structures in the genetic sequences than are currently known, and that we might then be able to find correlations between these deep structures and phenotype-level features. Would this work? Is anyone aware of such a project? Thanks for reading.
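
To make that a bit more concrete, here's a rough sketch of one way it might be tried (purely illustrative - the window size, one-hot encoding and tiny autoencoder are my own assumptions, not an existing project): one-hot encode fixed-length sequence windows, train an autoencoder on them, and treat the hidden activations as candidate "deep structure" features to correlate with phenotype labels later.

```python
import numpy as np

rng = np.random.default_rng(1)
WINDOW, HIDDEN = 50, 32
BASE_IDX = {b: i for i, b in enumerate("ACGT")}

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [BASE_IDX[b] for b in seq]] = 1.0
    return x.ravel()                      # shape (WINDOW * 4,)

# Stand-in data: random windows; real input would come from assembled genomes.
windows = ["".join(rng.choice(list("ACGT"), WINDOW)) for _ in range(512)]
X = np.stack([one_hot(w) for w in windows])

# One-hidden-layer autoencoder trained with plain gradient descent.
W1 = rng.normal(scale=0.01, size=(X.shape[1], HIDDEN))
W2 = rng.normal(scale=0.01, size=(HIDDEN, X.shape[1]))
lr = 0.1
for _ in range(200):
    H = np.tanh(X @ W1)                   # hidden features
    X_hat = H @ W2                        # reconstruction
    err = X_hat - X
    dW2 = H.T @ err / len(X)
    dH = err @ W2.T * (1 - H ** 2)
    dW1 = X.T @ dH / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

features = np.tanh(X @ W1)                # per-window learned features
print(features.shape)                     # (512, 32)
```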

submitted by locster
