Is perfect genome assembly possible? Yes, says Gene Myers.

According to Gene Myers, (near) perfect genome assembly is within reach for any organism of your choice.

Time will tell if he’s right, but as an influential bioinformatician who has made key contributions to sequence comparison algorithms such as BLAST, to whole-genome shotgun sequencing and to genome assembly, one would think he knows what he’s talking about!

Shotgun sequencing. Adapted from Commins, J., Toft, C., Fares, M. A., “Computational Biology Methods and Their Application to the Comparative Genomics of Endocellular Symbiotic Bacteria of Insects.” Biol. Procedures Online (2009). Accessed via SpringerImages. CC BY-SA 2.5

In a talk at the PRBB auditorium today, he explained to a mixed audience of biologists and computer scientists how, after a few years dedicated to other problems (mostly image analysis), he is now coming back to sequencing with great excitement. The reason: PacBio RSII. This sequencing device produces very long reads (more than 10,000 bp!) and has two other characteristics that could make full assembly possible. Although its error rates are high (10-15%), the errors are random, unlike those of other techniques, which tend to make the same errors again and again. Sampling of the genome is also random. This randomness, together with the length of the reads, means that with enough sequencing coverage you can always recover the right sequence.
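To see why random errors matter, here is a toy simulation. It is only a sketch under strong simplifying assumptions and not anything Myers presented: the reads are assumed to be already aligned to each other, every error is a substitution (no insertions or deletions), and the names `noisy_copy`, `consensus` and `consensus_accuracy` are made up for this illustration. Each read is a copy of the same sequence with 15% random errors, and the consensus is a simple majority vote per position.

```python
import random
from collections import Counter

BASES = "ACGT"

def noisy_copy(truth, error_rate, rng):
    """Copy `truth`, replacing each base independently with a different
    base with probability `error_rate` (substitutions only: an assumption)."""
    out = []
    for base in truth:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def consensus(reads):
    """Majority vote per column over equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def consensus_accuracy(coverage, error_rate=0.15, length=2000, seed=0):
    """Fraction of positions where the consensus of `coverage` noisy
    reads matches the true sequence."""
    rng = random.Random(seed)
    truth = "".join(rng.choice(BASES) for _ in range(length))
    reads = [noisy_copy(truth, error_rate, rng) for _ in range(coverage)]
    called = consensus(reads)
    return sum(a == b for a, b in zip(truth, called)) / length

if __name__ == "__main__":
    for coverage in (1, 5, 15, 30):
        print(coverage, consensus_accuracy(coverage))
```

A single read is right at roughly 85% of positions, but because the errors fall in different places in each read, the vote quickly converges on the true base as coverage grows. An error that every read made at the same position would survive the vote no matter how deep the coverage, which is the contrast with systematic errors drawn above.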

So all we need now, Myers says, apart from waiting for the cost of the PacBio to go down, which he promised will happen soon (fourfold within a year), is to build an efficient assembler. He talked about what he and some colleagues have been doing toward that goal. The main element is a ‘scrubber’ that cleans and edits the reads while removing as little data as possible. His point was that, even though people have been focusing on the assembly, the real problem is the data: contaminants, chimeras, excessive error rates and so on. So he presented his personal ‘data cleaner’, DAscrub, soon to be released.

You can read more details about his recent work on this on his blog.

In the meantime, his advice to the world – stop the 10,000 genomes project right away and wait a couple of years to have better sequences!


