262 — Detecting Mutations by eBWT

Prezza et al (1805.01876)

Read on 09 May 2018
#genomics  #genome  #genetics  #strings  #SNPs  #sequencing  #NGS  #human-genome  #mutations  #BWT  #DNA 

When I took a computational genomics course with Ben Langmead at JHU, one of the most bizarre algorithms we implemented was the Burrows-Wheeler Transform, a manipulation of a string that converts it into a form with long runs of repeated characters, perfect for compressing. This effect is amplified when the characters themselves are sampled from a small alphabet (as DNA is). The most bizarre aspect of this rearrangement is that it’s reversible; and furthermore, it’s feasible to run analyses on the transformed string. (Of note is the fact that genomics is not the only field to use the BWT: The common bzip archive utility uses BWT as well.)

The authors of this paper extend the usefulness of the BWT (specifically, the extended version which works on full strings) in order to detect mutations: When run — reference-free — on human chromosome sequences, this system detected 91% of all known SNPs (accuracy=98%).

This is exciting work because it enables further analysis to be performed even on compressed sequences, which can improve the feasibility of in-place SNP / INDEL prediction as well as increase our ability to find and correct sequencing errors.