New software tool could provide answers to some of life’s most intriguing questions

Wednesday, April 17, 2019

A University of Waterloo researcher has spearheaded the development of a software tool that can provide conclusive answers to some of the world’s most fascinating questions.

The tool, which combines supervised machine learning with digital signal processing (ML-DSP), could for the first time make it possible to definitively answer questions such as: How many different species exist on Earth and in the oceans? How are existing, newly discovered, and extinct species related to each other? What are the bacterial origins of human mitochondrial DNA? Does the DNA of a parasite have a genomic signature similar to that of its host?

The tool also has the potential to positively impact the personalized medicine industry by identifying the specific strain of a virus and thus allowing for precise drugs to be developed and prescribed to treat it.

ML-DSP is an alignment-free software tool that works by transforming a DNA sequence into a digital (numerical) signal and then using digital signal processing methods to process these signals and distinguish them from one another.
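The general idea can be illustrated with a minimal sketch. This is not the ML-DSP implementation: the purine/pyrimidine number mapping, the use of a Fourier magnitude spectrum, the Pearson-correlation distance, and every function name below are assumptions chosen for illustration, and the sketch only compares sequences of equal length.

```python
import cmath
import math

# Illustrative assumption: purines (A, G) map to -1, pyrimidines (C, T) to +1.
NUMERIC_CODE = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(sequence):
    """Turn a DNA string into a list of numbers, one per nucleotide."""
    return [NUMERIC_CODE[base] for base in sequence.upper()]


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson_distance(u, v):
    """Map the Pearson correlation r of u and v to a distance (1 - r) / 2."""
    if len(u) != len(v):
        raise ValueError("spectra must have equal length")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return (1 - cov / (norm_u * norm_v)) / 2


def signature_distance(seq_a, seq_b):
    """Distance between two equal-length DNA strings, with no alignment step."""
    return pearson_distance(
        magnitude_spectrum(to_signal(seq_a)),
        magnitude_spectrum(to_signal(seq_b)),
    )


if __name__ == "__main__":
    # Two made-up 16-base fragments, used only to exercise the functions.
    print(signature_distance("ACGTACGTAAGGCCTT", "ACGTTTGACCAGTACG"))
```

Because only the magnitudes of the spectrum are compared, a circularly shifted copy of a sequence ends up at a distance of essentially zero from the original, which hints at why no alignment step is needed in this kind of approach. In a supervised setting, distances like these could serve as input features for a classifier trained on labelled genomes.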

“With this method even if we only have small fragments of DNA we can still classify DNA sequences, regardless of their origin, or whether they are natural, synthetic, or computer-generated,” said Lila Kari, a professor in Waterloo’s David R. Cheriton School of Computer Science. “Another important potential application of this tool is in the healthcare sector, as in this era of personalized medicine we can classify viruses and customize the treatment of a particular patient depending on the specific strain of the virus that affects them.”

In the study, researchers performed a quantitative comparison with other state-of-the-art classification software tools on two small benchmark datasets and one large dataset of 4,322 vertebrate mitochondrial genomes.

“Our results show that ML-DSP overwhelmingly outperforms alignment-based software in terms of processing time, while having classification accuracies that are comparable in the case of small datasets and superior in the case of large datasets,” said Kari of Waterloo’s Faculty of Mathematics. “Compared with other alignment-free software, ML-DSP has significantly better classification accuracy and is overall faster.”

The authors also conducted preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4,271 complete dengue virus genomes into subtypes with 100 per cent accuracy, and 4,710 bacterial genomes into divisions with 95.5 per cent accuracy.

A paper detailing the new software tool, titled “ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels,” was recently published in the journal BMC Genomics. Kari authored the paper together with Western University PhD candidate Gurjit Randhawa and Dr. Kathleen Hill, an associate professor in the Department of Biology at Western University.

MEDIA CONTACT | Matthew Grant
226-929-7627

Attention broadcasters: Waterloo has facilities to provide broadcast quality audio and video feeds with a double-ender studio. Please contact us for more information.