Analyzing taxonomic classification using extensible Markov models

Rao M. Kotamarti, Michael Hahsler, Douglas Raiford, Monnie McGee, Margaret H. Dunham, Burkhard Rost

Research output: Contribution to journal, Conference article, peer-review

Abstract

Motivation: As next-generation sequencing rapidly adds new genomes, their correct placement in the taxonomy needs verification. However, current methods for confirming the classification of a taxon, or for suggesting a revision when a taxon may be misplaced, rely on computationally intensive multiple sequence alignment followed by iterative adjustment of the distance matrix. Because of intra-heterogeneity issues with the 16S rRNA marker, no classifier is available at the sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by producing fragmented 16S rRNA sequences. This article proposes a novel alignment-free method for representing microbial profiles using extensible Markov models (EMMs), with an extended Karlin-Altschul statistical framework analogous to the classic alignment paradigm. We propose a log-odds (LOD) score classifier based on the Gumbel difference distribution that confirms correct classifications with qualified statistical significance and suggests revisions where necessary.

Results: We tested our method by building a sub-genus-level classifier, with which we re-evaluated the classifications of 676 microbial organisms using the 16S rRNA sequences from the NCBI FTP database. The results confirm the current classification for all genera at 95% significance. Furthermore, the classifier isolates heterogeneity issues to only 12 strains while confirming classifications with qualified significance for the remaining 98%. The models require less memory than multiple sequence alignments and have better time complexity than current methods. The classifier operates at the sub-genus level and thus outperforms the naive Bayes classifier of the Ribosomal Database Project, where much of the taxonomic analysis is available online. Finally, using information redundancy in model building, we show that the method applies to metagenomic fragment classification of 19 Escherichia coli strains.
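The abstract describes representing 16S rRNA profiles with extensible Markov models (EMMs) and scoring sequences by log odds. As a rough illustration of the EMM idea only, the sketch below cuts a sequence into non-overlapping windows, turns each window into a k-mer count vector, and either folds it into the nearest existing state or, if no state lies within a distance threshold, extends the model with a new state; transitions between consecutive states are counted. The class name `SimpleEMM`, the window length, k, the Euclidean threshold, the add-one smoothing, and the uniform-transition null model behind the log-odds score are all hypothetical choices. This is not the authors' implementation and does not reproduce their extended Karlin-Altschul framework.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical parameters, chosen for illustration only.
K = 3             # k-mer length
WINDOW = 30       # window length in nucleotides (non-overlapping windows)
THRESHOLD = 6.0   # max Euclidean distance for joining an existing state

KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def window_vectors(seq, window=WINDOW, k=K):
    """Return one k-mer count vector per non-overlapping window of seq."""
    vectors = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        counts = Counter(w[i:i + k] for i in range(len(w) - k + 1))
        vectors.append([counts[m] for m in KMERS])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SimpleEMM:
    """Toy extensible Markov model: states are clusters created on the fly."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.centroids = []      # centroid vector of each state
        self.sizes = []          # number of windows absorbed by each state
        self.trans = Counter()   # (from_state, to_state) -> transition count

    def nearest(self, v):
        """Index and distance of the closest state, or (None, inf) if empty."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = euclidean(v, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, seq):
        prev = None
        for v in window_vectors(seq):
            i, d = self.nearest(v)
            if i is None or d > self.threshold:
                # No state is close enough: extend the model with a new state.
                self.centroids.append(list(v))
                self.sizes.append(1)
                i = len(self.centroids) - 1
            else:
                # Fold the window into the state's running-mean centroid.
                n = self.sizes[i] + 1
                self.centroids[i] = [c + (x - c) / n for c, x in zip(self.centroids[i], v)]
                self.sizes[i] = n
            if prev is not None:
                self.trans[(prev, i)] += 1
            prev = i

    def log_likelihood(self, seq, pseudo=1.0):
        """Sum of smoothed log transition probabilities along the nearest-state path."""
        n_states = len(self.centroids)
        out = Counter()
        for (a, _), c in self.trans.items():
            out[a] += c
        path = [self.nearest(v)[0] for v in window_vectors(seq)]
        return sum(
            math.log((self.trans[(a, b)] + pseudo) / (out[a] + pseudo * n_states))
            for a, b in zip(path, path[1:])
        )

    def lod(self, seq):
        """Log-odds score against a null model with uniform transitions (model must be trained)."""
        n_steps = max(len(window_vectors(seq)) - 1, 0)
        null = n_steps * math.log(1.0 / len(self.centroids))
        return self.log_likelihood(seq) - null
```

For example, training on `"A" * 30 + "C" * 30 + "A" * 30 + "C" * 30` yields two states; that same sequence then scores LOD = ln 3 (about 1.10), while `"A" * 120`, whose path never follows a learned transition, scores 3 ln(1/2) (about -2.08).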
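The abstract attaches significance to LOD scores through the Gumbel difference distribution. In the Karlin-Altschul framework, maximal scores follow an extreme-value (Gumbel) distribution with scale β = 1/λ. A standard result is that the difference of two independent Gumbel variables with equal scale β is logistic with scale β, so under a null hypothesis of equal locations the one-sided p-value has a closed form. The minimal sketch below assumes exactly that: two competing LOD scores treated as independent, equal-scale Gumbel draws. The function names, the method-of-moments estimate of β, the decision rule, and α = 0.05 are illustrative assumptions, not the authors' procedure. It also ignores any selection effect from picking the top-ranked taxon.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Uses mean = mu + gamma * beta and variance = (pi * beta) ** 2 / 6.
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_difference_pvalue(d, beta):
    """One-sided P(X - Y >= d) for X, Y i.i.d. Gumbel with scale beta.

    X - Y is logistic with location 0 and scale beta, so the survival
    function is 1 / (1 + exp(d / beta)), written here in an overflow-safe form.
    """
    z = d / beta
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def critical_difference(beta, alpha=0.05):
    """LOD margin at which the one-sided p-value equals alpha."""
    return beta * math.log((1.0 - alpha) / alpha)


def margin_is_significant(lod_assigned, lod_alternative, beta, alpha=0.05):
    """Hypothetical rule: keep the assigned taxon if its LOD margin is significant."""
    p = gumbel_difference_pvalue(lod_assigned - lod_alternative, beta)
    return p <= alpha, p
```

With β = 2, for instance, the margin needed for one-sided significance at α = 0.05 is 2 ln 19, roughly 5.89 LOD units. The overflow-safe branch matters because large margins would otherwise make `math.exp(d / beta)` overflow.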

Original language: English
Pages (from-to): 2235-2241
Number of pages: 7
Journal: Bioinformatics
Volume: 27
Issue number: 13
State: Published - 2011
Event: 19th Annual International Conference on Intelligent Systems for Molecular Biology, Joint with the 10th European Conference on Computational Biology, ISMB/ECCB 2011 - Vienna, Austria
Duration: 17 Jul 2011 - 19 Jul 2011
