TY - GEN
T1 - Efficient and accurate clustering for large-scale genetic mapping
AU - Strnadová, Veronika
AU - Buluc, Aydin
AU - Chapman, Jarrod
AU - Gilbert, John R.
AU - Gonzalez, Joseph
AU - Jegelka, Stefanie
AU - Rokhsar, Daniel
AU - Oliker, Leonid
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/29
Y1 - 2014/12/29
N2 - High-throughput 'next generation' genome sequencing technologies are producing a flood of inexpensive genetic information that is invaluable to genomics research. Sequences of millions of genetic markers are being produced, providing genomics researchers with the opportunity to construct high-resolution genetic maps for many complicated genomes. However, the current generation of genetic mapping tools were designed for the small data setting, and are now limited by the prohibitively slow clustering algorithms they employ in the genetic marker-clustering stage. In this work, we present a new approach to genetic mapping based on a fast clustering algorithm that exploits the geometry of the data. Our theoretical and empirical analysis shows that the algorithm can correctly recover linkage groups. Using synthetic and real-world data, including the grand-challenge wheat genome, we demonstrate that our approach can quickly process orders of magnitude more genetic markers than existing tools while retaining - and in some cases even improving - the quality of genetic marker clusters.
AB - High-throughput 'next generation' genome sequencing technologies are producing a flood of inexpensive genetic information that is invaluable to genomics research. Sequences of millions of genetic markers are being produced, providing genomics researchers with the opportunity to construct high-resolution genetic maps for many complicated genomes. However, the current generation of genetic mapping tools were designed for the small data setting, and are now limited by the prohibitively slow clustering algorithms they employ in the genetic marker-clustering stage. In this work, we present a new approach to genetic mapping based on a fast clustering algorithm that exploits the geometry of the data. Our theoretical and empirical analysis shows that the algorithm can correctly recover linkage groups. Using synthetic and real-world data, including the grand-challenge wheat genome, we demonstrate that our approach can quickly process orders of magnitude more genetic markers than existing tools while retaining - and in some cases even improving - the quality of genetic marker clusters.
UR - http://www.scopus.com/inward/record.url?scp=84922773775&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2014.6999119
DO - 10.1109/BIBM.2014.6999119
M3 - Conference contribution
AN - SCOPUS:84922773775
T3 - Proceedings - 2014 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2014
SP - 3
EP - 10
BT - Proceedings - 2014 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2014
A2 - Zheng, Huiru
A2 - Hu, Xiaohua Tony
A2 - Berrar, Daniel
A2 - Wang, Yadong
A2 - Dubitzky, Werner
A2 - Hao, Jin-Kao
A2 - Cho, Kwang-Hyun
A2 - Gilbert, David
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2014
Y2 - 2 November 2014 through 5 November 2014
ER -