TY - JOUR
T1 - Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.)
AU - Kota, R.
AU - Rudd, S.
AU - Facius, A.
AU - Kolesov, G.
AU - Thiel, T.
AU - Zhang, H.
AU - Stein, N.
AU - Mayer, K.
AU - Graner, A.
N1 - Funding Information:
Acknowledgements We are grateful to Patrick Hayes for providing the DH lines of the barley mapping populations Steptoe × Morex and Oregon Wolfe Dom × Oregon Wolfe Rec. The technical assistance of Ulrike Beier is gratefully acknowledged. This work was funded by the German Federal Ministry of Education and Research in conjunction with the GABI program (BMBF Grants 0312270/4, 0312271A and 0312278C).
PY - 2003/10/1
Y1 - 2003/10/1
N2 - The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP-markers.
AB - The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP-markers.
KW - Bioinformatics
KW - Data mining
KW - Denaturing high-performance liquid chromatography (DHPLC)
KW - Expressed sequence tags (ESTs)
KW - Single-nucleotide polymorphisms (SNPs)
UR - http://www.scopus.com/inward/record.url?scp=0142107475&partnerID=8YFLogxK
U2 - 10.1007/s00438-003-0891-6
DO - 10.1007/s00438-003-0891-6
M3 - Article
C2 - 12938038
AN - SCOPUS:0142107475
SN - 1617-4615
VL - 270
SP - 24
EP - 33
JO - Molecular Genetics and Genomics
JF - Molecular Genetics and Genomics
IS - 1
ER -