Alignments grow, secondary structure prediction improves

Dariusz Przybylski, Burkhard Rost

Research output: Contribution to journal › Article › peer-review

164 Scopus citations

Abstract

Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.
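The accuracy figures quoted above are per-residue three-state accuracies: the percentage of residues assigned the correct state among helix, strand, and other. The following is a minimal sketch of that measure and of the arithmetic behind the reported breakdown, not the authors' evaluation code. The function name `q3_accuracy`, the one-letter state codes (H, E, L), and the example strings are assumptions made for illustration; the shares 60/20/20 are the approximate values from the abstract ("more than 60%", "about 20%", "another 20%").

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("strings must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example (assumed codes: H = helix, E = strand, L = other).
observed = "HHHHLLEEEL"
predicted = "HHHLLLEEEE"
print(q3_accuracy(observed, predicted))  # 8 of 10 residues match -> 80.0

# Approximate split of the reported gain (about 72% -> about 75%) using the
# rounded shares from the abstract; these are illustrative, not exact values.
gain = 75.0 - 72.0
shares = {
    "database growth": 0.60,
    "alignment procedure details": 0.20,
    "iterated PSI-BLAST searches": 0.20,
}
points = {source: gain * share for source, share in shares.items()}
for source, value in points.items():
    print(f"{source}: about {value:.1f} percentage points")
```

Under these rounded shares, database growth accounts for roughly 1.8 of the 3 percentage points, and the other two factors for roughly 0.6 each.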

Original language: English
Pages (from-to): 197-205
Number of pages: 9
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 46
Issue number: 2
State: Published - 1 Feb 2002
Externally published: Yes

Keywords

  • Dynamic programming
  • Evolutionary information
  • Neural networks
  • PSI-BLAST
  • Profiles-based multiple alignments
  • Protein structure prediction
  • Solvent accessibility
