Recognition of distantly related protein sequences using conserved motifs and neural networks

Dmitrij Frishman, Patrick Argos

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps.

(1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment.

(2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks.

(3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins.

This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.
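The leave-one-out conservation measure of step (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the input encoding, or the negative training examples, so the single sigmoid unit, the one-hot encoding, the random negative peptides, the toy alignment, and all function names below are assumptions made for illustration only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def active_inputs(peptide):
    """Indices of the one-hot inputs switched on by a peptide (20 per residue)."""
    return [pos * 20 + AMINO_ACIDS.index(aa) for pos, aa in enumerate(peptide)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def output(weights, bias, peptide):
    """Output of the trained unit for one peptide, between 0 and 1."""
    return sigmoid(bias + sum(weights[j] for j in active_inputs(peptide)))


def train(positives, negatives, epochs=200, rate=0.5):
    """Fit a single sigmoid unit by stochastic gradient descent.

    Assumption: this unit is a stand-in for the paper's neural network.
    """
    weights = [0.0] * (20 * len(positives[0]))
    bias = 0.0
    examples = [(p, 1.0) for p in positives] + [(n, 0.0) for n in negatives]
    for _ in range(epochs):
        for peptide, target in examples:
            step = rate * (target - output(weights, bias, peptide))
            for j in active_inputs(peptide):
                weights[j] += step
            bias += step
    return weights, bias


def window_conservation(peptides, negatives):
    """Average output for the held-out peptide over N leave-one-out trials."""
    total = 0.0
    for i, held_out in enumerate(peptides):
        rest = peptides[:i] + peptides[i + 1:]  # train on the other N-1 spans
        weights, bias = train(rest, negatives)
        total += output(weights, bias, held_out)
    return total / len(peptides)


def conservation_profile(alignment, window, n_negatives=20, seed=0):
    """Conservation score for every window start in a gap-free alignment."""
    rng = random.Random(seed)
    # Assumption: random peptides serve as negative examples.
    negatives = ["".join(rng.choice(AMINO_ACIDS) for _ in range(window))
                 for _ in range(n_negatives)]
    width = len(alignment[0])
    return [window_conservation([seq[s:s + window] for seq in alignment], negatives)
            for s in range(width - window + 1)]


# Hypothetical toy alignment: columns 0-3 are identical, columns 4-7 vary.
ALIGNMENT = ["GDSLAKTW", "GDSLCRVY", "GDSLEHNP",
             "GDSLFIQM", "GDSLKTWA", "GDSLMPAC"]
scores = conservation_profile(ALIGNMENT, window=4)
for start, score in enumerate(scores):
    print(start, round(score, 3))
```

In this toy case the fully conserved window at position 0 should score higher than the fully variable window at position 4, because a unit trained on N-1 identical spans recognizes the held-out one, whereas a held-out span that shares no residues with the training spans is not recognized. Under the same assumptions, the M highest-scoring windows would be the motif candidates passed to step (2).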

Original language: English
Pages (from-to): 951-962
Number of pages: 12
Journal: Journal of Molecular Biology
Volume: 228
Issue number: 3
State: Published - 5 Dec 1992
Externally published: Yes

Keywords

  • conserved sequence patterns
  • neural networks
  • sequence comparison
  • sequence motifs
