More challenges for machine-learning protein interactions

Tobias Hamp, Burkhard Rost

Research output: Contribution to journal › Article › peer-review

44 Scopus citations

Abstract

Motivation: Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging. The standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein-protein interactions (PPIs).

Results: Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it matters not only whether protein A or B of a target interaction A-B is similar to proteins of training interactions (positives), but also whether A or B is similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used to train). On the contrary, a strictly non-redundant training that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best crucially depends on the sequence similarity between the test and the training set, on how many true interactions should be found, and on the expected ratio of negatives to positives. The correct assessment of performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.
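The third point can be illustrated with a minimal sketch, assuming only the standard identity precision = TPR / (TPR + FPR × r), where r is the number of negatives per positive. The two methods and their operating points below are hypothetical numbers made up for illustration; they are not results from the paper. The sketch shows how the apparently better method can change with the required recall, and how precision falls as negatives become more frequent.

```python
def precision(tpr: float, fpr: float, neg_per_pos: float) -> float:
    """Expected precision given recall (tpr), false positive rate (fpr),
    and the number of negative pairs per positive pair."""
    true_pos = tpr                  # expected true positives per positive pair
    false_pos = fpr * neg_per_pos   # expected false positives per positive pair
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


# Hypothetical operating points (recall -> false positive rate).
# Illustrative values only, NOT taken from the paper.
METHOD_A = {0.1: 0.0001, 0.5: 0.05}
METHOD_B = {0.1: 0.001, 0.5: 0.01}


def best_method(recall: float, neg_per_pos: float) -> str:
    """Name of the hypothetical method with the higher precision."""
    p_a = precision(recall, METHOD_A[recall], neg_per_pos)
    p_b = precision(recall, METHOD_B[recall], neg_per_pos)
    return "A" if p_a > p_b else "B"


if __name__ == "__main__":
    for ratio in (1, 10, 100):
        for recall in (0.1, 0.5):
            p_a = precision(recall, METHOD_A[recall], ratio)
            p_b = precision(recall, METHOD_B[recall], ratio)
            print(f"ratio={ratio:>3} recall={recall}: "
                  f"A={p_a:.3f} B={p_b:.3f} best={best_method(recall, ratio)}")
```

With these made-up numbers, method A looks better when only a few true interactions need to be found (recall 0.1), while method B looks better at recall 0.5. Both lose precision as the assumed ratio of negatives to positives grows.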

Original language: English
Pages (from-to): 1521-1525
Number of pages: 5
Journal: Bioinformatics
Volume: 31
Issue number: 10
State: Published - 15 May 2015
