Homology-based inference sets the bar high for protein function prediction

Background: Any method that predicts protein function de novo should do better than random. More challenging, it should also outperform simple homology-based inference.

Methods: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar, or lower limit, for future improvements.

Results and conclusions: During the development of these methods, we faced two surprises. First, our most successful implementation of the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Second, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but the reasons for these differences were also unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

Metadata
Author:Tobias Hamp, Rebecca Kassner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Burkhard Rost
URN:urn:nbn:de:bvb:384-opus4-1048159
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/104815
ISSN:1471-2105
Parent Title (English):BMC Bioinformatics
Publisher:Springer Science and Business Media LLC
Type:Article
Language:English
Year of first Publication:2013
Publishing Institution:Universität Augsburg
Release Date:2023/06/16
Tag:Applied Mathematics; Computer Science Applications; Molecular Biology; Biochemistry; Structural Biology
Volume:14
Issue:S3
First Page:S7
DOI:https://doi.org/10.1186/1471-2105-14-s3-s7
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für IT-Infrastrukturen für die Translationale Medizinische Forschung
Dewey Decimal Classification:0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:CC-BY 2.0: Creative Commons - Attribution (with Print on Demand)