TY - JOUR
T1 - Tagtog
T2 - Interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
AU - Cejuela, Juan Miguel
AU - McQuilton, Peter
AU - Ponting, Laura
AU - Marygold, S. J.
AU - Stefancsik, Raymund
AU - Millburn, Gillian H.
AU - Rost, Burkhard
N1 - Publisher Copyright: © 2014 The Author(s). Published by Oxford University Press.
PY - 2014
Y1 - 2014
N2 - The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation.
UR - http://www.scopus.com/inward/record.url?scp=84908098708&partnerID=8YFLogxK
U2 - 10.1093/database/bau033
DO - 10.1093/database/bau033
M3 - Article
C2 - 24715220
AN - SCOPUS:84908098708
SN - 1758-0463
VL - 2014
JO - Database
JF - Database
M1 - bau033
ER -