TY - GEN
T1 - YAWN
T2 - 12th Symposium of the German Informatics Society Section "Databases and Information Systems" (DBIS) on Database Systems in Business, Technology and Web, BTW 2007
AU - Schenkel, Ralf
AU - Suchanek, Fabian
AU - Kasneci, Gjergji
PY - 2007
Y1 - 2007
N2 - The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits the category information in Wikipedia (a high-quality, manually assigned source of information), extracts additional information from lists, and utilizes invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
AB - The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits the category information in Wikipedia (a high-quality, manually assigned source of information), extracts additional information from lists, and utilizes invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
UR - http://www.scopus.com/inward/record.url?scp=84873920881&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84873920881
SN - 9783885791973
T3 - Datenbanksysteme in Business, Technologie und Web, BTW 2007 - 12th Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Proceedings
SP - 277
EP - 291
BT - Datenbanksysteme in Business, Technologie und Web, BTW 2007 - 12th Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Proceedings
Y2 - 7 March 2007 through 9 March 2007
ER -