An LSTM approach to patent classification based on fixed hierarchy vectors

Marawan Shalaby, Jan Stutzki, Matthias Schubert, Stephan Günnemann

Publication: Conference contribution, paper, peer-reviewed

25 citations (Scopus)

Abstract

Recently, innovative techniques for text processing like Latent Dirichlet Allocation (LDA) and embedding algorithms like Paragraph Vectors (PV) allowed for improved text classification and retrieval methods. Even though these methods can be adjusted to handle different text collections, they do not take advantage of the fixed document structure that is mandatory in many application areas. In this paper, we focus on patent data which mandates a fixed structure. We propose a new classification method which represents documents as Fixed Hierarchy Vectors (FHV), reflecting the document's structure. FHVs represent a document on multiple levels where each level represents the complete document but with a different local context. Furthermore, we sequentialize this representation and classify documents using LSTM-based architectures. Our experiments show that FHVs provide a richer document representation and that sequential classification improves classification performance when classifying patents into the International Patent Classification (IPC) taxonomy.
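The multi-level, sequentialized representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the levels are constructed, so everything here is an assumption for illustration only: the three levels (whole document, sections, paragraphs), the section names, and the hypothetical `embed` function, a hashed bag-of-words stand-in for a learned embedding such as Paragraph Vectors. This is not the authors' implementation.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption, not from the paper)


def embed(text, dim=DIM):
    """Hypothetical stand-in for a learned embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fixed_hierarchy_sequence(patent):
    """Turn a patent with a fixed section order into one sequence of vectors.

    patent: dict mapping section name -> list of paragraph strings.
    Every level covers the complete document, only at a different granularity.
    """
    sections = [" ".join(paragraphs) for paragraphs in patent.values()]
    paragraphs = [p for section in patent.values() for p in section]
    levels = [
        [" ".join(sections)],  # level 0: the whole document as one unit
        sections,              # level 1: one unit per section
        paragraphs,            # level 2: one unit per paragraph
    ]
    # Sequentialize coarse-to-fine; an LSTM would read one vector per time step.
    return [embed(unit) for level in levels for unit in level]


# Made-up example document with an assumed fixed structure.
patent = {
    "title": ["Foldable bicycle frame"],
    "abstract": ["A frame with a central hinge.", "The hinge locks in riding position."],
    "claims": ["A bicycle frame comprising a hinge.", "The frame of claim 1 wherein the hinge locks."],
}
sequence = fixed_hierarchy_sequence(patent)
print(len(sequence), len(sequence[0]))  # 9 8
```

The sequence holds 1 document vector, 3 section vectors, and 5 paragraph vectors. In a setup like the one the abstract describes, such a sequence would be passed to an LSTM-based classifier that predicts an IPC class.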

Original language: English
Pages: 495-503
Number of pages: 9
Publication status: Published - 2018
Event: 2018 SIAM International Conference on Data Mining, SDM 2018 - San Diego, United States
Duration: 3 May 2018 to 5 May 2018

Conference

Conference: 2018 SIAM International Conference on Data Mining, SDM 2018
Country/Territory: United States
City: San Diego
Period: 3/05/18 to 5/05/18
