An LSTM approach to patent classification based on fixed hierarchy vectors

Marawan Shalaby, Jan Stutzki, Matthias Schubert, Stephan Günnemann

Research output: Contribution to conference (paper, peer-reviewed)

25 Scopus citations

Abstract

Recently, innovative techniques for text processing like Latent Dirichlet Allocation (LDA) and embedding algorithms like Paragraph Vectors (PV) allowed for improved text classification and retrieval methods. Even though these methods can be adjusted to handle different text collections, they do not take advantage of the fixed document structure that is mandatory in many application areas. In this paper, we focus on patent data which mandates a fixed structure. We propose a new classification method which represents documents as Fixed Hierarchy Vectors (FHV), reflecting the document's structure. FHVs represent a document on multiple levels where each level represents the complete document but with a different local context. Furthermore, we sequentialize this representation and classify documents using LSTM-based architectures. Our experiments show that FHVs provide a richer document representation and that sequential classification improves classification performance when classifying patents into the International Patent Classification (IPC) taxonomy.
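The pipeline the abstract describes (multi-level document vectors, flattened into a sequence, then classified by an LSTM) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `embed` function is a toy stand-in for a learned embedding such as Paragraph Vectors, the three levels (document, section, paragraph) and the section names are hypothetical, and the LSTM weights are random and untrained, so only the data flow is meaningful. The eight output classes correspond to the eight top-level IPC sections (A to H).

```python
import math
import random

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def embed(text, dim=DIM):
    """Toy stand-in for a learned embedding: mean of per-word pseudo-random vectors."""
    words = text.lower().split()
    vec = [0.0] * dim
    for w in words:
        rng = random.Random(w)  # string seed, so each word maps to a fixed vector
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    return [v / max(len(words), 1) for v in vec]


def fixed_hierarchy_sequence(doc):
    """Flatten a structured document into a vector sequence.

    Assumed levels: whole document, then sections, then paragraphs.
    Each level covers the complete text at a different granularity.
    """
    sections = list(doc.values())  # each section is a list of paragraphs
    seq = [embed(" ".join(" ".join(pars) for pars in sections))]  # level 0
    seq += [embed(" ".join(pars)) for pars in sections]           # level 1
    seq += [embed(par) for pars in sections for par in pars]      # level 2
    return seq


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Single-layer LSTM forward pass with a softmax head (untrained)."""

    def __init__(self, in_dim, hid, n_classes, seed=0):
        rng = random.Random(seed)
        self.hid = hid
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g)
        self.W = {k: init(hid, in_dim + hid, rng) for k in "ifog"}
        self.b = {k: [0.0] * hid for k in "ifog"}
        self.W_out = init(n_classes, hid, rng)

    def forward(self, seq):
        h = [0.0] * self.hid
        c = [0.0] * self.hid
        for x in seq:
            z = x + h  # concatenate input vector and previous hidden state
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        logits = matvec(self.W_out, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical patent with three sections
patent = {
    "abstract": ["A rotor blade with a serrated trailing edge."],
    "claims": ["A blade comprising a serrated edge.",
               "The blade of claim 1 made of composite."],
    "description": ["Wind turbines produce noise.",
                    "Serrations reduce trailing edge noise."],
}
seq = fixed_hierarchy_sequence(patent)
model = TinyLSTMClassifier(DIM, hid=6, n_classes=8)
probs = model.forward(seq)
```

Here `seq` holds one document-level vector, three section-level vectors, and five paragraph-level vectors, and `probs` is a probability distribution over the eight classes. A real system would replace `embed` with trained embeddings and fit the LSTM on labeled patents.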

Original language: English
Pages: 495-503
Number of pages: 9
State: Published - 2018
Event: 2018 SIAM International Conference on Data Mining, SDM 2018 - San Diego, United States
Duration: 3 May 2018 to 5 May 2018

Conference

Conference: 2018 SIAM International Conference on Data Mining, SDM 2018
Country/Territory: United States
City: San Diego
Period: 3/05/18 to 5/05/18

Keywords

  • LSTM
  • Patent classification
  • Word embedding
  • Word2vec
