Towards Bilingual Word Embedding Models for Engineering: Evaluating Semantic Linking Capabilities of Engineering-Specific Word Embeddings Across Languages

Tim Schopf, Peter Weinberger, Thomas Kinkeldei, Florian Matthes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Word embeddings represent the semantic meanings of words in a high-dimensional vector space. Because of this capability, word embeddings can be used in a wide range of Natural Language Processing (NLP) tasks. While domain-specific monolingual word embeddings are common in the literature, domain-specific bilingual word embeddings are rare. In general, training high-quality word embeddings requires large text corpora. Furthermore, training domain-specific word embeddings requires source texts from the relevant domain, and training bilingual domain-specific word embeddings additionally requires that these texts be available in two different languages. In this paper, we use a large dataset of engineering-related articles in German and English to train bilingual engineering-specific word embedding models using different approaches. We evaluate the trained models, identify the most promising approach, and demonstrate that the best-performing model is highly capable of representing semantic relationships between engineering-specific words and of mapping both languages into a shared vector space. Moreover, we show that additionally using an engineering-specific learning dictionary can improve the quality of bilingual engineering-specific word embeddings.
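The abstract does not say which training approaches were compared. As a hedged illustration of how a bilingual dictionary can link two monolingual embedding spaces, the following sketch uses orthogonal Procrustes alignment. This is a widely used supervised mapping technique and is not necessarily the method used in this paper. After mapping, cosine similarity retrieves translations in the shared space. The function names `procrustes_align` and `cosine_nearest` are hypothetical. NumPy is assumed to be available, and the vectors are synthetic toy data, not engineering embeddings.

```python
import numpy as np


def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Hypothetical helper: orthogonal W minimizing ||src @ W - tgt||_F.

    Row i of `src` (e.g. a German word vector) and row i of `tgt`
    (its English translation) form one dictionary pair.
    """
    # Closed-form solution: SVD of the cross-covariance matrix src^T tgt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def cosine_nearest(query: np.ndarray, candidates: np.ndarray) -> int:
    """Hypothetical helper: index of the candidate row most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return int(np.argmax(c @ q))


# Toy demo with synthetic vectors (NOT data from the paper).
rng = np.random.default_rng(0)
dim, vocab = 8, 20
english = rng.normal(size=(vocab, dim))                   # stand-in "English" vectors
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
german = english @ rotation.T                             # "German" space: a rotated copy

# Treat the first 12 word pairs as the (assumed) learning dictionary.
W = procrustes_align(german[:12], english[:12])

# Map a held-out German word into the English space and retrieve its nearest neighbor.
mapped = german[15] @ W
print(cosine_nearest(mapped, english))  # expected: 15
```

Because W is constrained to be orthogonal, the mapping preserves distances and angles within the source space, so monolingual semantic relationships survive the move into the shared space. Real embedding spaces are only approximately isomorphic, so on real data the retrieval would be noisier than in this idealized demo.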

Original language: English
Title of host publication: MSIE 2022 - 2022 4th International Conference on Management Science and Industrial Engineering
Publisher: Association for Computing Machinery
Pages: 407-413
Number of pages: 7
ISBN (Electronic): 9781450395816
State: Published - 28 Apr 2022
Event: 4th International Conference on Management Science and Industrial Engineering, MSIE 2022 - Virtual, Online, Thailand
Duration: 28 Apr 2022 to 30 Apr 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 4th International Conference on Management Science and Industrial Engineering, MSIE 2022
Country/Territory: Thailand
City: Virtual, Online
Period: 28/04/22 to 30/04/22

Keywords

  • Bilingual Word Embeddings
  • Engineering
  • Natural Language Processing

