Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning

Suyu Ma, Zhenchang Xing, Chunyang Chen, Cheng Chen, Lizhen Qu, Guoqiang Li

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Application Programming Interfaces (APIs) are widely discussed on socio-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is a prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has outperformed rule-based methods on informal software texts, which lack consistent writing forms and annotations. However, machine learning based methods incur significant overhead in preparing training data and effective features. In this paper, we propose a multi-layer neural network based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. We also propose to adopt transfer learning to adapt a source-library-trained model to a target library, thus reducing the overhead of manual training-data labeling when the software texts of multiple programming languages and libraries need to be processed. We conduct extensive experiments with six libraries of four programming languages, which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only that our neural architecture outperforms existing machine learning based methods for API extraction in informal software texts, but also that it is easy to deploy.

Original language: English
Pages (from-to): 2296-2311
Number of pages: 16
Journal: IEEE Transactions on Software Engineering
Volume: 47
Issue number: 10
State: Published - 1 Oct 2021
Externally published: Yes

Keywords

  • API extraction
  • CNN
  • LSTM
  • transfer learning
  • word embedding
