TY - CHAP
T1 - Analyzing Text in Software Projects
AU - Wagner, Stefan
AU - Méndez Fernández, Daniel
N1 - Publisher Copyright: © 2015 Elsevier Inc. All rights reserved.
PY - 2015/9/1
Y1 - 2015/9/1
N2 - Most of the data produced in software projects is of textual nature: source code, specifications, or documentation. The advances in quantitative analysis methods drove a lot of data analytics in software engineering. This has overshadowed to some degree the importance of texts and their qualitative analysis. Such analysis has, however, merits for researchers and practitioners as well. In this chapter, we describe the basics of analyzing text in software projects. We first describe how to manually analyze and code textual data. Next, we give an overview of mixed methods for automatic text analysis, including n-grams and clone detection, as well as more sophisticated natural language processing identifying syntax and contexts of words. Those methods and tools are of critical importance to aid in the challenges associated with today's huge amounts of textual data. We illustrate the methods introduced via a running example and conclude by presenting two industrial studies.
AB - Most of the data produced in software projects is of textual nature: source code, specifications, or documentation. The advances in quantitative analysis methods drove a lot of data analytics in software engineering. This has overshadowed to some degree the importance of texts and their qualitative analysis. Such analysis has, however, merits for researchers and practitioners as well. In this chapter, we describe the basics of analyzing text in software projects. We first describe how to manually analyze and code textual data. Next, we give an overview of mixed methods for automatic text analysis, including n-grams and clone detection, as well as more sophisticated natural language processing identifying syntax and contexts of words. Those methods and tools are of critical importance to aid in the challenges associated with today's huge amounts of textual data. We illustrate the methods introduced via a running example and conclude by presenting two industrial studies.
KW - Automated analysis
KW - Manual coding
KW - Qualitative analysis
KW - Text analytics
UR - http://www.scopus.com/inward/record.url?scp=84944062975&partnerID=8YFLogxK
U2 - 10.1016/B978-0-12-411519-4.00003-3
DO - 10.1016/B978-0-12-411519-4.00003-3
M3 - Chapter
AN - SCOPUS:84944062975
SN - 9780124115194
SP - 39
EP - 72
BT - The Art and Science of Analyzing Software Data
PB - Elsevier Inc.
ER -