TY - JOUR
T1 - Decision Trees and Random Forests
T2 - Machine Learning Techniques to Classify Rare Events
AU - Hegelich, Simon
N1 - Publisher Copyright: © 2016 Policy Studies Organization
PY - 2016/3/1
Y1 - 2016/3/1
N2 - The article introduces machine learning algorithms for political scientists. These approaches should not be seen as a new method for old problems. Rather, it is important to understand the different logic of the machine learning approach. Here, data is analyzed without theoretical assumptions about possible causalities. Models are optimized according to their accuracy and robustness. While the computer can do this work more or less alone, it is the researcher's duty to make sense of these models afterward. Visualization of machine learning results therefore becomes very important and is the focus of this paper. The methods presented and compared are decision trees, bagging, and random forests. Bagging and random forests are more advanced versions of decision trees and rely on bootstrapping procedures. To demonstrate these methods, extreme shifts in the US budget and their connection to the attention of political actors are analyzed. The paper compares the accuracy of different models based on ROC curves and shows how to interpret random forest models with the help of visualizations. The aim of the paper is to provide an example of how these methods can be used in political science and to highlight possible pitfalls as well as advantages of machine learning.
KW - Machine learning
KW - methods
KW - punctuated equilibrium
KW - statistics for the 21st century
UR - http://www.scopus.com/inward/record.url?scp=85020002063&partnerID=8YFLogxK
U2 - 10.18278/epa.2.1.7
DO - 10.18278/epa.2.1.7
M3 - Article
AN - SCOPUS:85020002063
SN - 2380-6567
VL - 2
SP - 98
EP - 120
JO - European Policy Analysis
JF - European Policy Analysis
IS - 1
ER -