Decision Trees and Random Forests: Machine Learning Techniques to Classify Rare Events

Research output: Contribution to journal › Article › peer-review

37 Scopus citations

Abstract

The article introduces machine learning algorithms for political scientists. These approaches should not be seen as new methods for old problems. Rather, it is important to understand the different logic of the machine learning approach: data are analyzed without theoretical assumptions about possible causal relationships, and models are optimized for accuracy and robustness. While the computer can do this work more or less alone, it is the researcher's duty to make sense of the models afterward. Visualization of machine learning results therefore becomes very important and is the focus of this paper. The methods presented and compared are decision trees, bagging, and random forests. The latter two are more advanced versions of the first, relying on bootstrapping procedures. To demonstrate these methods, extreme shifts in the US budget and their connection to the attention of political actors are analyzed. The paper compares the accuracy of different models based on ROC curves and shows how to interpret random forest models with the help of visualizations. The aim of the paper is to provide an example of how these methods can be used in political science and to highlight possible pitfalls as well as advantages of machine learning.

Original language: English
Pages (from-to): 98-120
Number of pages: 23
Journal: European Policy Analysis
Volume: 2
Issue number: 1
State: Published - 1 Mar 2016

Keywords

  • Machine learning
  • methods
  • punctuated equilibrium
  • statistics for the 21st century
