TY - GEN
T1 - We Won’t Get Fooled Again
T2 - 6th International Conference on Optimization and Learning, OLA 2023
AU - Traoré, Kalifou René
AU - Camero, Andrés
AU - Zhu, Xiao Xiang
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Hyperparameter optimization (HPO) is a well-studied research field. However, the effects and interactions of the components in an HPO pipeline are not yet well investigated. We therefore ask: Can the landscape of HPO be biased by the pipeline used to evaluate individual configurations? To address this question, we proposed to analyze the effect of the HPO pipeline on HPO problems using fitness landscape analysis. In particular, we studied over 119 generic classification instances from the DS-2019 (CNN) and YAHPO (XGBoost) HPO benchmark data sets, looking for patterns that could indicate evaluation pipeline malfunction, and related them to HPO performance. Our main findings are: (i) in most instances, large groups of diverse hyperparameter settings (i.e., multiple configurations) yield the same poor performance, most likely associated with majority-class prediction models (predictive accuracy) or models unable to attribute an appropriate class to observations (log loss); (ii) in these cases, the correlation between the observed fitness and the average fitness in the neighborhood worsens, potentially making local-search-based HPO strategies harder to deploy; and (iii) these effects are observed across different HPO scenarios (tuning CNN or XGBoost algorithms). Finally, we concluded that the HPO pipeline definition might negatively affect the HPO landscape.
KW - Benchmarking
KW - Fitness Landscape Analysis
KW - Hyperparameter Optimization
UR - https://www.scopus.com/pages/publications/85163403730
U2 - 10.1007/978-3-031-34020-8_11
DO - 10.1007/978-3-031-34020-8_11
M3 - Conference contribution
AN - SCOPUS:85163403730
SN - 9783031340192
T3 - Communications in Computer and Information Science
SP - 148
EP - 160
BT - Optimization and Learning - 6th International Conference, OLA 2023, Proceedings
A2 - Dorronsoro, Bernabé
A2 - Chicano, Francisco
A2 - Danoy, Gregoire
A2 - Talbi, El-Ghazali
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 3 May 2023 through 5 May 2023
ER -