TY - JOUR
T1 - Variational Learning is Effective for Large Deep Networks
AU - Shen, Yuesong
AU - Daheim, Nico
AU - Cong, Bai
AU - Nickl, Peter
AU - Marconi, Gian Maria
AU - Bazan, Clement
AU - Yokota, Rio
AU - Gurevych, Iryna
AU - Cremers, Daniel
AU - Khan, Mohammad Emtiyaz
AU - Möllenhoff, Thomas
N1 - Publisher Copyright: Copyright 2024 by the author(s)
PY - 2024
Y1 - 2024
N2 - We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.
UR - http://www.scopus.com/inward/record.url?scp=85203837184&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85203837184
SN - 2640-3498
VL - 235
SP - 44665
EP - 44686
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 41st International Conference on Machine Learning, ICML 2024
Y2 - 21 July 2024 through 27 July 2024
ER -