TY - JOUR
T1 - Exploiting Benford’s Law for Weight Regularization of Deep Neural Networks
AU - Ott, Julius
AU - Sun, Huawei
AU - Rinaldi, Enrico
AU - Mauro, Gianfranco
AU - Servadei, Lorenzo
AU - Wille, Robert
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Stochastic learning of Deep Neural Network (DNN) parameters is highly sensitive to the training strategy, the hyperparameters, and the available training data. Many state-of-the-art solutions use weight regularization to adjust parameter distributions, prevent overfitting, and support the generalization of DNNs. However, no existing regularization technique has exploited the characteristic distribution of the first non-zero (significant) digit in numerical datasets, known as Benford’s Law (BL). In this paper, we show that the deviation of the significant-digit distribution of the DNN weights from BL is closely related to the generalization of the DNN, particularly when the DNN is trained on limited data. To take advantage of this finding, we use BL to guide the weight regularization of DNNs. Extensive experiments are performed on image, tabular, and speech data, considering convolutional (CNN) and Transformer-based neural network architectures with varying numbers of parameters. We show that the performance of DNNs is improved by minimizing the distance between the significant-digit distribution of the DNN weights and the BL distribution, in combination with L2 regularization. The improvements depend on the network architecture and on how it copes with limited data. However, the proposed penalty term yields consistent gains, and some CNN-based architectures gain up to 15% test accuracy over the default training scheme with L2 regularization on subsets of CIFAR-100.
AB - Stochastic learning of Deep Neural Network (DNN) parameters is highly sensitive to the training strategy, the hyperparameters, and the available training data. Many state-of-the-art solutions use weight regularization to adjust parameter distributions, prevent overfitting, and support the generalization of DNNs. However, no existing regularization technique has exploited the characteristic distribution of the first non-zero (significant) digit in numerical datasets, known as Benford’s Law (BL). In this paper, we show that the deviation of the significant-digit distribution of the DNN weights from BL is closely related to the generalization of the DNN, particularly when the DNN is trained on limited data. To take advantage of this finding, we use BL to guide the weight regularization of DNNs. Extensive experiments are performed on image, tabular, and speech data, considering convolutional (CNN) and Transformer-based neural network architectures with varying numbers of parameters. We show that the performance of DNNs is improved by minimizing the distance between the significant-digit distribution of the DNN weights and the BL distribution, in combination with L2 regularization. The improvements depend on the network architecture and on how it copes with limited data. However, the proposed penalty term yields consistent gains, and some CNN-based architectures gain up to 15% test accuracy over the default training scheme with L2 regularization on subsets of CIFAR-100.
UR - http://www.scopus.com/inward/record.url?scp=85219382001&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85219382001
SN - 2835-8856
VL - 2025
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -
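
Note: the following is a minimal, illustrative sketch of the quantity described in the abstract above, namely how far the first-significant-digit distribution of a weight array deviates from Benford’s Law. The NumPy formulation, the function names, and the use of a KL divergence as the distance are assumptions made here for illustration; the paper’s actual penalty term, its distance measure, and how it is combined with L2 regularization may differ.

```python
import numpy as np

# Benford's Law: probability of first significant digit d is log10(1 + 1/d), d = 1..9.
def benford_distribution():
    digits = np.arange(1, 10)
    return np.log10(1.0 + 1.0 / digits)

# Empirical distribution of the first significant (non-zero) digit of the weights.
def first_digit_distribution(weights, eps=1e-12):
    w = np.abs(np.asarray(weights).ravel())
    w = w[w > eps]                                        # ignore (near-)zero weights
    exponents = np.floor(np.log10(w))
    digits = np.floor(w / 10.0 ** exponents).astype(int)  # leading digit in 1..9
    counts = np.bincount(digits, minlength=10)[1:10].astype(float)
    return counts / counts.sum()

# Deviation of the weights' first-digit distribution from Benford's Law,
# measured here as a KL divergence (an assumption; the paper's distance may differ).
def benford_deviation(weights):
    p = np.clip(first_digit_distribution(weights), 1e-12, None)
    q = benford_distribution()
    return float(np.sum(p * np.log(p / q)))

# Example: weights drawn from a normal initialization.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=100_000)
    print("Deviation from Benford's Law:", benford_deviation(w))
```

In a training loop, such a deviation could in principle be added to the loss as a penalty alongside the usual L2 term, weighted by its own hyperparameter; note, however, that this sketch is a non-differentiable diagnostic on a weight array, whereas the paper’s regularizer would need a formulation usable during gradient-based training.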