TY - GEN
T1 - Estimation of Missing Values in Incomplete Industrial Process Data Sets Using ECM Algorithm
AU - Pirehgalin, Mina Fahimi
AU - Vogel-Heuser, Birgit
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/24
Y1 - 2018/9/24
N2 - Estimation of missing values is an essential step in data pre-processing to increase the data quality for further data mining approaches. Estimation of missing values is significant in industrial data sets because different operational situations cannot be described properly while the data sets include missing values. In this paper, the Expectation Conditional Maximization algorithm is used to find an approximate model of the data based on a Gaussian distribution. Then, in the Expectation step, the Sweep operation is used to obtain the regression model of the missing values on the observable values and to estimate the missing values from the observable data. To evaluate the results, a process data set from a real industrial production system is considered. The missing values are simulated by randomly removing data from variables. Finally, the accuracy of the proposed method in estimating missing values is discussed, as well as the effect of imputing missing values on further data analysis.
AB - Estimation of missing values is an essential step in data pre-processing to increase the data quality for further data mining approaches. Estimation of missing values is significant in industrial data sets because different operational situations cannot be described properly while the data sets include missing values. In this paper, the Expectation Conditional Maximization algorithm is used to find an approximate model of the data based on a Gaussian distribution. Then, in the Expectation step, the Sweep operation is used to obtain the regression model of the missing values on the observable values and to estimate the missing values from the observable data. To evaluate the results, a process data set from a real industrial production system is considered. The missing values are simulated by randomly removing data from variables. Finally, the accuracy of the proposed method in estimating missing values is discussed, as well as the effect of imputing missing values on further data analysis.
KW - Expectation Conditional Maximization
KW - Likelihood Inference
KW - Missing Data
KW - Multivariate Gaussian Distribution
KW - Sweep Matrix
UR - http://www.scopus.com/inward/record.url?scp=85055528699&partnerID=8YFLogxK
U2 - 10.1109/INDIN.2018.8471950
DO - 10.1109/INDIN.2018.8471950
M3 - Conference contribution
AN - SCOPUS:85055528699
T3 - Proceedings - IEEE 16th International Conference on Industrial Informatics, INDIN 2018
SP - 251
EP - 257
BT - Proceedings - IEEE 16th International Conference on Industrial Informatics, INDIN 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Industrial Informatics, INDIN 2018
Y2 - 18 July 2018 through 20 July 2018
ER -