TY - JOUR
T1 - Improving potato leaf chlorophyll content prediction using a machine learning model with a hybrid dataset
AU - Yang, Haibo
AU - Hu, Yuncai
AU - Yin, Hang
AU - Jin, Qingyu
AU - Li, Fei
AU - Yu, Kang
N1 - Publisher Copyright: © 2025 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2025
Y1 - 2025
AB - Combining proximal remote sensing and machine learning (ML) has become a common approach to monitoring leaf chlorophyll content (LCC) for crop stress, productivity assessment, and nutrient management. However, the robustness of ML models is constrained by the limited numbers of in-situ training samples due to time-consuming and labour-intensive workflow in sample analysis. To cope with the issue of limited in-situ samples in monitoring potato LCC, this study used hybrid datasets that integrated limited in-situ measured samples and different-size PROSAIL model simulated samples to calibrate the ML models. Subsequently, the calibrated ML models were evaluated using independently field-measured data. During LCC sampling, canopy reflectance data (400–950 nm) were collected using a passive bi-directional spectrometer and an unmanned aerial vehicle carrying a hyperspectral sensor. Five types of ML models, including the partial least squares regression (PLSR), Gaussian process regression (GPR), random forest (RF), gradient boosting machines (GBM), and blending, were trained for LCC prediction. The scalability of the best ML models was evaluated using hyperspectral data extracted from unmanned aerial vehicle images. The results indicated that the ML models trained using the hybrid dataset outperformed those trained using the single limited in-situ measured dataset or the single PROSAIL simulated dataset when predicting the LCC of different potato cultivars. Nevertheless, when the number of measured in-situ samples was limited, the size of the simulated samples in the hybrid dataset influenced the prediction accuracy and robustness of the ML model. The RF model had the strongest generalization regardless of the handheld passive spectrometer data (R² = 0.67, RPD = 1.55 and RMSE = 0.08 g m⁻²) and the aerial vehicle image data (R² = 0.88, RPD = 1.97 and RMSE = 0.06 g m⁻²). Our results imply the potential of integrating limited in-situ samples with simulated data to achieve accurate and robust estimations for potato LCC. This study offers a key solution for crop chlorophyll monitoring in scenarios with restricted data availability.
KW - chlorophyll content
KW - generalization ability
KW - machine learning
KW - potatoes
KW - remote monitoring
UR - http://www.scopus.com/inward/record.url?scp=85218703115&partnerID=8YFLogxK
U2 - 10.1080/01431161.2025.2465916
DO - 10.1080/01431161.2025.2465916
M3 - Article
AN - SCOPUS:85218703115
SN - 0143-1161
JO - International Journal of Remote Sensing
JF - International Journal of Remote Sensing
ER -