Improving potato leaf chlorophyll content prediction using a machine learning model with a hybrid dataset

Haibo Yang, Yuncai Hu, Hang Yin, Qingyu Jin, Fei Li, Kang Yu

Research output: Contribution to journal › Article › peer-review

Abstract

Combining proximal remote sensing and machine learning (ML) has become a common approach to monitoring leaf chlorophyll content (LCC) in support of crop stress and productivity assessment and nutrient management. However, the robustness of ML models is constrained by the limited number of in-situ training samples, because sample analysis is time-consuming and labour-intensive. To cope with the limited in-situ samples available for monitoring potato LCC, this study calibrated ML models on hybrid datasets that combined a limited number of in-situ measured samples with PROSAIL-simulated samples of different sizes. The calibrated ML models were then evaluated using independent field-measured data. During LCC sampling, canopy reflectance data (400–950 nm) were collected using a passive bi-directional spectrometer and an unmanned aerial vehicle (UAV) carrying a hyperspectral sensor. Five types of ML models were trained for LCC prediction: partial least squares regression (PLSR), Gaussian process regression (GPR), random forest (RF), gradient boosting machines (GBM), and blending. The scalability of the best ML models was evaluated using hyperspectral data extracted from the UAV images. The results indicated that ML models trained on the hybrid dataset outperformed those trained on only the limited in-situ measured dataset or only the PROSAIL-simulated dataset when predicting the LCC of different potato cultivars. Nevertheless, when the number of measured in-situ samples was limited, the number of simulated samples in the hybrid dataset influenced the prediction accuracy and robustness of the ML model. The RF model generalized best for both the handheld passive spectrometer data (R² = 0.67, RPD = 1.55 and RMSE = 0.08 g m⁻²) and the UAV image data (R² = 0.88, RPD = 1.97 and RMSE = 0.06 g m⁻²). Our results indicate the potential of integrating limited in-situ samples with simulated data to achieve accurate and robust estimates of potato LCC. This study offers a key solution for crop chlorophyll monitoring in scenarios with restricted data availability.
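
The three accuracy metrics quoted in the abstract (R², RPD and RMSE) can be illustrated with a minimal sketch. The abstract does not state the exact formulas the authors used, so the definitions below are assumptions: R² is taken as the coefficient of determination, and RPD as the sample standard deviation of the observed values divided by the RMSE. The function names and the example numbers are hypothetical and are not data from the study.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations / RMSE.

    Assumes the sample standard deviation (n - 1 denominator); some
    authors use the population form instead. Undefined when RMSE is zero.
    """
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical LCC values in g m^-2, for illustration only.
observed_lcc = [0.30, 0.40, 0.50, 0.60]
predicted_lcc = [0.32, 0.38, 0.53, 0.57]

print(f"RMSE = {rmse(observed_lcc, predicted_lcc):.3f} g m^-2")
print(f"R2   = {r_squared(observed_lcc, predicted_lcc):.3f}")
print(f"RPD  = {rpd(observed_lcc, predicted_lcc):.2f}")
```

Under these assumed definitions, a higher RPD means the prediction error is small relative to the natural spread of LCC in the validation set, which is why it is commonly reported alongside RMSE.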

Original language: English
Journal: International Journal of Remote Sensing
State: Accepted/In press - 2025

Keywords

  • chlorophyll content
  • generalization ability
  • machine learning
  • potatoes
  • remote monitoring
