Phenolic composition is essential to wine quality (Cleary et al., 2015; Bindon et al., 2020; Niimi et al., 2020), and its assessment is a pressing industrial need for quality management. The objective of this work was to develop a rapid analysis method using the Absorbance-Transmission and fluorescence Excitation-Emission Matrix (A-TEEM) technique. Polyphenols exhibit characteristic and high fluorescence quantum yields, which makes them highly suitable for this technique. The method’s automatic real-time Inner Filter Effect (IFE) correction allows the quantification of minor compounds (Gilmore et al., 2016). IFE-corrected fluorescence EEM data and the absorbance data were combined, and the spectral data were regressed against the concentrations of 34 compounds (anthocyanins, flavan-3-ols, tannins, polymeric pigments, flavonols and hydroxycinnamic acids) measured independently by HPLC-DAD and UV-vis. The study compared Partial Least Squares Regression (PLSR) and Extreme Gradient Boost Regression (XGBR) for single-block (fluorescence EEM or absorbance) and multi-block (combined) data. The calibration set comprised 1133 files acquired from 126 diverse experimental and commercial wines. Validation was carried out on two data sets: first on a randomized 14% sample split from the calibration data, keeping instrument replicates together, and then on an independent set of 96 files from 16 wines. As a general trend, validating the multi-block models with independent data yielded higher prediction correlation coefficients (R2P) and lower Root Mean Square Errors of Prediction (RMSEP) with XGBR than with PLSR. Across all 34 compound fits, the mean R2P was 0.947 with XGBR and 0.899 with PLSR.
The best fits were obtained for compounds of the anthocyanin family, with mean R2P of 0.974 (XGBR) and 0.954 (PLSR), while lower fits were found for flavan-3-ols, with R2P of 0.878 (XGBR) and 0.771 (PLSR), indicating compound-specific effects of the extraction, chromatographic and spectral analysis methods on repeatability and quantification limits. In general, precise model fits were found for compounds present at > 10 mg/L, with R2P between 0.929 and 0.992 (XGBR) and between 0.875 and 0.992 (PLSR). In addition, all individual compounds could be assigned to their family by their spectral fingerprints. The multi-block data sets were also associated with significantly higher R2P (and lower RMSEP) than single-block evaluation of the fluorescence EEM or absorbance data alone. With mean-centering and an Extended Mixture Model filter, the multi-block data sets were fitted robustly by both XGBR and PLSR without the need for secondary variable selection algorithms. We conclude that analyzing the A-TEEM data using the multi-block organization and the XGBR algorithm facilitates robust prediction of the concentrations of key phenolic compounds that strongly influence the quality of Chilean wines.
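The validation scheme and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the file names, the `sample_id` grouping key, the toy data and the helper names `split_by_sample`, `rmsep` and `r2p` are all hypothetical, and R2P is assumed here to be the coefficient of determination computed on the held-out predictions, which the abstract does not define explicitly.

```python
import math
import random


def split_by_sample(file_to_sample, holdout_fraction=0.14, seed=0):
    """Hold out whole samples so replicate files never straddle the split."""
    samples = sorted(set(file_to_sample.values()))
    rng = random.Random(seed)
    rng.shuffle(samples)
    n_holdout = max(1, round(holdout_fraction * len(samples)))
    holdout = set(samples[:n_holdout])
    calibration, validation = [], []
    for file_name, sample_id in file_to_sample.items():
        (validation if sample_id in holdout else calibration).append(file_name)
    return calibration, validation


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on held-out data."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2p(y_true, y_pred):
    """Assumed definition: 1 - SS_res / SS_tot on held-out data."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy layout (hypothetical): 50 samples, 3 instrument replicates each.
file_to_sample = {
    f"sample{s:02d}_rep{r}.dat": f"sample{s:02d}"
    for s in range(50)
    for r in range(3)
}
calibration_files, validation_files = split_by_sample(file_to_sample)

calibration_samples = {file_to_sample[f] for f in calibration_files}
validation_samples = {file_to_sample[f] for f in validation_files}
print(len(calibration_files), len(validation_files))
print(calibration_samples & validation_samples)  # empty: no leakage
```

Splitting on the sample identifier rather than on individual files keeps replicate measurements of one sample out of both sets at once, which would otherwise inflate R2P and deflate RMSEP.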
Authors: Doreen Schober, Center for Research and Innovation, Viña Concha y Toro, Ruta k-650 km 10, Pencahue, Región de Maule, Chile; Adam Gilmore, HORIBA Instruments Inc., 20 Knightsbridge Rd., Piscataway, NJ 08854, USA; Jorge Zincker, Center for Research and Innovation, Viña Concha y Toro, Ruta k-650 km 10, Pencahue, Región de Maule, Chile; Alvaro Gonzalez, Center for Research and Innovation, Viña Concha y Toro, Ruta k-650 km 10, Pencahue, Región de Maule, Chile
Keywords: quality, polyphenols, spectroscopy, A-TEEM, wine, machine learning