Data fusion approaches for sensory and multimodal chemistry data applied to storage conditions
AIM: Complex samples call for combining multimodal data because each technique (mode) captures different information. The aim of the study was to critically evaluate two approaches to fusing multimodal chemistry and sensory data, namely multiblock multiple factor analysis (MFA) and concatenation using principal component analysis (PCA).
METHODS: Wines were submitted to sensory analysis using Pivot©Profile (Thuillier et al. 2015) and to chemical analysis in four modes: antioxidant measurements (AM), volatile compound composition (VCC), ultraviolet-visible (UV-Vis) spectrophotometry (Mafata et al. 2019), and infra-red (IR) spectroscopy. Correspondence analysis (CA), principal component analysis (PCA), and multiple factor analysis (MFA) were used to model the data across the analysis steps of data cleaning, visualisation, modelling, and evaluation (Pagès 2004). Percentage explained variation (%EV) and regression vector (RV) coefficients were used as comparative evaluation parameters between data models (Abdi 2007).
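As an illustration of the RV coefficient used to compare data blocks, the following is a minimal sketch that assumes the standard matrix formulation, in which two blocks measured on the same samples are compared through their sample-by-sample cross-product matrices. The function name `rv_coefficient` and the random demonstration blocks are hypothetical and are not taken from the study.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two blocks with the same samples in the rows."""
    # Column-centre each block before building cross-product matrices.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample-by-sample configuration matrices (n x n).
    Sx = Xc @ Xc.T
    Sy = Yc @ Yc.T
    # For symmetric matrices, trace(Sx @ Sy) equals the sum of elementwise products.
    numerator = np.sum(Sx * Sy)
    denominator = np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))
    return numerator / denominator

# Synthetic blocks (assumption: 12 samples, arbitrary numbers of variables).
rng = np.random.default_rng(0)
block_a = rng.normal(size=(12, 6))
block_b = rng.normal(size=(12, 40))

rv_ab = rv_coefficient(block_a, block_b)   # similarity of two unrelated blocks
rv_aa = rv_coefficient(block_a, block_a)   # a block compared with itself
```

The coefficient lies between 0 and 1, and a block compared with itself (or with a rescaled or rotated copy of itself) gives 1, which is why high values between two modes can be read as overlapping information.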
RESULTS: IR spectral data were used as an example for assessing the need for data cleaning/pre-processing. Similarities in MFA and high RV coefficients indicated that the raw (unprocessed) data could be used for the data fusion. High RV coefficients and MFA proximity between the antioxidant and UV-Vis measurements indicated an overlap in the type of information contained in the two. The differences between the information captured in each of the five modes can be seen in the different measurements, from knowledge of the theory/context behind each technique, and statistically. Statistically, the differences are measured and visualised by a lack of overlap (redundancy) in the MFA and its accompanying cluster analysis.
The %EV values obtained with PCA were higher than those obtained with MFA, a consequence of fusing large data sets from various modes and not necessarily a direct result of the relationships among the data sets. Therefore, %EV was ruled out as a reliable measure of the differences in informational value between the MFA and PCA fusion strategies. RV coefficients, which were highest for MFA, were the best measure of the performance of the data fusion approaches. MFA proved the more appropriate statistical tool for fusing multimodal data.
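The contrast between the two fusion strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: concatenation is taken to mean PCA on the column-centred blocks placed side by side, and MFA is represented by its usual block weighting, where each centred block is divided by its first singular value before the global PCA. All function names and the synthetic blocks are hypothetical.

```python
import numpy as np

def centre(block):
    """Column-centre a block (samples in rows)."""
    return block - block.mean(axis=0)

def pca(matrix, n_components=2):
    """PCA via SVD; returns sample scores and %EV for the leading components."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    percent_ev = 100.0 * s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], percent_ev[:n_components]

def mfa_weight_blocks(blocks):
    """Scale each centred block by its first singular value (MFA weighting)."""
    weighted = []
    for block in blocks:
        centred = centre(block)
        first_sv = np.linalg.svd(centred, compute_uv=False)[0]
        weighted.append(centred / first_sv)
    return weighted

def concat_pca(blocks, n_components=2):
    """Fusion by plain concatenation followed by PCA."""
    return pca(np.hstack([centre(b) for b in blocks]), n_components)

def mfa(blocks, n_components=2):
    """Fusion by MFA: global PCA on the weighted, concatenated blocks."""
    return pca(np.hstack(mfa_weight_blocks(blocks)), n_components)

# Synthetic example (assumption): a small block and a large spectral-like block.
rng = np.random.default_rng(1)
small_block = rng.normal(size=(12, 5))
large_block = 10.0 * rng.normal(size=(12, 300))

scores_concat, ev_concat = concat_pca([small_block, large_block])
scores_mfa, ev_mfa = mfa([small_block, large_block])
```

Because the weighting gives every block a first singular value of 1, no single mode can dominate the global components simply by having more variables or larger numerical values, which is the design reason for preferring it over unweighted concatenation when blocks differ greatly in size.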
Issue: Macrowine 2021
Authors: Mpho MAFATA (South African Grape and Wine Research Institute, Department of Viticulture and Oenology, Stellenbosch University, South Africa); Martin KIDD (Centre for Statistical Consultation, Stellenbosch University, South Africa); Andrei MEDVEDOVICI (Faculty of Chemistry, University of Bucharest, Romania); Astrid BUICA (South African Grape and Wine Research Institute, Department of Viticulture and Oenology, Stellenbosch University, South Africa)
data fusion; sensory evaluation; chemical composition; white wines; storage