REVEALING THE ORIGIN OF BORDEAUX WINES WITH RAW 1D-CHROMATOGRAMS
Understanding the composition of wine and how it is influenced by climate or wine-making practices is challenging. Two approaches are typically used. The first relies on chemical fingerprints, which require advanced tools such as high-resolution mass spectrometry and multidimensional chromatography. The second is the targeted method, which relies on the widely available 1-D GC/MS but involves integrating the areas under a few peaks, and therefore uses only a small fraction of the chromatogram.
Here, we employ state-of-the-art machine learning methods to optimize the analysis of 1-D GC/MS chromatograms. Specifically, we aim to determine whether these chromatograms contain valuable information beyond the manually extracted peaks typically utilized in the targeted approach.
To explore these questions, we analyzed four types of raw 1-D chromatograms (three SIM and one full-scan) of 80 wines (12 vintages from 7 estates of the Bordeaux area). We first applied nonlinear dimensionality reduction techniques (t-SNE and UMAP) to the chromatograms to obtain 2D maps. In the resulting maps, wines of the same estate across multiple vintages tended to form clear clusters, whose spatial distribution reflected the geography of the Bordeaux wine region. This indicated that, for this particular set of wines, the raw chromatograms are highly informative about terroir and wine identity.
Next, we applied cross-validated classifiers to the raw chromatograms and found that we could recover estate identity perfectly, independent of vintage. By contrast, performance on vintage classification was much lower, reaching at most 50% correct.
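One way to check that estate classification is independent of vintage is to hold out every wine of one vintage at a time, train on the remaining vintages, and predict the estate of the held-out wines. The sketch below illustrates that idea on synthetic data only. The nearest-centroid classifier, the data generator, and all names and parameters (`make_synthetic_chromatograms`, `n_points`, the noise levels) are assumptions made for illustration; the abstract does not specify which classifiers or cross-validation scheme were used.

```python
import random


def make_synthetic_chromatograms(n_estates=7, n_vintages=12, n_points=50, seed=0):
    """Return made-up (estate, vintage, signal) samples; not real GC/MS data."""
    rng = random.Random(seed)
    estate_profiles = [[rng.gauss(0, 1) for _ in range(n_points)]
                       for _ in range(n_estates)]
    vintage_offsets = [[rng.gauss(0, 0.5) for _ in range(n_points)]
                       for _ in range(n_vintages)]
    samples = []
    for e in range(n_estates):
        for v in range(n_vintages):
            # Estate signature + vintage effect shared by all estates + noise.
            signal = [estate_profiles[e][i] + vintage_offsets[v][i] + rng.gauss(0, 0.3)
                      for i in range(n_points)]
            samples.append((e, v, signal))
    return samples


def centroid(signals):
    """Point-by-point mean of a list of equal-length signals."""
    return [sum(column) / len(signals) for column in zip(*signals)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_vintage_out_accuracy(samples):
    """Predict estate for each vintage using centroids built from the other vintages."""
    vintages = sorted({v for _, v, _ in samples})
    correct = 0
    for held_out in vintages:
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        estates = sorted({e for e, _, _ in train})
        centroids = {e: centroid([sig for est, _, sig in train if est == e])
                     for e in estates}
        for estate, _, signal in test:
            predicted = min(centroids,
                            key=lambda e: squared_distance(signal, centroids[e]))
            correct += int(predicted == estate)
    return correct / len(samples)


samples = make_synthetic_chromatograms()
accuracy = leave_one_vintage_out_accuracy(samples)
print(f"Estate accuracy (leave-one-vintage-out): {accuracy:.2f}")
```

Because no wine of the held-out vintage is seen during training, a high score here cannot come from the classifier memorizing vintage-specific features. Swapping the roles of estate and vintage in the grouping would give the corresponding test for vintage classification.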
Crucially, we found that the entire chromatogram is informative with respect to both of these variables. Extracting specific peaks of the chromatogram to quantify the concentration of 32 known chemical compounds, and discarding the rest of the chromatogram, led to worse classification performance, suggesting that estate identity is distributed over a large chemical spectrum, including many molecules that have yet to be identified.
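To make the contrast between the targeted features and the raw trace concrete, the following sketch integrates the signal inside a few retention-time windows and reports how much of the trace those windows cover. It is a toy illustration under stated assumptions: the retention times, the flat signal (chosen so the areas are easy to check by hand), the window boundaries, and the function names are all hypothetical and are not taken from the study.

```python
def peak_area(times, signal, start, end):
    """Trapezoidal area under the signal for segments lying within [start, end]."""
    area = 0.0
    for i in range(len(times) - 1):
        if times[i] >= start and times[i + 1] <= end:
            area += 0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
    return area


def targeted_features(times, signal, windows):
    """One integrated area per retention-time window (the targeted approach)."""
    return [peak_area(times, signal, start, end) for start, end in windows]


def fraction_of_trace_used(times, windows):
    """Share of data points that fall inside at least one integration window."""
    used = sum(1 for t in times if any(start <= t <= end for start, end in windows))
    return used / len(times)


# Hypothetical trace: 41 points from 0 to 20 minutes with a constant signal.
times = [0.5 * i for i in range(41)]
signal = [2.0] * len(times)
windows = [(1.0, 3.0), (10.0, 11.0)]  # assumed integration windows

features = targeted_features(times, signal, windows)
fraction = fraction_of_trace_used(times, windows)
print(f"{len(features)} targeted features from {fraction:.0%} of the trace; "
      f"the raw approach keeps all {len(signal)} points")
```

A classifier given only `features` sees a handful of numbers per wine, whereas a classifier given `signal` sees every point, including regions that correspond to compounds not yet identified.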
In addition, the raw GC data can be used to predict the ratings of a professional wine critic (Robert Parker) above chance, suggesting that the chromatograms might also contain information about the organoleptic properties of wine.
Overall, this study demonstrates the strong potential of raw chromatogram analysis for wine characterization and identification.
Issue: OENO Macrowine 2023
Keywords: Machine learning, Wine composition, Sensorial classification, Terroir