Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics, environmental analysis, and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive, and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high-resolution mass spectrometry, such as LC-HRMS, gas chromatography with unit-resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection, and retention time alignment can be prone to errors, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The approach comprises segmentation of chromatograms along the retention time axis and multiway decomposition of the transformed segments, followed by a supervised machine learning pipeline based on gradient boosted tree classification on the decomposed tensor [1, 2].
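The segmentation and decomposition steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size, the number of components, and the random test data are all assumptions made for illustration. In particular, a truncated SVD of each unfolded segment stands in for the multiway decomposition, which the abstract does not specify.

```python
import numpy as np


def segment_chromatograms(data, window):
    """Split a (samples, scans, m/z) array into windows along the scan axis.

    The last window may be shorter than `window`.
    """
    n_scans = data.shape[1]
    return [data[:, start:start + window, :] for start in range(0, n_scans, window)]


def segment_scores(segment, n_components):
    """Return per-sample scores for one segment.

    Assumption: a truncated SVD of the unfolded segment is used here as a
    simple stand-in for a true multiway decomposition.
    """
    n_samples = segment.shape[0]
    unfolded = segment.reshape(n_samples, -1)  # samples x (scans * m/z)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]


def build_feature_table(data, window=50, n_components=2):
    """Concatenate the per-segment scores into one samples x features table."""
    blocks = [
        segment_scores(segment, n_components)
        for segment in segment_chromatograms(data, window)
    ]
    return np.hstack(blocks)


# Hypothetical data: 6 samples, 120 scans, 30 m/z channels.
rng = np.random.default_rng(0)
data = rng.random((6, 120, 30))
features = build_feature_table(data, window=50, n_components=2)
print(features.shape)
```

No peak is detected and no retention time is aligned at any point; each sample is described only by its scores within each retention time window. In a supervised setting, a table like `features` would then be passed, together with class labels, to a gradient boosted tree classifier such as scikit-learn's `GradientBoostingClassifier`.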

To make this data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open-source Python packages Bokeh and HoloViews. The application will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
