Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics, environmental analysis, and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive, and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high-resolution mass spectrometry, such as LC-HRMS, gas chromatography with unit-resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection, and retention time alignment are prone to errors, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The approach comprises segmentation of chromatograms along the retention time axis and multiway decomposition of the transformed segments, followed by a supervised machine learning pipeline based on gradient boosted tree classification on the decomposed tensor [1, 2].
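The segmentation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how windows are chosen, so fixed-width, non-overlapping windows are assumed here, and the data layout (a list of scans, each a list of intensities per m/z channel) as well as the names `segment_bounds` and `segment_samples` are hypothetical. The transformation, multiway decomposition, and classification steps are deliberately left out.

```python
from typing import List, Tuple

# A chromatogram is modelled as a list of scans; each scan is a list of
# intensities, one per m/z channel (assumed layout, not taken from the paper).
Chromatogram = List[List[float]]


def segment_bounds(n_scans: int, width: int) -> List[Tuple[int, int]]:
    """Return (start, stop) scan indices of consecutive, non-overlapping windows."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [(start, min(start + width, n_scans)) for start in range(0, n_scans, width)]


def segment_samples(samples: List[Chromatogram], width: int) -> List[List[Chromatogram]]:
    """Cut every sample at the same scan indices.

    The result is indexed as segments[k][i]: window k of sample i, so each
    segments[k] is one samples x scans x m/z block for a retention time window.
    """
    n_scans = len(samples[0])
    if any(len(s) != n_scans for s in samples):
        raise ValueError("all samples must have the same number of scans")
    return [[s[a:b] for s in samples] for a, b in segment_bounds(n_scans, width)]


# Toy data: 2 samples, 5 scans, 3 m/z channels.
samples = [
    [[float(i * 10 + t + m) for m in range(3)] for t in range(5)]
    for i in range(2)
]
segments = segment_samples(samples, width=2)
print(len(segments), [len(seg[0]) for seg in segments])  # 3 [2, 2, 1]
```

Because every sample is cut at identical scan indices, each window yields a three-way block that can be decomposed on its own, which is what makes a global retention time alignment unnecessary in this kind of workflow.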

To make this data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open-source Python packages Bokeh and HoloViews. The application will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
