Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics and environmental and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive, and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high-resolution mass spectrometry, such as LC-HRMS, gas chromatography with unit-resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection, and retention time alignment are prone to errors, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The approach comprises segmentation of the chromatograms along the retention time axis, multiway decomposition of the transformed segments, and a supervised machine learning pipeline based on gradient boosted tree classification on the decomposed tensor [1, 2].
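
The three steps named above (segmentation, decomposition, gradient boosted tree classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the transformation or the multiway decomposition, so the transformation is omitted and non-negative matrix factorization (NMF) on unfolded segments is swapped in as a simple stand-in. The synthetic data, the function names, the segment width, and the number of components are all assumptions made for illustration. The example relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


def segment_chromatograms(data, segment_width):
    """Split a (samples, scans, m/z) array into fixed-width retention time windows."""
    n_scans = data.shape[1]
    return [data[:, start:start + segment_width, :]
            for start in range(0, n_scans, segment_width)]


def decompose_segment(segment, n_components):
    """Return per-sample component scores for one segment.

    Stand-in for the multiway decomposition (an assumption, not the published
    method): unfold the segment to (samples, scans * m/z) and apply NMF.
    """
    unfolded = segment.reshape(segment.shape[0], -1)
    model = NMF(n_components=n_components, init="nndsvda",
                max_iter=500, random_state=0)
    return model.fit_transform(unfolded)


def build_feature_table(data, segment_width, n_components):
    """Concatenate the per-segment scores into one (samples, features) table."""
    segments = segment_chromatograms(data, segment_width)
    return np.hstack([decompose_segment(seg, n_components) for seg in segments])


# Synthetic data (hypothetical): 40 samples, 120 scans, 30 m/z channels.
rng = np.random.default_rng(0)
n_samples, n_scans, n_mz = 40, 120, 30
labels = np.repeat([0, 1], n_samples // 2)
data = 0.1 * rng.random((n_samples, n_scans, n_mz))  # non-negative background

# Class 1 samples contain one extra compound whose retention time drifts by a
# few scans from sample to sample; no peak picking or alignment is applied.
spectrum = np.zeros(n_mz)
spectrum[[4, 11, 17]] = [1.0, 0.6, 0.3]
scans = np.arange(n_scans)
for i in np.flatnonzero(labels == 1):
    center = 45 + rng.integers(-3, 4)
    peak = np.exp(-0.5 * ((scans - center) / 2.0) ** 2)
    data[i] += 5.0 * np.outer(peak, spectrum)

# Segment, decompose, then classify the samples from the component scores.
features = build_feature_table(data, segment_width=30, n_components=2)
classifier = GradientBoostingClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(classifier, features, labels, cv=5)
print("Mean cross-validated accuracy:", scores.mean())
```

Because each segment is decomposed as a whole, small retention time shifts within a window end up in the component scores rather than requiring explicit alignment. For simplicity, the decomposition here is fitted on all samples before cross-validation; a stricter evaluation would fit it inside each fold.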

To make this data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open-source Python packages Bokeh and HoloViews. It will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
