Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics and environmental and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive, and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high-resolution mass spectrometry, such as LC-HRMS, gas chromatography with unit-resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection, and retention time alignment can be error-prone, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The approach comprises segmentation of chromatograms along the retention time axis and multiway decomposition of the transformed segments, followed by a supervised machine learning pipeline based on gradient boosted tree classification of the decomposed tensor [1, 2].
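The segmentation and decomposition steps can be illustrated with a minimal sketch. It is not the published implementation. It assumes the data are stored as a NumPy array of shape (samples, scans, m/z channels) and that segments are fixed-width windows along the scan axis. It also swaps in a truncated SVD of each unfolded segment as a simple stand-in for the multiway decomposition described in [1, 2]. The function names `segment_bounds` and `segment_scores` and all parameters are hypothetical.

```python
import numpy as np


def segment_bounds(n_scans, width, overlap=0):
    """Return (start, stop) scan indices of windows along the retention time axis."""
    if width <= 0 or not 0 <= overlap < width:
        raise ValueError("need width > 0 and 0 <= overlap < width")
    step = width - overlap
    bounds = []
    start = 0
    while start < n_scans:
        stop = min(start + width, n_scans)
        bounds.append((start, stop))
        if stop == n_scans:
            break
        start += step
    return bounds


def segment_scores(data, width, overlap=0, rank=2):
    """Decompose every segment and collect per-sample scores as features.

    data: array of shape (n_samples, n_scans, n_mz).
    Assumption: a truncated SVD of the unfolded segment
    (samples x (scans * m/z)) stands in for the multiway model;
    it is NOT the decomposition used in the cited papers.
    """
    n_samples, n_scans, _ = data.shape
    features = []
    for start, stop in segment_bounds(n_scans, width, overlap):
        # Unfold the segment so that each sample becomes one row.
        unfolded = data[:, start:stop, :].reshape(n_samples, -1)
        u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
        k = min(rank, s.size)
        # Sample scores: left singular vectors scaled by singular values.
        features.append(u[:, :k] * s[:k])
    return np.hstack(features)


# Synthetic stand-in data: 6 samples, 100 scans, 20 m/z channels.
rng = np.random.default_rng(0)
data = rng.random((6, 100, 20))
feats = segment_scores(data, width=25, rank=2)
```

Because each segment is decomposed on its own, no peak has to be detected or aligned across samples beforehand. The resulting feature matrix has one row per sample and could then be passed to a gradient boosted tree classifier, for example scikit-learn's `GradientBoostingClassifier`, together with the class labels.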

To make this data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open-source Python packages Bokeh and HoloViews. It will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

DOI:

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
