Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics, environmental analysis, and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive, and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high-resolution mass spectrometry, such as LC-HRMS, gas chromatography with unit-resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection, and retention time alignment can be error-prone, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The approach comprises segmentation of chromatograms along the retention time axis and multiway decomposition of the transformed segments, followed by a supervised machine learning pipeline based on gradient boosted tree classification of the decomposed tensor [1, 2].
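The first two steps of such a workflow can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed segment width, and the synthetic data are assumptions made for illustration, and a truncated SVD of the unfolded segment is used here only as a simple stand-in for the multiway decomposition described in [1, 2]. The resulting per-sample scores are the kind of feature table that could then be passed to a gradient boosted tree classifier.

```python
import numpy as np


def segment_chromatograms(data, segment_width):
    """Split a (samples, scans, m/z) array into consecutive windows
    along the scan (retention time) axis. The last window may be shorter."""
    n_scans = data.shape[1]
    return [data[:, start:start + segment_width, :]
            for start in range(0, n_scans, segment_width)]


def segment_scores(segment, n_components):
    """Per-sample scores from a truncated SVD of the sample-wise unfolded
    segment (a stand-in for a true multiway decomposition)."""
    n_samples = segment.shape[0]
    unfolded = segment.reshape(n_samples, -1)  # samples x (scans * m/z)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]


# Synthetic stand-in data (hypothetical): 6 samples, 100 scans, 20 m/z channels.
rng = np.random.default_rng(0)
data = rng.random((6, 100, 20))

segments = segment_chromatograms(data, segment_width=30)
features = np.hstack([segment_scores(seg, n_components=2) for seg in segments])
print(len(segments), features.shape)
```

Because every sample is cut at the same scan positions and decomposed jointly within each segment, no peak picking or explicit alignment step appears in the sketch; small retention time shifts would have to be absorbed by the decomposition model itself, which a plain SVD handles less well than the multiway methods the abstract refers to.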

To make this data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open-source Python packages Bokeh and HoloViews. It will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
