Fully automated non-targeted GC-MS data analysis

Abstract

Non-targeted analysis is applied in many domains of analytical chemistry, such as metabolomics, environmental analysis and food analysis. In contrast to targeted analysis, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive and give a more holistic representation of the sample composition.

Besides chromatographic techniques coupled to high resolution mass spectrometry such as LC-HRMS, gas chromatography with unit resolution mass spectrometry is still regularly used for non-targeted profiling or fingerprinting. This is mainly due to the high separation power of GC and the wide availability and low cost of quadrupole mass spectrometers.

Although several non-targeted approaches have been developed, data processing remains a serious bottleneck. Baseline correction, feature detection and retention time alignment can be prone to errors, and time-consuming manual corrections are often necessary. We therefore developed an automated strategy for non-targeted GC-MS data analysis that avoids feature detection and retention time alignment. The novel automated approach includes segmentation of chromatograms along the retention time axis and multiway decomposition of the transformed segments, followed by a supervised machine learning pipeline based on gradient boosted tree classification on the decomposed tensor [1, 2].
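The segmentation and decomposition steps can be illustrated with a minimal Python sketch. It is not the implementation described in [1, 2]: the array layout (samples × scans × m/z channels), the equal-width segments, the function names and the synthetic data are all assumptions made for illustration, the transformation of the segments is omitted, and a truncated SVD of the unfolded segment is used here only as a simple stand-in for the multiway decomposition.

```python
import numpy as np


def segment_chromatograms(data, segment_width):
    """Split a (samples, scans, m/z) array into equal-width retention time segments.

    Trailing scans that do not fill a whole segment are dropped.
    Returns an array of shape (segments, samples, segment_width, m/z).
    """
    n_samples, n_scans, n_mz = data.shape
    n_segments = n_scans // segment_width
    trimmed = data[:, : n_segments * segment_width, :]
    return trimmed.reshape(n_samples, n_segments, segment_width, n_mz).swapaxes(0, 1)


def sample_scores(segment, n_components):
    """Stand-in decomposition (assumption): truncated SVD of the sample-unfolded segment."""
    unfolded = segment.reshape(segment.shape[0], -1)  # samples x (scans * m/z)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def build_feature_table(data, segment_width, n_components):
    """Concatenate per-segment sample scores into one samples x features table."""
    segments = segment_chromatograms(data, segment_width)
    return np.hstack([sample_scores(seg, n_components) for seg in segments])


# Synthetic example data (hypothetical): 6 samples, 100 scans, 20 m/z channels
rng = np.random.default_rng(0)
raw = rng.random((6, 100, 20))
features = build_feature_table(raw, segment_width=25, n_components=2)
print(features.shape)  # (6, 8): 4 segments x 2 components per sample
```

Because each segment is decomposed on its own, no peak picking or alignment across samples is needed in this sketch. The resulting table has one row per sample and could then be passed, together with class labels, to a gradient boosted tree classifier such as scikit-learn's `GradientBoostingClassifier`.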

To make this novel data analysis strategy available to scientists without a programming background, we developed a convenient browser-based application. The interactive application presented here was built with the open source Python packages Bokeh and HoloViews. It will soon be freely available online.

[1] J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward automated chromatographic fingerprinting: A non-alignment approach to gas chromatography mass spectrometry data. Analytica Chimica Acta 911 (2016) 42-58
[2] K. Sirén, U. Fischer, J. Vestner, Automated supervised learning pipeline for non-targeted GC-MS data analysis. Analytica Chimica Acta: X 1 (2019) 100005

DOI:

Publication date: June 19, 2020

Issue: OENO IVAS 2019

Type: Article

Authors

Jochen Vestner, Kimmo Sirén, Pierre Le Brun, Ulrich Fischer

Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435 Neustadt, Germany
Institut National Supérieur des Sciences Agronomiques de l’Alimentation et de l’Environnement, Agrosup Dijon, 6 boulevard Docteur Petitjean, 21000 Dijon, France
Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663 Kaiserslautern

Keywords

metabolomics, non-targeted, GC-MS, exploratory data analysis 

Tags

IVES Conference Series | OENO IVAS 2019
