Macrowine 2021
Beyond classical statistics – data fusion coupled with pattern recognition

Abstract

AIM: Patterns in data obtained from chemical and sensory evaluations of wine are difficult to infer using classical statistics. Pattern recognition can be addressed by coupling data fusion with machine learning techniques, possibly leading to the formation of new hypotheses. This study demonstrates the applicability of two pattern recognition approaches using a case study of Chenin Blanc wines (recently bottled and after two years of storage) from young (35 years) vines.

METHODS: Sensory (sorting; Mafata et al. 2020) and chemical (NMR: nuclear magnetic resonance; HRMS: high-resolution mass spectrometry; UV-Vis: ultraviolet-visible spectrophotometry) data were collected for the young and aged (two years in bottle) wines. The data sets were combined using multiple factor analysis (MFA). Exploratory unsupervised cluster analysis was performed by agglomerative hierarchical clustering (AHC) and Fuzzy-k means (Bezdek 1981). Optimal clustering conditions were determined for both methods, and the cophenetic coefficient was used to identify the method that gave the most confident clustering.
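The fusion and clustering steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. Each block is only column-centred (a full MFA would often also standardise variables). It is then weighted by the inverse square root of the first eigenvalue of its own PCA, which is the usual MFA block weighting, and a global PCA of the concatenated table gives the fused sample scores. The linkage methods compared, the synthetic block sizes and the helper names `mfa_weight`, `mfa_fuse` and `ahc_cophenetic` are hypothetical. The cophenetic correlation coefficient measures how faithfully the dendrogram preserves the original pairwise distances, so values closer to 1 indicate a more trustworthy tree.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist


def mfa_weight(block):
    """Centre a block and divide it by the square root of its first PCA eigenvalue."""
    centred = block - block.mean(axis=0)
    s1 = np.linalg.svd(centred, compute_uv=False)[0]  # largest singular value
    # The first PCA eigenvalue is s1**2 / n, so its square root is s1 / sqrt(n).
    return centred / (s1 / np.sqrt(centred.shape[0]))


def mfa_fuse(blocks):
    """Concatenate the weighted blocks and return global PCA sample scores."""
    fused = np.hstack([mfa_weight(b) for b in blocks])
    u, s, _ = np.linalg.svd(fused, full_matrices=False)
    return u * s  # scores = U @ diag(s), one row per sample


def ahc_cophenetic(scores, method):
    """Agglomerative clustering plus its cophenetic correlation coefficient."""
    dists = pdist(scores)  # condensed Euclidean distance matrix
    tree = linkage(dists, method=method)
    coph_corr, _ = cophenet(tree, dists)
    return tree, coph_corr


# Hypothetical usage: synthetic stand-ins for sensory, NMR and HRMS blocks (12 wines).
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(12, p)) for p in (5, 50, 200)]
scores = mfa_fuse(blocks)
results = {m: ahc_cophenetic(scores, m) for m in ("average", "complete", "ward")}
best = max(results, key=lambda m: results[m][1])
labels = fcluster(results[best][0], t=3, criterion="maxclust")
print(best, round(results[best][1], 3), labels)
```

The block weighting matters because a table with hundreds or thousands of HRMS features would otherwise dominate a handful of sensory variables in the fused space; after weighting, every block has a first eigenvalue of 1.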

RESULTS: Because large data sets were fused, the models were very complex. Clustering patterns were not consistent when clustering conditions were varied, signalling high similarity between samples. The samples could not be confidently distinguished from one another, even under the best optimized conditions. Although Fuzzy-k means gave more confident clustering, it was still not sufficient to resolve the classification problem in this sample set.
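The algorithm cited as Fuzzy-k means (Bezdek 1981) is more widely known as fuzzy c-means. The minimal NumPy sketch below assumes Euclidean distances and a fuzzifier m = 2; these settings, the random initialisation and the function names are illustrative assumptions, not the study's configuration. Instead of assigning each wine to a single cluster, the algorithm returns a membership degree between 0 and 1 for every cluster. Bezdek's partition coefficient then summarises how crisp the resulting partition is, which is one common way to quantify how confidently samples are separated.

```python
import numpy as np


def _weighted_centres(X, U, m):
    """Cluster centres as membership-weighted means (weights u_ik ** m)."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def fuzzy_cmeans(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means: returns memberships U (n x c) and centres (c x p)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(max_iter):
        centres = _weighted_centres(X, U, m)
        # Distance from every sample to every centre, shape (n, c).
        D = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)  # avoid division by zero if a sample sits on a centre
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, _weighted_centres(X, U, m)


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1 for a crisp partition, 1/c for uniform memberships."""
    return float((U ** 2).sum() / U.shape[0])


# Hypothetical usage on two strongly overlapping synthetic groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 3)), rng.normal(1.0, 1.0, (15, 3))])
U, centres = fuzzy_cmeans(X, c=2)
print("partition coefficient:", round(partition_coefficient(U), 3))
print("hard labels:", U.argmax(axis=1))
```

When samples are highly similar, memberships spread across clusters and the partition coefficient falls towards 1/c, a weak separation that a hard AHC assignment would not reveal.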

CONCLUSIONS: Fuzzy-k means was better at resolving the natural grouping of samples. Coupled with data fusion, it could potentially lead to better pattern recognition, especially for oenological chemical and sensory data. The fuzzy approach should be explored further, keeping in mind that it is more sensitive to small differences in the data than classical statistics.

DOI:

Publication date: September 7, 2021

Issue: Macrowine 2021

Type: Article

Authors

Mpho Mafata¹ ², Jeanne Brand¹, Astrid Buica¹

¹South African Grape and Wine Research Institute, Department of Viticulture and Oenology, Stellenbosch University, South Africa
²School for Data Science and Computational Thinking, Stellenbosch University, South Africa

Keywords

data fusion, pattern recognition, machine learning, artificial intelligence, multiple factor analysis, fuzzy-k means, cluster analysis
