Macrowine 2021

Beyond classical statistics – data fusion coupled with pattern recognition

Abstract

AIM: Patterns in data obtained from wine chemical and sensory evaluations are difficult to infer using classical statistics. Pattern recognition can be achieved by coupling data fusion with machine learning techniques, possibly leading to new hypotheses being formed. This study demonstrates the applicability of two pattern recognition approaches using a case study involving Chenin Blanc wines (recently bottled and after two years of storage) from young (35 years) vines.

METHODS: Sensory (sorting (Mafata et al. 2020)) and chemical (NMR: nuclear magnetic resonance, HRMS: high resolution mass spectrometry, and UV-Vis: ultraviolet-visible spectrophotometry) data were collected for the young and aged (two years in the bottle) wines. Data sets were combined using multiple factor analysis (MFA). Exploratory unsupervised cluster analysis was performed by agglomerative hierarchical clustering (AHC) and Fuzzy-k means (Bezdek 1981). Optimal cluster conditions were found for both methods, and the cophenetic coefficient was used to identify which clustering method gave the most confident result.
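As a rough illustration of the fuzzy clustering step, the sketch below implements a basic fuzzy c-means loop (the family of algorithms described in Bezdek 1981) in plain Python. The toy coordinates, the fuzzifier `m = 2`, the function names, and the partition coefficient used as a confidence summary are all assumptions made for illustration; the study itself clustered MFA-fused sensory and chemical data and compared methods using the cophenetic coefficient.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])
    centers = []
    for _ in range(max_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(weights[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ])
        # Memberships follow from relative distances to every center.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append([
                1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ])
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Mean squared membership: 1.0 is crisp, 1/c is maximally fuzzy."""
    return sum(value ** 2 for row in u for value in row) / len(u)


# Hypothetical 2-D sample scores (for example, two fused dimensions).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(samples, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Each sample receives a graded membership in every cluster rather than a single hard label, which is what makes the approach sensitive to small differences between similar samples. A partition coefficient close to `1 / n_clusters` would suggest that the samples cannot be separated with confidence.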

RESULTS: Since large data sets were fused, the models were very complex. There were no consistent clustering patterns when varying clustering conditions, signalling high similarity between samples. The samples could not confidently be distinguished from one another even at the highest optimized conditions. Although Fuzzy-k means gave more confident clustering, it was still not sufficient for solving classification issues in this sample set.

CONCLUSIONS: Fuzzy-k means was better at resolving the natural grouping of samples. Coupled to data fusion, it could potentially lead to better pattern recognition, especially for oenological chemical and sensory data. The fuzzy approach should be explored, keeping in mind it is more sensitive to small differences in the data compared to classical statistics.

DOI:

Publication date: September 7, 2021

Issue: Macrowine 2021

Type: Article

Authors

Mpho Mafata¹ ², Jeanne Brand¹, Astrid Buica¹

¹ South African Grape and Wine Research Institute, Department of Viticulture and Oenology, Stellenbosch University, South Africa
² School for Data Science and Computational Thinking, Stellenbosch University, South Africa

Keywords

data fusion, pattern recognition, machine learning, artificial intelligence, multiple factor analysis, fuzzy-k means, cluster analysis

Related articles…

Impact of long term agroecological and conventional practices on subsurface soil microbiota in Macabeu and Xarel·lo vineyards

There is a growing trend toward the transition from conventional to agroecological management of vineyards. However, the impact of practices such as reduced tillage, organic fertilization and cover crops on soil microbial diversity, and its relationship with the soil physicochemical properties in the subsurface depth near the rooting zone, is not well understood. Soil bacterial diversity is an important contributor to plant health, productivity and response to environmental stresses. A field experiment was conducted by sampling the subsurface soil bacterial community (NGS and qPCR) near the root zone of Macabeu and Xarel·lo vineyards located in the Penedès. Three organic (ECO) and three conventional (CON) vineyards, each under its respective management for more than 10 years, were sampled (n=5 per plot). ECO practices did not affect bacterial and fungal abundance but significantly increased the ammonium oxidizing bacteria and alpha-diversity (Inv.Simpson). Interestingly, beta-diversity was significantly affected by the management strategy. ANOSIM tests revealed a significant effect of management (ecological vs conventional) and plot on the soil microbial structure (ASV abundance). The main phyla detected were Proteobacteria, Actinobacteria and Acidobacteria, whose relative abundances were not affected by the management. EdgeR analysis revealed a significant increase of Cyanobacteria and a decrease of the Gemmatimonadetes and Firmicutes phyla in ECO. Interestingly, the grapevine variety was not correlated with the soil microbial community structure. A Mantel test revealed an important correlation (Spearman) of some physicochemical parameters with the soil microbiota structure, in order of importance: texture, EC, pH, Ca/Mg, Mg/P, K⁺, Mg²⁺, Ca²⁺, SO₄²⁻, and OM. N-NH4 and NTK, which were higher in the ECO managed soils, did not correlate significantly with the soil microbiome population.
The results revealed the importance of combining a deep physicochemical characterization of each replicate with the microbial diversity assessment to gain better insights into the relationship between soil microbiome and vineyard management.

Projected changes in vine phenology of two varieties with different thermal requirements cultivated in La Mancha DO (Spain) under climate change scenarios

The aim of this work was to analyze the phenology variability of the Tempranillo and Chardonnay cultivars in relation to the climatic characteristics of the La Mancha Designation of Origin, and their potential changes under climate change scenarios. Phenological dates for budbreak, flowering, veraison and harvest were analyzed for the period 2000-2019. The weather conditions at daily time scale, recorded during the same period, were also evaluated. The thermal requirements to reach each of these phenological stages were calculated and expressed as the growing degree days (GDD) accumulated from day of year (DOY) 60. Changes in phenology were projected for 2050 and 2070 taking into account those values and the projected temperatures and precipitation, simulated under two Representative Concentration Pathway (RCP) scenarios (RCP4.5 and RCP8.5) using an ensemble of models. The average phenological dates during the period under study were April 16th ± 6.6 days and April 5th ± 6.0 days for budbreak, May 31st ± 6.0 days and May 27th ± 5.3 days for flowering, July 26th ± 5.6 days and July 25th ± 5.8 days for veraison, and August 23rd ± 10.8 days and August 17th ± 9.0 days for harvest, for Tempranillo and Chardonnay respectively. The projected changes in temperature imply an average change in the maximum growing season (April-August) temperatures of 1.2 and 1.9°C by 2050, and 1.6 and 2.6°C by 2070, under the RCP4.5 and RCP8.5 scenarios, respectively. A reduction in precipitation is predicted, which varies between 15% by 2050 under the RCP4.5 scenario and up to 30% by 2070 under RCP8.5. By 2050, the phenological dates could advance by 6, 7, 7 and 8 days for Tempranillo and by 4, 6, 6 and 9 days for Chardonnay, for budbreak, flowering, veraison and harvest respectively, under the RCP4.5 scenario. Under the RCP8.5 emission scenario, the advance could be up to 30% higher.
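The thermal-requirement idea above can be sketched as a simple accumulation of growing degree days from DOY 60. The base temperature of 10°C, the daily-mean formula, the function names, and the sample temperatures are assumptions for illustration; the abstract does not state which GDD formulation was used.

```python
def accumulate_gdd(daily_temps, base_temp=10.0, start_doy=60):
    """Cumulative GDD from start_doy onward.

    daily_temps: iterable of (doy, tmin, tmax) in degrees Celsius.
    Returns a list of (doy, cumulative_gdd) pairs.
    """
    total = 0.0
    series = []
    for doy, tmin, tmax in sorted(daily_temps):
        if doy < start_doy:
            continue  # days before the start date do not count
        # Daily contribution: mean temperature above the base, never negative.
        total += max(0.0, (tmin + tmax) / 2.0 - base_temp)
        series.append((doy, total))
    return series


def first_doy_reaching(series, requirement):
    """First day of year on which the cumulative GDD meets the requirement."""
    for doy, total in series:
        if total >= requirement:
            return doy
    return None


# Hypothetical daily records: (day of year, Tmin, Tmax).
records = [(59, 8.0, 20.0), (60, 5.0, 15.0), (61, 8.0, 20.0), (62, 10.0, 24.0)]
series = accumulate_gdd(records)
stage_doy = first_doy_reaching(series, requirement=10.0)
```

With a thermal requirement estimated per stage and variety, warmer projected temperatures reach the same cumulative value on an earlier day, which is how an advance in phenological dates would be derived under this kind of model.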

Grapevine yield estimation in a context of climate change: the GraY model

Grapevine yield is a key indicator to assess the impacts of climate change and the relevance of adaptation strategies in a vineyard landscape. At this scale, a yield model should use a number of parameters and input data in relation to the information available and be able to reproduce vineyard management decisions (e.g. soil and canopy management, irrigation). In this study, we used data from six experimental sites in Southern France (cv. Syrah) to calibrate a model of grapevine yield limited by water constraint (GraY). Each yield component (bud fertility, number of berries per bunch, berry weight) was calculated as a function of the soil water availability simulated by the WaLIS water balance model at critical phenological phases. The model was then evaluated in 10 grapegrowers’ plots, covering a diversity of biophysical and technical contexts (soil type, canopy size, irrigation, cover crop). We identified three critical periods for yield formation: after flowering in the previous year for the number of bunches and berries, and around pre-veraison and post-veraison of the same year for mean berry weight. Yields were simulated with a model efficiency (EF) of 0.62 (NRMSE = 0.28). Bud fertility and number of berries per bunch were more accurately simulated (EF = 0.90 and 0.77, NRMSE = 0.06 and 0.10, respectively) than berry weight (EF = -0.31, NRMSE = 0.17). Model efficiency on the on-farm plots reached 0.71 (NRMSE = 0.37), simulating yields from 1 to 8 kg/plant. The GraY model is an original model estimating grapevine yield evolution on the basis of water availability under future climatic conditions. It allows the effects of various adaptation levers, such as planting density, cover crop management, fruit/leaf ratio, shading and irrigation, to be evaluated in various production contexts.
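To make the two evaluation statistics concrete, here is a minimal sketch of model efficiency (in the Nash-Sutcliffe form) and NRMSE. Normalizing the RMSE by the observed mean is an assumption, since other normalizations exist and the abstract does not say which one was used; the function names and sample values are hypothetical.

```python
import math


def model_efficiency(observed, simulated):
    """EF = 1 - SS_residual / SS_total (Nash-Sutcliffe form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse(observed, simulated):
    """RMSE divided by the observed mean (assumed normalization)."""
    mean_obs = sum(observed) / len(observed)
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse) / mean_obs


# Hypothetical yields in kg/plant.
observed = [2.0, 4.0, 6.0, 8.0]
perfect = model_efficiency(observed, observed)               # exact match
mean_only = model_efficiency(observed, [5.0] * 4)            # predicts the mean
reversed_fit = model_efficiency(observed, [8.0, 6.0, 4.0, 2.0])
```

Under this definition an EF of 1 is a perfect fit, 0 means the model does no better than predicting the observed mean, and a negative value (as reported for berry weight) means it does worse than the mean.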

Local adaptation tools to ensure the viticultural sustainability in a changing climate

Effect of partial net shading on the temperature and radiation in the grapevine canopy, consequences on the grape quality of cv. Gros Manseng in PDO Pacherenc-du-Vic-Bilh

As elsewhere, vineyards in southwestern France have faced more frequent summer heat waves in recent years. Among the possible adaptations to this aspect of climate change, net shading is a technique that limits canopy exposure to radiation. In this trial, we tested net shading installed on one face of the canopy, on a north-south row-oriented plot of cv. Gros Manseng trained on a VSP system in the PDO Pacherenc-du-Vic-Bilh. The purpose was to characterize the effects on ambient canopy temperature and radiation during the season and to observe the consequences for the composition of grapes and wines. Two types of net were used, with two levels of obstruction (50% and 75%) of the photosynthetically active radiation (PAR). They were installed on the west side of the canopy and compared to a netless control. Temperature and PAR sensors recorded hourly data during the season. On specific hot, sunny summer days, manual measurements were also taken on bunches (temperature) and at different spots in the canopy (PAR). The results showed that, on clear days, the shade nets reduced radiation in line with the supplier's specifications. The effects on ambient canopy temperature were inconsistent on this plot over the whole shading period between fruit set and harvest. However, during hot days (>30°C), the temperature in the canopy was reduced during the afternoon, and the bunch surface temperature was also lower than in the control. A decrease in the maturity parameters of the berries, sugar and acidity, was also observed. Concerning the wine aromatic potential, no clear differences appeared.

An analytical framework to site-specifically study climate influence on grapevine involving the functional and Bayesian exploration of farm data time series synchronized using an eGDD thermal index

Climate influence on grapevine physiology is prevalent and this influence is only expected to increase with climate change. Although governed by a general determinism, climate influence on grapevine physiology may present variations according to the terroir. In addition, these site-specific differences are likely to be enhanced when climate influence is studied using farm data. Indeed, farm data integrate additional sources of variation such as a varying representativity of the conditions actually experienced in the field. Nevertheless, there is a real challenge in valuing farm data to enable grape growers to understand their own terroir and consequently adapt their practices to the local conditions. In such a context, this article proposes a framework to site-specifically study climate influence on grapevine physiology using farm data. It focuses on improving the analysis of time series of weather data. The analytical framework includes the synchronization of time series using site-specific thermal indices computed with an original method called Extended Growing Degree Days (eGDD). Synchronized time series are then analyzed using a Bayesian functional Linear regression with Sparse Steps functions (BLiSS) in order to detect site-specific periods of strong climate influence on yield development. The article focuses on temperature and rain influence on grape yield development as a case study. It uses data from three commercial vineyards respectively situated in the Bordeaux region (France), California (USA) and Israel. For all vineyards, common periods of climate influence on yield development were found. They corresponded to already known periods, for example around veraison of the year before harvest. However, the periods differed in their precise timing (e.g. before, around or after veraison), duration and correlation direction with yield. 
Other periods were found for only one or two vineyards and/or were not referred to in literature, for example during the winter before harvest.

Comparison of imputation methods in long and varied phenological series. Application to the Conegliano dataset, including observations from 1964 over 400 grape varieties

A large varietal collection including over 1700 varieties has been maintained in Conegliano, Italy, since the 1950s. Phenological data on a subset of 400 grape varieties including wine grapes, table grapes, and raisins have been acquired at bud break, flowering, veraison, and ripening since 1964. Despite the efforts in maintaining and acquiring data over such an extensive collection, the data set has varying degrees of missing cases depending on the variety and the year. This is ubiquitous in phenology datasets of significant size and length. In this work, we evaluated four state-of-the-art methods to estimate missing values in this phenological series: k-Nearest Neighbour (kNN), Multivariate Imputation by Chained Equations (mice), MissForest, and Bidirectional Recurrent Imputation for Time Series (BRITS). For each phenological stage, we evaluated the performance of the methods in two ways. 1) On the full dataset, we randomly held out 10% of the true values for use as a test set and repeated the process 1000 times (Monte Carlo cross-validation). 2) On a reduced and almost complete subset of varieties, we varied the percentage of missing values from 10% to 70% by random deletion. In all cases, we evaluated the performance on the original values using normalized root mean squared error. For the full dataset we also obtained performance statistics by variety and by year. MissForest provided average errors of 17% (3 days) at budbreak, 14% (4 days) at flowering, 14.5% (7 days) at veraison, and 17% (3 days) at maturity. We completed the imputations of the Conegliano dataset, one of the world’s most extensive and varied phenological time series and a stepping stone for future climate change studies in grapes. The dataset is now ready for further analysis, and a rigorous evaluation of imputation errors is included.

Aromatic maturity is a cornerstone of terroir expression in red wine

Harvesting grapes at adequate maturity is key to the production of high-quality red wines. Enologists and wine makers define several types of maturity, including technical maturity, phenolic maturity and aromatic maturity. Technical maturity and phenolic maturity are relatively well documented in the scientific literature, while articles on aromatic maturity are scarcer. This is surprising, because aromatic maturity is, without a doubt, the most important of the three in determining wine quality and typicity (including terroir expression). Optimal terroir expression can be obtained when the different types of maturity are reached at the same time, or within a short time frame. This is more likely to occur when the ripening takes place under mild temperatures, neither too cool, nor too hot. Aromatic expression in wine can be driven, from low to high maturity, by green, herbal, fresh fruit, ripe fruit, jammy fruit, candied fruit or cooked fruit aromas. Green and cooked fruit aromas are not desirable in red wines, while the levels of other aromatic compounds contribute to the typicity of the wine in relation to its origin. Wines produced in cool climates, or on cool soils in temperate climates, are likely to express herbal or fresh fruit aromas; while wines produced under warm climates, or on warm soils in temperate climates, may express ripe fruit, jammy fruit or candied fruit aromas. Growers can optimize terroir expression through their choice of grapevine variety. Early ripening varieties perform better in cool climates and late ripening varieties in warm climates. Additionally, maturity can be advanced or delayed by different canopy management practices or training systems.

Grape berry size is a key factor in determining New Zealand Pinot noir wine composition

Making high quality but affordable Pinot noir (PN) wine is challenging in most terroirs, and New Zealand’s (NZ) situation is no exception. To increase the probability of making highly typical PN wines, producers choose to grow grapes in cool climates on lower fertility soils while adopting labour intensive practices. Stringent yield targets and higher input costs necessarily mean that PN wine cost is high, and profitability lower, in line-priced varietal wine ranges. To understand the reasons why higher yielding vines are perceived to produce wines of lower quality, we have undertaken an extensive study of PN in NZ. Since 2018, we have established a network of twelve trial sites in three NZ regions to find individual vines that produced acceptable commercial yields (above 2.5 kg per vine) and wines of composition comparable to “Icon” labels. Approximately 20% of 660 grape lots (N = 135) were selected from within a narrow juice Total Soluble Solids (TSS) range and made into single vine wines under controlled conditions. Principal Component Analysis of the vine, berry, juice and wine parameters from three vintages found grape berry mass to be the most effective clustering variable. As berry mass category decreased, there was a systematic increase in the probability of higher berry red colour and total phenolics, with a parallel increase in wine phenolics, changed aroma fraction and decreased juice amino acids. The influence of berry size on wine composition would appear stronger than the individual effects of vintage, region, vineyard or vine yield. Our observations support the hypothesis that it is possible to produce PN wines that fall within an “Icon” benchmark composition range at yields above 2.5 kg per vine provided that the Leaf Area:Fruit Weight ratio is above 12 cm² per g, mean berry mass is below 1.2 g and juice TSS is above 22°Brix.

Geospatial trends of bioclimatic indexes in the topographically complex region of Barolo DOCG

Barolo DOCG is an economically important wine producing region in Northwest Italy. It is a small region of approximately 70 km² gross area. The topography is very complex, with steeply sloped hills ranging in elevation from below 200 m to 550 m. Barolo DOCG wine is made exclusively from the Nebbiolo grape. Bioclimatic indexes are often used in viticulture to gain a better understanding of broader climate trends which can be compared temporally and geographically. These indexes are also used for identifying potential phenological timing, growing region suitability, and potential risks associated with expected climatic changes. Understanding how topography influences bioclimatic indexes can help with understanding of mesoscale climate behaviour, leading to improved decision making and risk management strategies. The average monthly maximum and minimum temperatures, the Cool Night Index, the Huglin Index, and the monthly diurnal range (from July to October) were calculated using data from 45 weather stations within a 40 km radius of the Barolo DOCG growing area between the years 1996 and 2019. Linear and multiple regression models were developed using independent variables (elevation, aspect, slope) extracted from a digital elevation model to identify significant relationships. Bioclimatic indexes were then kriged with external drift using independent variables that showed significant relationships with the bioclimatic index using a 100 m resolution grid. The maximum monthly temperatures and the Huglin Index showed consistent significant negative relationships with elevation in all years. The minimum monthly temperatures showed no relationship with elevation, but in some months a small but significant relationship was observed with aspect.
Due to the lack of a relationship between minimum monthly temperatures and elevation compared to the significant relationship between maximum monthly temperatures and elevation, monthly diurnal range had a negative relationship with elevation.