Predicting provenance and grapevine cultivar implementing machine learning on vineyard soil microbiome data: implications in grapevine breeding
Abstract
Rhizosphere microbial communities are an essential component of the plant microbiota, which is crucial for sustaining the production of healthy crops. The main drivers of the composition of these communities are the growing environment and the planted genotype. Recent viticulture studies have focused on understanding the effects of these factors on soil microbial composition, since microbial biodiversity is an important determinant of plant phenotype and of a wine’s organoleptic properties. The microbial biodiversity of different wine regions, for instance, is an important determinant of wine terroir. While conventional methods for microbiome analysis are extensively used, modern artificial intelligence (AI) based methods could unravel non-linear associations between microbial taxa and environmental or plant genetic factors. Here we compare the performance of shallow and deep machine learning (ML) methods in predicting geographical provenance and planted grape cultivar based solely on the soil microbiota. We used 885 previously published microbial amplicon-sequencing (16S) datasets collected from vineyards located in 13 countries across 4 continents and planted with 34 Vitis vinifera cultivars, representing the largest collection of vineyard microbiomes analyzed to date. This research also aimed to address some common challenges of ML-based studies, such as making models readily available to non-technical researchers, which is necessary for research reproducibility. To facilitate this, the models built in this study will be available through a GUI-based, containerized web platform. In addition, to make processed data from other 16S studies compatible with the models, a computational step will be included that merges features either by taxonomy or by sequence identity. This study will be beneficial in several ways, such as inferring the provenance or cultivar of lost or mislabeled samples and identifying important location-specific and cultivar-specific taxa.
Ultimately, this approach could be implemented for the identification of the genes regulating host/microbe interactions, which will provide valuable targets for breeding programs aimed at producing more sustainable crops.
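The feature-merging step mentioned in the abstract (combining features from different 16S studies by taxonomy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of a genus-level label, the "Unassigned" bucket, and the assumption that sample IDs are unique across studies are all hypothetical and serve only to show the general idea.

```python
from collections import defaultdict


def collapse_by_taxonomy(counts, taxonomy):
    """Sum the counts of features that share a taxonomic label, per sample.

    counts:   {sample_id: {feature_id: read_count}}
    taxonomy: {feature_id: taxon_label}  (hypothetical genus-level labels)
    """
    collapsed = {}
    for sample, features in counts.items():
        totals = defaultdict(int)
        for feature_id, n in features.items():
            # Assumption: features without a label are pooled together.
            totals[taxonomy.get(feature_id, "Unassigned")] += n
        collapsed[sample] = dict(totals)
    return collapsed


def merge_studies(*collapsed_tables):
    """Align collapsed tables on the union of taxa, filling absences with 0.

    Assumes sample IDs are unique across studies.
    Returns (sorted list of taxa, {sample_id: counts in that taxon order}).
    """
    taxa = sorted(
        {t for table in collapsed_tables for row in table.values() for t in row}
    )
    merged = {}
    for table in collapsed_tables:
        for sample, row in table.items():
            merged[sample] = [row.get(t, 0) for t in taxa]
    return taxa, merged


# Toy data from two hypothetical studies with incompatible feature IDs.
study_a = {"A1": {"asv1": 10, "asv2": 5}}
tax_a = {"asv1": "Bacillus", "asv2": "Bacillus"}
study_b = {"B1": {"otu9": 7, "otu3": 2}}
tax_b = {"otu9": "Pseudomonas", "otu3": "Bacillus"}

taxa, merged = merge_studies(
    collapse_by_taxonomy(study_a, tax_a),
    collapse_by_taxonomy(study_b, tax_b),
)
```

After such a step, samples from both studies share one feature space (here the taxa `Bacillus` and `Pseudomonas`), so a model trained on one table could in principle be applied to the other. Merging by sequence identity, the alternative named in the abstract, would instead require clustering the sequences themselves and is not shown here.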
Acknowledgements: This study was supported by the National Institute of Food and Agriculture, AFRI Competitive Grant Program, accession number 1018617, and by the National Institute of Food and Agriculture, United States Department of Agriculture, Hatch Program, accession number 1020852.
Issue: ICGWS 2023
Type: Article
Authors
1 Environmental Epigenomics and Genomics Group, Department of Horticulture, College of Agriculture, Food and Environment, University of Kentucky, Lexington, Kentucky, USA