Effect of minor allele frequency on the predictive accuracy of genomic selection models optimized by grid search
Abstract
Although the grapevine (Vitis spp.) is among the oldest and most economically significant fruit species globally (Candar et al., 2021), its genetic improvement faces major bottlenecks due to long juvenile periods and extended cycles for phenotype evaluation (Alleweldt & Possingham, 1988; Camargo et al., 2011; Bharati et al., 2023). In this context, Genomic Selection (GS) has emerged as an effective alternative to traditional phenotypic selection, offering a robust framework to optimize breeding programs by significantly reducing generation intervals while enhancing predictive accuracy (PA) and expected genetic gains (EGG) (Souza et al., 2019; Francisco et al., 2021; Aono et al., 2022; Bharati et al., 2023). This study evaluated the effect of data dimensionality reduction on GS performance by selecting single-nucleotide polymorphisms (SNPs) based on Minor Allele Frequency (MAF) thresholds. The experimental design tested the predictive capacities of four machine learning (ML) algorithms (ElasticNet, K-Neighbors, Support Vector Machine Regression, and XGBoost) alongside the conventional Genomic Best Linear Unbiased Prediction (gBLUP) model. These were validated using three SNP datasets (11,115, 9,494, and 6,100 markers) filtered by MAF levels of 0.05, 0.1, and 0.2 across six genetic traits, while also comparing EGG between conventional breeding and GS via the breeder’s equation. Results revealed that ML models exhibited remarkable stability, with no significant differences in PA across different MAF-based SNP densities, except for Berry Length, which showed a substantial difference with XGBoost at MAF 0.2. Conversely, gBLUP demonstrated high sensitivity to dimensionality reduction, with MAF filtering significantly affecting its performance across all traits. This suggests that ML-based approaches offer greater flexibility in feature reduction than traditional GS models that rely on a genomic kinship matrix.
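The MAF-based marker selection described above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 counts of the alternative allele with no missing calls; the function names and the toy matrix are hypothetical and are not taken from the study's pipeline.

```python
def minor_allele_frequencies(genotypes):
    """Return the minor allele frequency of each marker (column).

    Assumes a diploid 0/1/2 coding (copies of the alternative allele)
    and no missing genotype calls.
    """
    n_individuals = len(genotypes)
    n_markers = len(genotypes[0])
    mafs = []
    for j in range(n_markers):
        alt_count = sum(row[j] for row in genotypes)
        alt_freq = alt_count / (2 * n_individuals)  # two alleles per individual
        mafs.append(min(alt_freq, 1.0 - alt_freq))
    return mafs


def filter_by_maf(genotypes, threshold):
    """Keep only markers whose MAF is at least `threshold`.

    Returns the filtered matrix and the indices of the retained markers.
    """
    mafs = minor_allele_frequencies(genotypes)
    keep = [j for j, maf in enumerate(mafs) if maf >= threshold]
    filtered = [[row[j] for j in keep] for row in genotypes]
    return filtered, keep


# Hypothetical toy data: 8 individuals (rows) x 4 markers (columns).
toy_genotypes = [
    [1, 1, 2, 2],
    [0, 1, 1, 2],
    [0, 0, 1, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

# The three thresholds named in the abstract.
markers_kept = {}
for maf_threshold in (0.05, 0.1, 0.2):
    _, kept = filter_by_maf(toy_genotypes, maf_threshold)
    markers_kept[maf_threshold] = kept
    print(f"MAF >= {maf_threshold}: {len(kept)} markers retained")
```

Raising the threshold can only shrink the marker set, which matches the decreasing SNP counts reported for the three datasets.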
Morphological traits generally achieved higher predictive capacities than chemical ones, with peak accuracies identified as: 0.79 for Berry Length (Support Vector Machine Regression, MAF 0.05), 0.61 for Cluster Length (gBLUP, MAF 0.1), 0.60 for Cluster Width (gBLUP, MAF 0.05), 0.55 for Rachis Fresh Weight (gBLUP, MAF 0.1), 0.58 for Maturation Index (gBLUP, MAF 0.1), and 0.35 for Total Soluble Solids (gBLUP, MAF 0.05). Furthermore, the trained models were applied to an external population, yielding results that were highly consistent with those obtained during training, underscoring their robustness. Ultimately, all GS models yielded genetic gains superior to those of traditional breeding, with improvements ranging from an 8.90-fold increase in Berry Length to a 2.86-fold increase in Total Soluble Solids, confirming that GS integration is exceptionally promising for enhancing efficiency in global viticulture.
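The fold increases in genetic gain come from comparing the two strategies through the breeder's equation. The abstract does not state which form was used; the sketch below assumes one common per-unit-time form, gain = i × r × σ_A / L (selection intensity, accuracy, additive genetic standard deviation, cycle length in years). All input values are hypothetical and chosen only to show the arithmetic; they are not the study's parameters.

```python
def expected_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year: i * r * sigma_A / L.

    This per-unit-time form is an assumption; the abstract only says
    that the breeder's equation was used.
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical inputs: identical intensity, accuracy and genetic variation,
# with only the cycle length differing between the two strategies.
conventional_gain = expected_gain_per_year(
    intensity=1.0, accuracy=0.5, sigma_a=2.0, cycle_years=8.0
)
genomic_gain = expected_gain_per_year(
    intensity=1.0, accuracy=0.5, sigma_a=2.0, cycle_years=2.0
)

# Ratio of GS gain to conventional gain, i.e. the "fold increase".
fold_increase = genomic_gain / conventional_gain
print(f"Fold increase of GS over conventional breeding: {fold_increase:.2f}")
```

Under this form, shortening the cycle raises the gain per year in direct proportion, so GS can outperform phenotypic selection even when its accuracy is moderate.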
Acknowledgements
FAPESP 2020/12938-7 for research funding. FAPESP 2023/09468-7 for the postdoctoral fellowship. CNPq – PQ – 305546/2022-8 for the research productivity grant.
References
Alleweldt, G., & Possingham, J. V. (1988). Progress in grapevine breeding. Theoretical and Applied Genetics, 75, 669–673. https://doi.org/10.1007/BF00265961
Aono, A. H., Francisco, F. R., Da Silva, C. C., Gonçalves, P. S., Scaloppi Júnior, E. J., Le Guen, V., Fritsche-Neto, R., Souza, L. M., & Souza, A. P. (2022). A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Scientific Reports, 12, 18023. https://doi.org/10.1038/s41598-022-22444-2
Bharati, R., Sen, M. K., Severová, L., Svoboda, R., & Fernández-Cusimamani, E. (2023). Polyploidization and genomic selection integration for grapevine breeding: A perspective. Frontiers in Plant Science, 14, 1248978. https://doi.org/10.3389/fpls.2023.1248978
Francisco, F. R., Aono, A. H., Da Silva, C. C., Gonçalves, P. S., Scaloppi Júnior, E. J., Le Guen, V., Fritsche-Neto, R., Souza, L. M., & Souza, A. P. (2021). Unravelling rubber tree growth by integrating GWAS and biological network-based approaches. Frontiers in Plant Science, 12, 768589. https://doi.org/10.3389/fpls.2021.768589
Souza, L. M., Francisco, F. R., Gonçalves, P. S., Scaloppi Júnior, E. J., Le Guen, V., Fritsche-Neto, R., & Souza, A. P. (2019). Genomic selection in rubber tree breeding: a comparison of models and methods for managing G × E interactions. Frontiers in Plant Science, 10, 1353. https://doi.org/10.3389/fpls.2019.01353
Issue: GBG 2026
Type: Poster
Authors
1 Advanced Fruit Research Division, Instituto Agronômico (IAC), Jundiaí, SP, Brazil
2 Molecular Biology and Genetic Engineering Center (CBMEG), Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
3 Center for Plant Molecular Breeding (CeM²P), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil
4 Horticulture Sciences, College of Agriculture and Life Sciences, North Carolina State University, Raleigh
5 Department of Plant Biology, Biology Institute, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
Keywords
Vitis vinifera L., feature selection, machine learning, genomic prediction, plant breeding