Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
26 result(s) for "Jubery, Talukder Z"
Computer vision and machine learning enabled soybean root phenotyping pipeline
by Singh, Asheesh K.; Parmley, Kyle A.; Jubery, Talukder Z.
in Agricultural economics; Agricultural production; Analysis
2020
Background
Root system architecture (RSA) traits are of interest for breeding selection; however, measurement of these traits is difficult, resource-intensive, and results in large variability. The advent of computer vision and machine learning (ML) enabled trait extraction and measurement has renewed interest in utilizing RSA traits for genetic enhancement to develop more robust and resilient crop cultivars. We developed a mobile, low-cost, and high-resolution root phenotyping system composed of an imaging platform with a computer vision and ML-based segmentation approach to establish a seamless end-to-end pipeline - from obtaining large quantities of root samples through image-based trait processing and analysis.
Results
This high throughput phenotyping system, which has the capacity to handle hundreds to thousands of plants, integrates time series image capture coupled with automated image processing that uses optical character recognition (OCR) to identify seedlings via barcode, followed by robust segmentation integrating a convolutional auto-encoder (CAE) method prior to feature extraction. The pipeline includes an updated and customized version of the Automatic Root Imaging Analysis (ARIA) root phenotyping software. Using this system, we studied diverse soybean accessions from a wide geographical distribution and report genetic variability for RSA traits, including root shape, length, number, mass, and angle.
Conclusions
This system provides a high-throughput, cost-effective, non-destructive methodology that delivers biologically relevant time-series data on root growth and development for phenomics, genomics, and plant breeding applications. This phenotyping platform is designed to quantify root traits and rank genotypes in a common environment, thereby serving as a selection tool for use in plant breeding. Root phenotyping platforms and image-based phenotyping are essential to mirror the current focus on shoot phenotyping in breeding efforts.
Journal Article
AutoSiQ: a curated haploid Arabidopsis thaliana inflorescence dataset with a fine-grained silique ontology and a deep learning application for haploid fertility quantification
by Aboobucker, Siddique I.; Jubery, Talukder Z.; Ganapathysubramanian, Baskar
in Annotations; Arabidopsis thaliana; Automation
2026
Doubled haploid (DH) technology can fast-track crop breeding. Haploid induction yields haploids carrying a single set of chromosomes, which are usually sterile. Haploid fertility (HF) is the ability of haploid plants to set seed, and it is a critical bottleneck in DH pipelines. Genetic mechanisms to restore HF hold immense potential in DH crop breeding, yet HF phenotyping remains manual, destructive, and inconsistent. While recent advances in imaging and machine learning have improved throughput for general plant traits, no curated image dataset exists for Arabidopsis thaliana that explicitly represents HF. Here, we present AutoSiQ, a dataset and baseline deep learning pipeline for automated HF quantification. AutoSiQ includes high-resolution scanned inflorescences annotated with a seven-class ontology encompassing green siliques, green fertile siliques, mature siliques, fertile siliques, cracked fertile siliques, cracked siliques, and flowers. This multi-class annotation scheme preserves biologically meaningful information beyond binary fertile/non-fertile distinctions, enabling reliable fertility estimation and future phenotyping applications. We release baseline object detection models (YOLOv5), trained on the AutoSiQ dataset, and evaluate their performance across confidence thresholds. Model predictions correlate strongly with manual counts, achieving R² up to 0.94 for total silique number estimation. We further demonstrate AutoSiQ's utility for automated haploid fertility rate (HFR) estimation and for genotype discrimination between two contrasting genotypes (WT and the bmf2 mutant). A longitudinal analysis identifies ~60 days after sowing (DAS) as the optimal harvest time for maximizing mature silique counts, balancing the number of immature buds against silique shattering. By releasing both the dataset and baseline code, AutoSiQ provides a reproducible and extensible foundation for high-throughput fertility phenotyping in haploid Arabidopsis.
Journal Article
Self-supervised maize kernel classification and segmentation for embryo identification
by Jubery, Talukder Z.; Ganapathysubramanian, Baskar; Frei, Ursula K.
in Annotations; Classification; Computer vision
2023
Computer vision and deep learning (DL) techniques have succeeded in a wide range of diverse fields. Recently, these techniques have been successfully deployed in plant science applications to address food security, productivity, and environmental sustainability problems for a growing global population. However, training these DL models often necessitates large-scale manual annotation of data, which frequently becomes a tedious and time- and resource-intensive process. Recent advances in self-supervised learning (SSL) methods have proven instrumental in overcoming these obstacles, using purely unlabeled datasets to pre-train DL models.
Here, we implement the popular self-supervised contrastive learning methods of NNCLR (Nearest-Neighbor Contrastive Learning of visual Representations) and SimCLR (Simple framework for Contrastive Learning of visual Representations) for the classification of spatial orientation and segmentation of embryos of maize kernels. Maize kernels are imaged using a commercial high-throughput imaging system. This image data is often used in multiple downstream applications across both production and breeding, for instance, sorting for oil content based on segmenting and quantifying the scutellum's size and for classifying haploid and diploid kernels.
We show that in both classification and segmentation problems, SSL techniques outperform their purely supervised transfer learning-based counterparts and are significantly more annotation efficient. Additionally, we show that a single SSL pre-trained model can be efficiently finetuned for both classification and segmentation, indicating good transferability across multiple downstream applications. Segmentation models with SSL-pretrained backbones produce DICE similarity coefficients of 0.81, higher than the 0.78 and 0.73 of those with ImageNet-pretrained and randomly initialized backbones, respectively. We observe that finetuning classification and segmentation models on as little as 1% annotation produces competitive results. These results show SSL provides a meaningful step forward in data efficiency with agricultural deep learning and computer vision.
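The contrastive pre-training described above works by pulling the two augmented views of each image together in embedding space while pushing all other images in the batch apart. As an illustration only (not the authors' code; batch size, embedding dimension, and the `temperature` value are assumptions), the SimCLR-style NT-Xent loss can be sketched in NumPy:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss: rows i of z1 and z2 are two augmented
    views of the same image (the positive pair); every other view in the
    2N-row batch acts as a negative."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, d) embeddings
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize -> cosine sims
    sim = z @ z.T / temperature                       # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # each row's positive index
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    # cross-entropy of each row's softmax against its positive-pair index
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))
```

In actual SSL pre-training this loss would be minimized over a backbone network's projected embeddings; the pre-trained backbone is then fine-tuned for the downstream classification or segmentation task.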
Journal Article
“Canopy fingerprints” for characterizing three-dimensional point cloud data of soybean canopies
by Singh, Asheesh K.; Young, Therin J.; Jubery, Talukder Z.
in 3D point cloud; Canopies; Chemical fingerprinting
2023
Advances in imaging hardware allow high throughput capture of the detailed three-dimensional (3D) structure of plant canopies. The point cloud data is typically post-processed to extract coarse-scale geometric features (like volume, surface area, height, etc.) for downstream analysis. We extend feature extraction from 3D point cloud data to various additional features, which we denote as ‘canopy fingerprints’. This is motivated by the successful application of the fingerprint concept for molecular fingerprints in chemistry applications and acoustic fingerprints in sound engineering applications. We developed an end-to-end pipeline to generate canopy fingerprints of a three-dimensional point cloud of soybean [Glycine max (L.) Merr.] canopies grown in hill plots captured by a terrestrial laser scanner (TLS). The pipeline includes noise removal, registration, and plot extraction, followed by the canopy fingerprint generation. The canopy fingerprints are generated by splitting the data into multiple sub-canopy scale components and extracting sub-canopy scale geometric features. The generated canopy fingerprints are interpretable and can assist in identifying patterns in a database of canopies, querying similar canopies, or identifying canopies with a certain shape. The framework can be extended to other modalities (for instance, hyperspectral point clouds) and tuned to find the most informative fingerprint representation for downstream tasks. These canopy fingerprints can aid in the utilization of canopy traits at previously unutilized scales, and therefore have applications in plant breeding and resilient crop production.
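The core fingerprinting idea, splitting a canopy into sub-canopy components and concatenating simple geometric features from each, can be sketched with NumPy. This is a toy version under stated assumptions (an (N, 3) point array, horizontal height slices, three arbitrary features per slice), not the paper's pipeline:

```python
import numpy as np

def canopy_fingerprint(points, n_slices=5):
    """Toy 'canopy fingerprint': split an (N, 3) canopy point cloud into
    horizontal slices along z and extract simple geometric features per
    slice; the concatenation is the fingerprint vector."""
    z = points[:, 2]
    edges = np.linspace(z.min(), z.max() + 1e-9, n_slices + 1)  # half-open height bins
    features = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sl = points[(z >= lo) & (z < hi)]
        if len(sl) == 0:
            features += [0.0, 0.0, 0.0]               # empty slice
            continue
        xy = sl[:, :2]
        extent = xy.max(axis=0) - xy.min(axis=0)      # bounding-box width/depth
        features += [
            len(sl) / len(points),                    # fraction of points in slice
            float(extent[0] * extent[1]),             # bounding-box footprint area
            float(np.mean(np.linalg.norm(xy - xy.mean(axis=0), axis=1))),  # radial spread
        ]
    return np.array(features)
```

Fingerprint vectors like this can be compared with any vector distance to query a database for canopies of similar shape.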
Journal Article
Enhancing yield prediction from plot-level satellite imagery through genotype and environment feature disentanglement
by Powadi, Anirudha A.; Schnable, James C.; Jubery, Talukder Z.
in Accuracy; Agricultural production; Clustering
2025
Accurately predicting yield during the growing season enables improved crop management and better resource allocation for both breeders and growers. Existing yield prediction models for an entire field or individual plots are based on satellite-derived vegetation indices (VIs) and widely used machine learning-based feature extraction models, including principal component analysis (PCA) and autoencoders (AE). Here, we significantly enhance pre-harvest yield prediction at the plot scale using Compositional Autoencoders (CAE), a deep-learning-based feature extraction approach designed to disentangle genotype (G) and environment (E) features, on high-resolution, plot-level satellite imagery. Our approach uses a dataset of approximately 4,000 satellite images collected from replicated plots of 84 hybrid maize varieties grown at five distinct locations across the U.S. Corn Belt. By deploying the CAE model, we improve the separation of genotype and environment effects, enabling more accurate incorporation of genotype-by-environment (GxE) interactions for downstream prediction tasks. Results show that the CAE-based features improve early-stage yield predictions by up to 10% compared to traditional autoencoder-based features and outperform vegetation indices (VIs) by 9% across various growth stages. The CAE model also excels in separating environmental factors, achieving a high silhouette score of 0.919, indicating effective clustering of environmental features. Moreover, the CAE consistently outperforms standard models in yield prediction for unseen environments and unseen genotypes, demonstrating strong generalizability. This study demonstrates the value of disentangling G and E effects for providing more accurate and early yield predictions that support informed decision-making in precision agriculture and plant breeding.
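The Compositional Autoencoder itself is a deep network, but its goal has a simple linear analogue: express each plot's features as an overall mean plus a genotype effect plus an environment effect, leaving GxE interaction and noise in the residual. A minimal NumPy sketch of that additive decomposition (an illustration of the disentanglement idea, not the CAE):

```python
import numpy as np

def additive_ge_decomposition(X, genotypes, envs):
    """Decompose plot-level features X (n_plots, n_features) into an overall
    mean, per-genotype effects, per-environment effects, and a residual
    that absorbs GxE interaction and noise."""
    mu = X.mean(axis=0)
    g_eff = {g: X[genotypes == g].mean(axis=0) - mu for g in np.unique(genotypes)}
    e_eff = {e: X[envs == e].mean(axis=0) - mu for e in np.unique(envs)}
    recon = np.stack([mu + g_eff[g] + e_eff[e] for g, e in zip(genotypes, envs)])
    return mu, g_eff, e_eff, X - recon   # residual = GxE + noise
```

For a balanced trial with purely additive effects the residual is exactly zero; the CAE generalizes this kind of separation to nonlinear image features.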
Journal Article
Soybean maturity prediction using two‐dimensional contour plots from drone‐based time series imagery
by Kim, Bitgoeul; Singh, Asheesh K.; Jubery, Talukder Z.
in expert opinion; Glycine max; neural networks
2025
Plant breeding programs require assessment and understanding of days to maturity for accurate selection and placement of entries in appropriate tests. Soybean [Glycine max (L.) Merr.] breeding programs, in the early stages of the breeding pipeline, assign relative maturity ratings to experimental varieties that indicate their suitable maturity zones. Traditionally, the estimation of maturity rating value has involved breeders manually inspecting fields and assessing maturity value visually. This approach relies heavily on expert judgment, making it subjective and demanding considerable time and effort. This study aimed to develop a machine learning (ML) model for evaluating soybean maturity using uncrewed aerial system (UAS)-based time series imagery. Images were captured at 3-day intervals, beginning as the earliest varieties started maturing and continuing until the last varieties fully matured. The data collected for this experiment consisted of 22,043 plots collected across 3 years and represent relative maturity groups 1.6-3.9. We utilized contour plot images extracted from the time series UAS imagery as input for a neural network model. This contour plot approach encoded the temporal and spatial variation within each plot into a single image. A deep learning model was trained to utilize this contour plot to predict maturity ratings. This model demonstrates a significant improvement in accuracy and robustness, achieving up to 85% accuracy. The predictive model offers a scalable, objective, and efficient means of assessing crop maturity, enabling phenomics and ML approaches to reduce the reliance on manual inspection and subjective assessment, thereby saving time and resources in a breeding program.
Core Ideas:
- Uncrewed aerial vehicles (UAVs) were used to collect plant breeding traits and support decision-making in crop improvement programs.
- Machine learning techniques were applied to plant phenotyping to assess crop maturity.
- An automated system for soybean maturity classification was developed using aerial imagery combined with deep learning methods, saving time and resources in a variety development program.
Plain Language Summary: We developed a new method using uncrewed aerial vehicle (UAV) imaging and artificial intelligence (AI) to determine soybean relative maturity. This method can replace traditional manual field inspections, which are time intensive, require substantially more human hours, and are subject to inter- and intra-rater variability. We captured UAV-based digital images every 3 days during soybean maturation, collecting data from 22,043 plots over 3 years. These plots included varieties with different maturity ratings. We converted the time-series photos into contour plot images that capture crop change over time, and then trained an AI model. The resulting system achieved 85% accuracy in predicting maturity ratings. Our work can assist plant breeders in phenotyping breeding plots and making accurate maturity classifications that support variety development. Our automated approach offers several advantages over manual ratings, including savings in time and labor, and scalability.
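The contour-plot encoding can be pictured as collapsing a stack of per-date vegetation-index maps for one plot into a single position-versus-time image that a CNN can consume. A toy NumPy sketch under that assumption (the published pipeline's exact construction may differ; the shapes and quantization here are illustrative):

```python
import numpy as np

def contour_encoding(vi_stack, n_levels=8):
    """Collapse a (T, H, W) stack of per-date vegetation-index maps into a
    single (H, T) image: rows are positions along the plot, columns are
    dates, and values are the VI quantized into n_levels contour bands."""
    m = vi_stack.mean(axis=2).T                        # average across plot width -> (H, T)
    lo, hi = m.min(), m.max()
    levels = np.floor((m - lo) / (hi - lo + 1e-12) * n_levels)
    return levels.clip(0, n_levels - 1).astype(np.uint8)
```

The resulting single image preserves how greenness declines over time at each position in the plot, which is the signal a maturity-rating model needs.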
Journal Article
Multi‐sensor and multi‐temporal high‐throughput phenotyping for monitoring and early detection of water‐limiting stress in soybean
by Singh, Asheesh K.; Jubery, Talukder Z.; Ganapathysubramanian, Baskar
in Accuracy; Agricultural production; Automation
2024
Soybean (Glycine max [L.] Merr.) production is susceptible to biotic and abiotic stresses, exacerbated by extreme weather events. Water-limiting stress, that is, drought, emerges as a significant risk for soybean production, underscoring the need for advancements in stress monitoring for crop breeding and production. This project combined multi-modal information to identify the most effective and efficient automated methods to study drought response. We investigated a set of diverse soybean accessions using multiple sensors in a time series high-throughput phenotyping manner to: (1) develop a pipeline for rapid classification of soybean drought stress symptoms, and (2) investigate methods for early detection of drought stress. We utilized high-throughput time-series phenotyping using unmanned aerial vehicles and sensors in conjunction with machine learning analytics, which offered a swift and efficient means of phenotyping. The visible bands were most effective in classifying the severity of canopy wilting stress after symptom emergence. Non-visual bands in the near-infrared and short-wave infrared regions contribute to the differentiation of susceptible and tolerant soybean accessions prior to visual symptom development. We report pre-visual detection of soybean wilting using a combination of different vegetation indices and spectral bands, especially in the red-edge. These results can contribute to early stress detection methodologies and rapid classification of drought responses for breeding and production applications.
Core Ideas:
- Sensors, wavebands, and vegetation indices are evaluated for importance in phenotyping canopy wilting in soybean.
- An aerial versus ground-based sensing study shows the necessity of balancing speed of data collection against field of view.
- Random forest classification can be applied to support selection decisions in a plant breeding program.
- Multispectral UAV data enable pre-visual early detection of canopy wilting drought stress in soybean.
Journal Article
Soybean Canopy Stress Classification Using 3D Point Cloud Data
by Singh, Asheesh K.; Chiranjeevi, Shivani; Young, Therin J.
in Abiotic stress; Accuracy; agronomy
2024
Automated canopy stress classification for field crops has traditionally relied on single-perspective, two-dimensional (2D) photographs, usually obtained through top-view imaging using unmanned aerial vehicles (UAVs). However, this approach may fail to capture the full extent of plant stress symptoms, which can manifest throughout the canopy. Recent advancements in LiDAR technologies have enabled the acquisition of high-resolution 3D point cloud data for the entire canopy, offering new possibilities for more accurate plant stress identification and rating. This study explores the potential of leveraging 3D point cloud data for improved plant stress assessment. We utilized a dataset of RGB 3D point clouds of 700 soybean plants from a diversity panel exposed to iron deficiency chlorosis (IDC) stress. From this unique set of 700 canopies exhibiting varying levels of IDC, we extracted several representations, including (a) handcrafted IDC symptom-specific features, (b) canopy fingerprints, and (c) latent feature-based features. Subsequently, we trained several classification models to predict plant stress severity using these representations. We exhaustively investigated several stress representations and model combinations for the 3-D data. We also compared the performance of these classification models against similar models that are only trained using the associated top-view 2D RGB image for each plant. Among the feature-model combinations tested, the 3D canopy fingerprint features trained with a support vector machine yielded the best performance, achieving higher classification accuracy than the best-performing model based on 2D data built using convolutional neural networks. Our findings demonstrate the utility of color canopy fingerprinting and underscore the importance of considering 3D data to assess plant stress in agricultural applications.
Journal Article
Data driven discovery and quantification of hyperspectral leaf reflectance phenotypes across a maize diversity panel
by Nishimwe, Aime V.; Schnable, James C.; Jubery, Talukder Z.
in Chlorophyll; Corn; Data collection
2024
Estimates of plant traits derived from hyperspectral reflectance data have the potential to efficiently substitute for traits that are time- or labor-intensive to score manually. Typical workflows for estimating plant traits from hyperspectral reflectance data employ supervised classification models that can require substantial ground truth datasets for training. We explore the potential of an unsupervised approach, autoencoders, to extract meaningful traits from plant hyperspectral reflectance data using measurements of the reflectance of 2151 individual wavelengths of light from the leaves of maize (Zea mays) plants harvested from 1658 field plots in a replicated field trial. A subset of autoencoder-derived variables exhibited significant repeatability, indicating that a substantial proportion of the total variance in these variables was explained by differences between maize genotypes, while other autoencoder variables appear to capture variation resulting from changes in leaf reflectance between different batches of data collection. Several of the repeatable latent variables were significantly correlated with other traits scored from the same maize field experiment, including one autoencoder-derived latent variable (LV8) that predicted plant chlorophyll content modestly better than a supervised model trained on the same data. In at least one case, genome-wide association study hits for variation in autoencoder-derived variables were proximal to genes with known or plausible links to leaf phenotypes expected to alter hyperspectral reflectance. In aggregate, these results suggest that an unsupervised, autoencoder-based approach can identify meaningful and genetically controlled variation in high-dimensional, high-throughput phenotyping data and link identified variables back to known plant traits of interest.
Core Ideas:
- Autoencoder latent variables show stronger correlations with chlorophyll content than principal components.
- An autoencoder-derived latent variable exhibits modestly better performance than a partial least squares regression supervised model.
- Latent variables derived from autoencoders are significantly associated with genetic markers.
- Latent variables capture variance in traits that are transferrable across species and years.
- Significant proportions of total variance in individual latent variables are attributable to genetics.
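Repeatability, as used above, is the share of a variable's total variance attributable to differences between genotypes, and for replicated plots it can be estimated with a one-way ANOVA. A short NumPy sketch of the standard plot-basis estimator (a textbook formula assuming balanced replication, not code from the paper):

```python
import numpy as np

def repeatability(values, genotypes):
    """Plot-basis repeatability from a balanced one-way ANOVA:
    var_g / (var_g + var_e), with var_g = (MSB - MSW) / r."""
    groups = [values[genotypes == g] for g in np.unique(genotypes)]
    k, r = len(groups), len(groups[0])                 # genotypes, reps per genotype
    grand = values.mean()
    msb = r * sum((g.mean() - grand) ** 2 for g in groups) / (k - 1)        # between-genotype MS
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (k * (r - 1))  # within-genotype MS
    var_g = max((msb - msw) / r, 0.0)                  # genotypic variance component
    return var_g / (var_g + msw)
```

A latent variable driven mainly by genotype scores near 1; one driven by batch effects or noise scores near 0.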
Journal Article
In silico design of crop ideotypes under a wide range of water availability
by Jubery, Talukder Z.; Ganapathysubramanian, Baskar; Attinger, Daniel
in Agriculture; Arid environments; Arid zones
2019
Given the changing climate and increasing impact of agriculture on global resources, it is important to identify phenotypes which are global and sustainable optima. Here, an in silico framework is constructed by coupling evolutionary optimization with thermodynamically sound crop physiology, and its ability to rationally design phenotypes with maximum productivity is demonstrated, within well-defined limits on water availability. Results reveal that in mesic environments, such as the North American Midwest, and semi-arid environments, such as Colorado, phenotypes optimized for maximum productivity and survival under drought are similar to those with maximum productivity under irrigated conditions. In hot and dry environments like California, phenotypes adapted to drought produce 40% lower yields when irrigated compared to those optimized for irrigation. In all three representative environments, the trade-off between productivity under drought versus that under irrigation was shallow, justifying a successful strategy of breeding crops combining best productivity under irrigation and close to best productivity under drought.
Journal Article