Catalogue Search | MBRL
23 result(s) for "Fan, Xijian"
MTCDNet: Multimodal Feature Fusion-Based Tree Crown Detection Network Using UAV-Acquired Optical Imagery and LiDAR Data
2025
Accurate detection of individual tree crowns is a critical prerequisite for precisely extracting forest structural parameters, which is vital for forestry resource monitoring. While unmanned aerial vehicle (UAV)-acquired RGB imagery, combined with deep learning-based networks, has demonstrated considerable potential, existing methods often rely exclusively on RGB data, rendering them susceptible to shadows caused by varying illumination and prone to suboptimal performance in dense forest stands. In this paper, we propose integrating a LiDAR-derived Canopy Height Model (CHM) with RGB imagery as complementary cues, shifting the paradigm of tree crown detection from unimodal to multimodal. To fully leverage the complementary properties of RGB and CHM, we present a novel Multimodal learning-based Tree Crown Detection Network (MTCDNet). Specifically, a transformer-based multimodal feature fusion strategy is proposed to adaptively learn correlations among multilevel features from diverse modalities, which enhances the model's ability to represent tree crown structures by leveraging complementary information. In addition, a learnable positional encoding scheme is introduced to help the fused features capture the complex, densely distributed tree crown structures by explicitly incorporating spatial information. A hybrid loss function is further designed to enhance the model's capability in handling occluded crowns and crowns of varying sizes. Experiments conducted on two challenging datasets with diverse stand structures demonstrate that MTCDNet significantly outperforms existing state-of-the-art single-modality methods, achieving AP50 scores of 93.12% and 94.58%, respectively. Ablation studies further confirm the superior performance of the proposed fusion network compared to simple fusion strategies. This research indicates that effectively integrating RGB and CHM data offers a robust solution for enhancing individual tree crown detection.
Journal Article
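The MTCDNet abstract above centres on a transformer-based fusion of RGB and CHM feature maps with a learnable positional encoding. The sketch below is an illustrative, assumption-level rendering of that idea in PyTorch, not the authors' implementation; the module name, token count, and dimensions are placeholders.

import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    def __init__(self, dim=256, heads=8, tokens=1024):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, tokens, dim))   # learnable positional encoding
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat, chm_feat):
        # rgb_feat, chm_feat: (B, tokens, dim) flattened feature maps from each modality
        q = self.norm(rgb_feat + self.pos)
        kv = self.norm(chm_feat + self.pos)
        fused, _ = self.attn(q, kv, kv)     # RGB queries attend to CHM keys/values
        return rgb_feat + fused             # residual fusion of complementary cues

rgb = torch.randn(2, 1024, 256)
chm = torch.randn(2, 1024, 256)
print(CrossModalFusionBlock()(rgb, chm).shape)   # torch.Size([2, 1024, 256])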
Semi-supervised Learning for Weed and Crop Segmentation Using UAV Imagery
2022
Weed control has received great attention due to its significant influence on crop yield and food production. Accurate mapping of crop and weed is a prerequisite for the development of an automatic weed management system. In this paper, we propose a weed and crop segmentation method, SemiWeedNet, to accurately identify weeds of varying sizes in complex environments, where semi-supervised learning is employed to reduce the requirement for a large amount of labelled data. SemiWeedNet takes both labelled and unlabelled images into account in a unified semi-supervised architecture based on a semantic segmentation model. A multiscale enhancement module is created by integrating the encoded feature with selective kernel attention, to highlight the significant features of the weed and crop while alleviating the influence of complex background. To address the problem caused by the similarity and overlapping between crop and weed, online hard example mining (OHEM) is introduced to refine the labelled data training. This forces the model to focus more on pixels that are not easily distinguished, and thus effectively improves the image segmentation. To further exploit the meaningful information of unlabelled data, consistency regularisation is introduced by maintaining the context consistency during training, making the representations robust to the varying environment. Comparative experiments are conducted on a publicly available dataset. The results show that SemiWeedNet outperforms the state-of-the-art methods, and its components have promising potential in improving segmentation.
Journal Article
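The SemiWeedNet abstract above names two concrete training signals: online hard example mining (OHEM) on labelled pixels and consistency regularisation on unlabelled images. The following is a minimal sketch of generic versions of those two losses, assuming a standard pixel-wise segmentation setup; the keep ratio and weighting are illustrative, not the paper's settings.

import torch
import torch.nn.functional as F

def ohem_ce_loss(logits, labels, keep_ratio=0.25):
    # logits: (B, C, H, W), labels: (B, H, W); keep only the hardest pixels
    loss = F.cross_entropy(logits, labels, reduction="none").flatten()
    k = max(1, int(keep_ratio * loss.numel()))
    hard, _ = torch.topk(loss, k)
    return hard.mean()

def consistency_loss(logits_weak, logits_strong):
    # encourage the same pixel-wise prediction under two different perturbations
    return F.mse_loss(logits_strong.softmax(dim=1), logits_weak.softmax(dim=1).detach())

# hypothetical total: ohem_ce_loss(sup_logits, sup_labels) + 0.5 * consistency_loss(weak_logits, strong_logits)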
Parameter-Efficient Fine-Tuning for Individual Tree Crown Detection and Species Classification Using UAV-Acquired Imagery
2025
Pre-trained foundation models, trained on large-scale datasets, have demonstrated significant success in a variety of downstream vision tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt these foundation models to new domains by updating only a small subset of parameters, thereby reducing computational overhead. However, the effectiveness of these PEFT methods, especially in the context of forestry remote sensing—specifically for individual tree detection—remains largely unexplored. In this work, we present a simple and efficient PEFT approach designed to transfer pre-trained transformer models to the specific tasks of tree crown detection and species classification in unmanned aerial vehicle (UAV) imagery. To address the challenge of mitigating the influence of irrelevant ground targets in UAV imagery, we propose an Adaptive Salient Channel Selection (ASCS) method, which can be simply integrated into each transformer block during fine-tuning. In the proposed ASCS, task-specific channels are adaptively selected based on class-wise importance scores, where the channels most relevant to the target class are highlighted. In addition, a simple bias term is introduced to facilitate the learning of task-specific knowledge, enhancing the adaptation of the pre-trained model to the target tasks. The experimental results demonstrate that the proposed ASCS fine-tuning method, which utilizes a small number of task-specific learnable parameters, significantly outperforms the latest YOLO detection framework and surpasses the state-of-the-art PEFT method in tree detection and classification tasks. These findings demonstrate that the proposed ASCS is an effective PEFT method, capable of adapting the pre-trained model’s capabilities for tree crown detection and species classification using UAV imagery.
Journal Article
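The entry above describes Adaptive Salient Channel Selection (ASCS), where task-relevant channels are selected by importance scores and a small bias term is learned while the pre-trained backbone stays frozen. Below is a hypothetical, much-simplified adapter in that spirit; the soft gating, top-k hardening, and all names are assumptions rather than the paper's method.

import torch
import torch.nn as nn

class SalientChannelAdapter(nn.Module):
    def __init__(self, dim=768, keep=192):
        super().__init__()
        self.score = nn.Parameter(torch.zeros(dim))   # learned channel-importance scores
        self.bias = nn.Parameter(torch.zeros(dim))    # task-specific bias term
        self.keep = keep

    def forward(self, x):                             # x: (B, tokens, dim) from a frozen block
        gate = torch.sigmoid(self.score)              # soft per-channel saliency in [0, 1]
        if not self.training:                         # harden to a top-k channel mask at inference
            idx = torch.topk(gate, self.keep).indices
            gate = torch.zeros_like(gate).scatter(0, idx, 1.0)
        return x * gate + self.bias

# Typical parameter-efficient use: freeze the backbone and train only the adapters.
# for p in backbone.parameters(): p.requires_grad = False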
Discriminative attention-augmented feature learning for facial expression recognition in the wild
by Tjahjadi, Tardi; Zhou, Linyi; Das Choudhury, Sruti
in Artificial Intelligence; Artificial neural networks; Computational Biology/Bioinformatics
2022
Facial expression recognition (FER) in the wild is challenging due to unconstrained settings such as varying head poses, illumination, and occlusions. In addition, the performance of a FER system significantly degrades due to large intra-class variation and inter-class similarity of facial expressions in real-world scenarios. To mitigate these problems, we propose a novel approach, Discriminative Attention-augmented Feature Learning Convolution Neural Network (DAF-CNN), which learns discriminative expression-related representations for FER. Firstly, we develop a 3D attention mechanism for feature refinement which selectively focuses on attentive channel entries and salient spatial regions of a convolutional neural network feature map. Moreover, a deep metric loss termed Triplet-Center (TC) loss is incorporated to further enhance the discriminative power of the deeply learned features with an expression-similarity constraint. It simultaneously minimizes intra-class distance and maximizes inter-class distance to learn both compact and separate features. Extensive experiments have been conducted on two representative facial expression datasets (FER-2013 and SFEW 2.0) to demonstrate that DAF-CNN effectively captures discriminative feature representations and achieves competitive or even superior FER performance compared to state-of-the-art FER methods.
Journal Article
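The DAF-CNN entry above relies on a Triplet-Center (TC) loss that pulls features towards their class centre while pushing them away from other centres. The sketch below shows one common formulation of such a loss, assuming learnable centres and Euclidean distances; the class count, feature dimension, and margin are placeholder values.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletCenterLoss(nn.Module):
    def __init__(self, num_classes=7, feat_dim=512, margin=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, feats, labels):
        d = torch.cdist(feats, self.centers)                 # (B, num_classes) distances to all centres
        d_pos = d.gather(1, labels.unsqueeze(1)).squeeze(1)  # distance to own class centre
        d_neg = d.scatter(1, labels.unsqueeze(1), float("inf")).min(dim=1).values  # nearest other centre
        return F.relu(d_pos - d_neg + self.margin).mean()    # hinge: own centre closer by a margin

feats, labels = torch.randn(8, 512), torch.randint(0, 7, (8,))
print(TripletCenterLoss()(feats, labels))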
A Segmentation-Guided Deep Learning Framework for Leaf Counting
by Zhou, Rui; Tjahjadi, Tardi; Das Choudhury, Sruti
in Computer vision; deep CNN architecture; Deep learning
2022
Deep learning-based methods have recently provided a means to rapidly and effectively extract various plant traits due to their powerful ability to depict a plant image across a variety of species and growth conditions. In this study, we focus on two fundamental tasks in plant phenotyping, i.e., plant segmentation and leaf counting, and propose a two-stream deep learning framework for segmenting plants and counting leaves of various sizes and shapes from two-dimensional plant images. In the first stream, a multi-scale segmentation model using a spatial pyramid is developed to extract leaves of different sizes and shapes, where the fine-grained details of leaves are captured using a deep feature extractor. In the second stream, a regression counting model is proposed to estimate the number of leaves without any pre-detection, where an auxiliary binary mask from the segmentation stream is introduced to enhance the counting performance by effectively alleviating the influence of complex background. Extensive pot experiments are conducted on the CVPPP 2017 Leaf Counting Challenge dataset, which contains images of Arabidopsis and tobacco plants. The experimental results demonstrate that the proposed framework achieves promising performance in both plant segmentation and leaf counting, providing a reference for the automatic analysis of plant phenotypes.
Journal Article
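The entry above describes a two-stream design in which a segmentation stream supplies a binary plant mask that the counting stream uses as an auxiliary input for direct regression of the leaf count. The counting stream below is a deliberately tiny stand-in to show the wiring; the real framework uses a much deeper feature extractor, and every layer choice here is an assumption.

import torch
import torch.nn as nn

class CountingStream(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                 # placeholder for a deep feature extractor
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)                   # direct count regression, no leaf pre-detection

    def forward(self, rgb, mask):
        x = torch.cat([rgb, mask], dim=1)              # RGB (B,3,H,W) + binary mask (B,1,H,W)
        return self.head(self.backbone(x)).squeeze(1)

rgb, mask = torch.randn(2, 3, 224, 224), torch.rand(2, 1, 224, 224)
print(CountingStream()(rgb, mask).shape)               # torch.Size([2])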
Cross-Modal Feature Fusion for Field Weed Mapping Using RGB and Near-Infrared Imagery
2024
The accurate mapping of weeds in agricultural fields is essential for effective weed control and enhanced crop productivity. Moving beyond the limitations of RGB imagery alone, this study presents a cross-modal feature fusion network (CMFNet) designed for precise weed mapping by integrating RGB and near-infrared (NIR) imagery. CMFNet first applies color space enhancement and adaptive histogram equalization to improve the image brightness and contrast in both RGB and NIR images. Building on a Transformer-based segmentation framework, a cross-modal multi-scale feature enhancement module is then introduced, featuring spatial and channel feature interaction to automatically capture complementary information across the two modalities. The enhanced features are further fused and refined by integrating an attention mechanism, which reduces background interference and enhances the segmentation accuracy. Extensive experiments conducted on two public datasets, the Sugar Beets 2016 and Sunflower datasets, demonstrate that CMFNet significantly outperforms CNN-based segmentation models in the task of weed and crop segmentation. The model achieved Intersection over Union (IoU) scores of 90.86% and 90.77%, along with Mean Accuracy (mAcc) values of 93.8% and 94.35%, respectively. Ablation studies further validate that the proposed cross-modal fusion method provides substantial improvements over basic feature fusion methods, effectively localizing weed and crop regions across diverse field conditions. These findings underscore its potential as a robust solution for precise and adaptive weed mapping in complex agricultural landscapes.
Journal Article
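The CMFNet abstract above lists adaptive histogram equalization of both RGB and NIR inputs as the first processing step. The snippet below sketches that step with OpenCV's standard CLAHE, applied to the lightness channel of the RGB image and to the single NIR channel; the clip limit and tile size are illustrative defaults, not the authors' settings.

import cv2
import numpy as np

def enhance_rgb_nir(rgb, nir, clip=2.0, tile=(8, 8)):
    # rgb: HxWx3 uint8 image (BGR order as loaded by OpenCV); nir: HxW uint8 image
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    l, a, b = cv2.split(cv2.cvtColor(rgb, cv2.COLOR_BGR2LAB))
    rgb_eq = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)  # equalize lightness only
    nir_eq = clahe.apply(nir)
    return rgb_eq, nir_eq

rgb = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
nir = np.random.randint(0, 255, (256, 256), dtype=np.uint8)
rgb_eq, nir_eq = enhance_rgb_nir(rgb, nir)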
Semi-FCMNet: Semi-Supervised Learning for Forest Cover Mapping from Satellite Imagery via Ensemble Self-Training and Perturbation
2023
Forest cover mapping is of paramount importance for environmental monitoring, biodiversity assessment, and forest resource management. In the realm of forest cover mapping, significant advancements have been made by leveraging fully supervised semantic segmentation models. However, acquiring a substantial quantity of pixel-level labelled data is time-consuming and labour-intensive. To address this issue, this paper proposes a novel semi-supervised-learning-based semantic segmentation framework that leverages limited labelled data and numerous unlabelled data, integrating multi-level perturbations and model ensembles. Our framework incorporates a multi-level perturbation module that integrates input-level, feature-level, and model-level perturbations. This module aids in effectively emphasising salient features from remote sensing (RS) images during different training stages and facilitates the stability of model learning, thereby effectively preventing overfitting. We also propose an ensemble-voting-based label generation strategy that enhances the reliability of model-generated labels, achieving smooth label predictions for challenging boundary regions. Additionally, we design an adaptive loss function that dynamically adjusts the focus on poorly learned categories and adapts the attention towards labels generated during both the student and teacher stages. The proposed framework was comprehensively evaluated using two satellite RS datasets, showcasing its competitive performance in semi-supervised forest-cover-mapping scenarios. Notably, the method outperforms the fully supervised approach by 1–3% across diverse partitions, as quantified by metrics including mIoU, accuracy, and mPrecision. Furthermore, it exhibits superiority over other state-of-the-art semi-supervised methods. These results indicate the practical significance of our solution in various domains, including environmental monitoring, forest management, and conservation decision-making processes.
Journal Article
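The entry above proposes an ensemble-voting-based label generation strategy for unlabelled satellite images. As a rough illustration of how such voting can work, the function below takes softmax maps from several perturbed models, keeps the majority class per pixel, and masks out pixels where agreement is low; the agreement threshold and tensor layout are assumptions.

import torch

def vote_pseudo_labels(prob_maps, agree_ratio=0.6):
    # prob_maps: (M, B, C, H, W) softmax outputs from M ensemble members
    votes = prob_maps.argmax(dim=2)                     # (M, B, H, W) per-member class maps
    majority, _ = votes.mode(dim=0)                     # most frequent class per pixel
    agreement = (votes == majority.unsqueeze(0)).float().mean(dim=0)
    mask = agreement >= agree_ratio                     # ignore low-consensus pixels during training
    return majority, mask

probs = torch.rand(3, 2, 2, 64, 64).softmax(dim=2)      # 3 members, 2 images, 2 classes
labels, valid = vote_pseudo_labels(probs)
print(labels.shape, valid.float().mean())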
PlantNet: transfer learning-based fine-grained network for high-throughput plants recognition
by Tjahjadi, Tardi; Yang, Ziying; He, Wenyan
in Accuracy; Artificial Intelligence; Artificial neural networks
2022
In high-throughput phenotyping, recognizing individual plant categories is a vital support process for plant breeding. However, different plant categories have different fine-grained characteristics, i.e., intra-class variation and inter-class similarity, making the process challenging. Existing deep learning-based recognition methods fail to effectively address this recognition task under such challenging requirements, leading to technical difficulties such as low accuracy and a lack of generalization robustness. To address these requirements, this paper proposes PlantNet, a fine-grained network for plant recognition based on transfer learning and a bilinear convolutional neural network, which achieves high recognition accuracy under high-throughput phenotyping requirements. The network operates as follows. First, two deep feature extractors are constructed using transfer learning. The outer product of the two features at corresponding spatial locations is then calculated, and the bilinear features are aggregated over the spatial locations. Finally, the fused bilinear vectors are normalized via maximum expectation to generate the network output. Experiments on a publicly available Arabidopsis dataset show that the proposed bilinear model performed better than related state-of-the-art methods. The inter-class recognition accuracies of the four different Arabidopsis species Sf-2, Cvi, Landsberg and Columbia are found to be 98.48%, 96.53%, 96.79% and 97.33%, respectively, with an average accuracy of 97.25%. Thus, the network has good generalization ability and robust performance, satisfying the needs of fine-grained plant recognition in agricultural production.
Journal Article
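The PlantNet entry above is built on bilinear pooling of two transfer-learned feature extractors. The function below shows the textbook form of that pooling (outer product per spatial location, averaging, signed square-root and L2 normalisation); the maximum-expectation normalisation mentioned in the abstract is not reproduced here, and the feature shapes are placeholders.

import torch

def bilinear_pool(feat_a, feat_b, eps=1e-8):
    # feat_a: (B, Ca, H, W), feat_b: (B, Cb, H, W) from two feature extractors
    B, Ca, H, W = feat_a.shape
    Cb = feat_b.shape[1]
    a = feat_a.reshape(B, Ca, H * W)
    b = feat_b.reshape(B, Cb, H * W)
    x = torch.bmm(a, b.transpose(1, 2)) / (H * W)       # (B, Ca, Cb) bilinear descriptor
    x = x.flatten(1)
    x = torch.sign(x) * torch.sqrt(x.abs() + eps)       # signed square-root normalisation
    return torch.nn.functional.normalize(x, dim=1)      # L2 normalisation

fa, fb = torch.randn(2, 128, 14, 14), torch.randn(2, 128, 14, 14)
print(bilinear_pool(fa, fb).shape)                      # torch.Size([2, 16384])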
Real-Time Detection of Smoke and Fire in the Wild Using Unmanned Aerial Vehicle Remote Sensing Imagery
2025
Detecting wildfires and smoke is essential for safeguarding forest ecosystems and offers critical information for the early evaluation and prevention of such incidents. The advancement of unmanned aerial vehicle (UAV) remote sensing has further enhanced the detection of wildfires and smoke, enabling rapid and accurate identification. This paper presents an integrated one-stage object detection framework designed for the simultaneous identification of wildfires and smoke in UAV imagery. By leveraging mixed data augmentation techniques, the framework enriches the dataset with small targets to enhance its detection performance for small wildfire and smoke targets. A novel backbone enhancement strategy, integrating region convolution and feature refinement modules, is developed to improve the ability to localize smoke features with high transparency within complex backgrounds. By integrating a shape-aware loss function, the proposed framework effectively captures irregularly shaped smoke and fire targets with complex edges, facilitating the accurate identification and localization of wildfires and smoke. Experiments conducted on a UAV remote sensing dataset demonstrate that the proposed framework achieves promising detection performance in terms of both accuracy and speed. The proposed framework attains a mean Average Precision (mAP) of 79.28%, an F1 score of 76.14%, and a processing speed of 8.98 frames per second (FPS). These results reflect increases of 4.27%, 1.96%, and 0.16 FPS compared to the YOLOv10 model. Ablation studies further validate that the incorporation of mixed data augmentation, feature refinement modules, and the shape-aware loss results in substantial improvements over the YOLOv10 model. The findings highlight the framework's capability to rapidly and effectively identify wildfires and smoke using UAV imagery, thereby providing a valuable foundation for proactive forest fire prevention measures.
Journal Article
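The entry above credits part of its gain to mixed data augmentation that enriches the training set with small targets. One widely used augmentation of that kind, shown below purely as a generic example rather than the authors' pipeline, is mosaic augmentation: four images are tiled into one canvas so their objects appear at half scale, and the boxes are rescaled accordingly.

import cv2
import numpy as np

def mosaic(images, boxes, out_size=640):
    # images: list of 4 HxWx3 uint8 arrays; boxes: list of 4 (N_i, 4) arrays in pixel xyxy format
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]
    merged = []
    for img, bxs, (oy, ox) in zip(images, boxes, offsets):
        h, w = img.shape[:2]
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        scale = np.array([half / w, half / h, half / w, half / h])
        merged.append(bxs * scale + np.array([ox, oy, ox, oy]))   # shift boxes into the tile
    return canvas, np.concatenate(merged, axis=0)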
Robust blood pressure estimation using an RGB camera
by Choudhury, Sruti Das; Ye, Qiaolin; Fan, Xijian
in Algorithms; Artificial Intelligence; Blood pressure
2020
Blood pressure (BP) is one of the important vital signs in diagnosing certain cardiovascular diseases such as hypertension. A few studies have shown that BP can be estimated from pulse transit time (PTT), derived by calculating the time difference between two photoplethysmography (PPG) measurements, which requires a set of body-worn sensors attached to the skin. Recently, remote photoplethysmography (rPPG) has been proposed as a contactless alternative for such monitoring. In this paper, we propose a novel contactless framework to estimate BP based on PTT. We develop an algorithm to adaptively select reliable local rPPG pairs, which removes rPPG pairs of poor quality. To further improve the PTT estimation, an adaptive Gaussian model is developed to refine the shape of the rPPG signal by analyzing its essential characteristics. The adjusted PTT is computed from the refined rPPG signal to estimate BP. The proposed framework is validated using video sequences captured by an RGB camera, with the ground-truth BP measured using a BP monitor. Experiments on videos collected in the laboratory have shown that the proposed framework is capable of estimating BP, with statistically compliant results compared with the BP monitor.
Journal Article
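The framework above estimates blood pressure from the pulse transit time (PTT) between rPPG signals taken at different skin regions. As a minimal, assumption-level illustration of that core quantity, the function below estimates PTT as the lag that maximises the cross-correlation between two rPPG traces; the frame rate, lag window, and synthetic test signal are placeholders.

import numpy as np

def pulse_transit_time(rppg_a, rppg_b, fps=30.0, max_lag_s=0.3):
    # rppg_a, rppg_b: 1-D rPPG traces of equal length sampled at `fps` frames per second
    a = (rppg_a - rppg_a.mean()) / (rppg_a.std() + 1e-8)
    b = (rppg_b - rppg_b.mean()) / (rppg_b.std() + 1e-8)
    max_lag = int(max_lag_s * fps)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(a[max(0, -l):len(a) - max(0, l)], b[max(0, l):len(b) - max(0, -l)])
            for l in lags]
    return lags[int(np.argmax(corr))] / fps             # PTT in seconds

t = np.arange(0, 10, 1 / 30.0)
sig = np.sin(2 * np.pi * 1.2 * t)                        # synthetic pulse wave, about 72 bpm
print(pulse_transit_time(sig, np.roll(sig, 3)))          # about 3 frames, i.e. 0.1 s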