Catalogue Search | MBRL

Multi‐scale feature extraction for energy‐efficient object detection in remote sensing images

by Xie, Fei; Liu, Hongning; Wu, Di in Accuracy, Computer vision, Deep learning

2024

Object detection in remote sensing images aims to interpret images to obtain the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature descriptors to extract features, resulting in limited performance. Deep learning methods, especially one‐stage detectors such as the Real‐Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, the difficulty of extracting features from complex backgrounds and of localising targets in images with large scale variations limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi‐Scale Feature Extraction‐assist RTMDet (MRTMDet), is proposed to address these limitations through enhanced feature extraction and fusion networks. At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC‐FPN, which enhance the model's ability to capture global and multi‐scale features through carefully designed hybrid CNN and transformer feature processing units based on the vision transformer (ViT) and poly‐scale convolution (PSConv), respectively. Experiments on DIOR‐R demonstrate that MRTMDet achieves a competitive 62.2% mAP while balancing precision with a lightweight design.

In this paper, an improved detector, MRTMDet, is proposed to overcome complex background noise and large scale variations in oriented object detection in remote sensing images by designing an innovative feature extraction network and feature fusion network. These networks integrate a lightweight vision transformer and a multi‐scale feature extraction module in different structures, thereby improving the overall quality of feature representation and its effectiveness for understanding and prediction tasks, and further augmenting the model's ability to perceive both global and multi‐scale features. The authors conducted ablation and comparison experiments on the publicly available DIOR‐R dataset, which show that the model achieves excellent overall performance with a good balance between precision and lightweight design.
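
DIOR‐R annotates objects with oriented (rotated) bounding boxes, which is what makes this an oriented detection task. As a small, generic illustration of what such a box encodes, the sketch below converts a box given as centre, width, height and angle θ (radians) into its four corner points. It is not part of MRTMDet; the function name is hypothetical, and real toolkits differ in angle range and sign conventions, so the angle handling here is an assumption.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Convert an oriented box (cx, cy, w, h, theta in radians) to 4 corners.

    Corners are produced from the unrotated offsets (-w/2, -h/2), (w/2, -h/2),
    (w/2, h/2), (-w/2, h/2), each rotated by theta about the box centre.
    The angle convention is an assumption; libraries define it differently.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in offsets:
        # Standard 2D rotation of the offset, then translation to the centre.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners
```

With θ = 0 the result is an ordinary axis-aligned rectangle; a non-zero θ lets the box hug elongated objects such as ships or bridges that lie at an angle in overhead imagery.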

Journal Article

Wavelet‐Based Feature Extraction for Efficient High‐Resolution Image Classification

by Akowuah, Emmanuel Kofi; Acquah, Isaac; Nunoo‐Mensah, Henry in Accuracy, Artificial neural networks, classification

2025

Convolutional neural networks (CNNs) typically compress high‐resolution images to minimize computational requirements. However, this can lead to loss of information and reduced accuracy in classification tasks. This paper introduces WaveNet, a novel approach for processing high‐resolution images using wavelet‐domain inputs in CNNs. We address the challenge of maintaining classification accuracy with high‐resolution inputs while minimizing computational complexity. Our method employs the wavelet packet transform (WPT) for image pre‐processing, extracting detailed multi‐scale and directional information from high‐resolution images. We propose a wavelet‐adaptive efficient channel attention (WAECA) module to dynamically select the most informative wavelet subbands. Popular CNN architectures such as ResNet‐50 and MobileNetV2 are adapted by replacing their initial convolutional layers with wavelet‐transformed inputs, enabling direct learning in the wavelet domain. Experiments on the Caltech‐256 and ALOT datasets demonstrate that WaveNet improves classification accuracy while reducing computational complexity. For instance, our wavelet‐enhanced ResNet‐50 achieves a Top‐1 accuracy of 72.47% on Caltech‐256, outperforming the baseline (70.65%) while reducing FLOPs from 16.52G to 3.98G. Similar improvements are observed across different architectures and datasets. We also evaluate various wavelet filters and ResNet backbones, finding that the bior1.1 filter and ResNet‐50 give the best performance. This work presents a practical solution for developing more accurate and efficient models for high‐resolution inputs without extensive computational resources or complex architectural modifications.

The paper introduces a novel CNN architecture that uses wavelet‐domain inputs for high‐resolution image processing. A wavelet‐domain efficient channel attention module that dynamically focuses on informative wavelet subbands is also presented. The proposed architecture improves classification accuracy while reducing computational complexity.
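
The abstract reports that the bior1.1 filter performed best; for analysis (decomposition), bior1.1 uses the same two-tap filters as the orthonormal Haar wavelet. The pure-Python sketch below, assuming a single-channel image with even dimensions, shows how a wavelet packet transform turns an image into many lower-resolution subbands that a CNN could take as input channels. It illustrates the transform only, not WaveNet's pre-processing pipeline or its attention module, and the function names are hypothetical.

```python
def haar_2d(img):
    """One level of a 2D orthonormal Haar (same analysis filters as bior1.1).

    img is an H x W list of lists with even H and W. Returns four subbands
    (LL, LH, HL, HH), each H/2 x W/2. Subband naming and sign conventions
    differ between libraries.
    """
    h, w = len(img), len(img[0])
    half_h, half_w = h // 2, w // 2
    ll, lh, hl, hh = [[[0.0] * half_w for _ in range(half_h)] for _ in range(4)]
    for i in range(half_h):
        for j in range(half_w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            # 0.5 = (1/sqrt(2))**2: the 1D filter is applied along rows and columns.
            ll[i][j] = 0.5 * (a + b + c + d)  # low-pass in both directions
            lh[i][j] = 0.5 * (a + b - c - d)  # top row minus bottom row
            hl[i][j] = 0.5 * (a - b + c - d)  # left column minus right column
            hh[i][j] = 0.5 * (a - b - c + d)  # diagonal detail
    return ll, lh, hl, hh


def wavelet_packet_2d(img, levels):
    """Wavelet packet decomposition: unlike a plain DWT, every subband
    (not only LL) is decomposed again at each level."""
    bands = [img]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_2d(band)]
    return bands
```

Each level halves height and width while multiplying the number of subbands by four, so two levels turn an H × W image into 16 subbands of size H/4 × W/4 with the same total energy. Starting the network at a lower spatial resolution is one plausible contributor to the reported FLOPs reduction (16.52G to 3.98G, about 4.15× fewer), although the abstract does not break down where the savings come from.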

Journal Article

LiteMS-YOLO: a lightweight framework for small target detection in complex wheat field environments

by Cheng Peng; Xuefei Wang; Mengying Yang in lightweight object detection, multi-scale feature extraction, small object detection

2026

Wheat spike detection is essential for yield estimation in precision agriculture, yet it remains challenging due to the small size of targets, dense distribution, and complex field environments. In this study, we propose LiteMS-YOLO, a lightweight object detection framework based on YOLO26n. The model integrates a Feature Complementary Mapping (FCM) module to enhance spatial-semantic feature interaction and a Multi-Kernel Perception (MKP) unit to improve multi-scale feature representation. In addition, targeted redundancy reduction strategies are introduced to significantly lower model complexity. Experiments are conducted on a combined dataset comprising the public Global Wheat Head Detection (GWHD) dataset and 100 field images collected by the Tangshan Academy of Agricultural Sciences, with a total of 6,378 high-resolution images and over 44,000 annotated wheat spikes. LiteMS-YOLO achieves a mAP50 of 92.28% and a mAP50–95 of 52.56%, while using only 0.627 million parameters. Compared with YOLO26n and YOLOv8n, the proposed method reduces parameters by approximately 75% and 79%, respectively, while maintaining competitive accuracy. These results demonstrate that LiteMS-YOLO strikes an excellent balance between detection accuracy and efficiency, making it well-suited for real-time deployment in resource-constrained agricultural scenarios.
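
The two reported accuracy figures differ in how strictly a predicted box must overlap a ground-truth box. mAP50 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, whereas mAP50–95, following the COCO convention, averages AP over the ten IoU thresholds 0.50, 0.55, up to 0.95; this is why 52.56% is much lower than 92.28%. The sketch below shows only the IoU and threshold bookkeeping (computing each AP from a precision-recall curve is omitted), and the names are hypothetical. If the task has a single class (wheat spikes), mAP reduces to the AP of that class.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(iou):
    """Thresholds at which a detection with this IoU counts as a true positive."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


def map50_95(ap_per_threshold):
    """Average precomputed AP values over the ten IoU thresholds."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A prediction with IoU 0.6 is a hit for mAP50 but only for three of the ten mAP50–95 thresholds, so loosely localized small targets such as wheat spikes pull mAP50–95 down much more than mAP50.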

Journal Article

Multi-Scale Attention Network for Building Extraction from High-Resolution Remote Sensing Images

by Qiao, Mengjia; Zhou, Tao; Zhang, Beibei in Accuracy, adaptive weighting, Architecture

2024

Precise building extraction from high-resolution remote sensing images has significant applications in urban planning, resource management, and environmental conservation. In recent years, deep neural networks (DNNs) have attracted substantial attention for their ability to learn and extract features, becoming integral to building extraction methods and yielding notable performance. Nonetheless, prevailing DNN-based models for building extraction often overlook spatial information during the feature extraction phase. Additionally, many existing models use a simplistic, direct approach in the feature fusion stage, potentially leading to spurious target detection and the amplification of internal noise. To address these concerns, we present a multi-scale attention network (MSANet) tailored for building extraction from high-resolution remote sensing images. In our approach, we first extract multi-scale building feature information using a multi-scale channel attention mechanism and a multi-scale spatial attention mechanism. Subsequently, we apply adaptive hierarchical weighting to the extracted building features. Concurrently, we introduce a gating mechanism to facilitate the effective fusion of multi-scale features. The efficacy of MSANet was evaluated on the WHU aerial image dataset and the WHU satellite image dataset. MSANet achieves F1 scores of 93.76% and 77.64% on the WHU aerial imagery dataset and WHU satellite dataset II, respectively, and intersection over union (IoU) values of 88.25% and 63.46%, surpassing DeepLabV3 and GSMC.
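
When F1 and IoU are both computed from the same pooled pixel counts (true positives TP, false positives FP, false negatives FN), they are linked by F1 = 2·IoU / (1 + IoU), because IoU = TP/(TP + FP + FN) and F1 = 2TP/(2TP + FP + FN). The short sketch below (hypothetical function names) encodes both definitions and the conversion; it is a generic check, not the authors' evaluation code.

```python
def f1_and_iou(tp, fp, fn):
    """Pixel-level F1 (Dice) and IoU from pooled confusion counts."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def f1_from_iou(iou):
    """F1 implied by an IoU value, valid when both share the same TP/FP/FN."""
    return 2 * iou / (1 + iou)
```

Applying `f1_from_iou` to the reported IoU values gives about 0.9376 for 88.25% and about 0.7765 for 63.46%, which agree with the reported F1 scores (93.76% and 77.64%) to within rounding of the reported IoU. That is consistent with, but does not prove, pooled pixel-level evaluation.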

Journal Article

Multi-scale quadratic convolutional neural network for bearing fault diagnosis based on multi-sensor data fusion

by Ji, Yingying; Shao, Xing; Wang, Cuixiang in Artificial neural networks, Data integration, Fault diagnosis

2025

Bearing fault diagnosis is crucial for the safe and stable operation of mechanical equipment. However, bearing signals are highly susceptible to noise interference, which complicates feature extraction. Existing multi-source diagnostic methods still struggle to integrate signals effectively and suppress noise. To address these challenges, this paper proposes a multi-sensor data fusion and multi-scale quadratic convolutional neural network for intelligent bearing fault diagnosis. First, vibration signals collected by multiple sensors are processed by a time-domain filter consisting of a quadratic convolutional network and a frequency-domain filter based on a fully connected neural network. The filtered signal is then passed into a multi-scale quadratic convolutional neural network, which uses quadratic neurons with strong feature extraction capabilities for bearing vibration signals. The extracted multi-scale features are further refined by a cross-attention mechanism to capture more useful information and are then classified. Experiments on the bearing datasets from Case Western Reserve University and Politecnico di Torino show that the proposed method outperforms the comparison models in noisy environments. At a signal-to-noise ratio of -10 dB, the method achieves accuracies of 97.20% and 98.81%, respectively, verifying its performance under complex noise interference.
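
The abstract evaluates robustness at a signal-to-noise ratio of -10 (conventionally expressed in dB). In noise-robustness experiments of this kind, test signals are commonly created by adding white Gaussian noise scaled to a target SNR; the abstract does not state the exact procedure, so the sketch below is an assumption, and its function names are hypothetical. Since SNR_dB = 10·log10(P_signal / P_noise), an SNR of -10 dB means the noise power is ten times the signal power.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals
    snr_db in expectation. Returns a new list; the input is not modified."""
    if rng is None:
        rng = random.Random()
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))  # target noise power
    sigma = math.sqrt(p_noise)                  # std of zero-mean Gaussian noise
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of a noisy signal relative to its clean version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    p_s = sum(x * x for x in clean) / len(clean)
    p_n = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_s / p_n)
```

Passing a seeded `random.Random` makes the noisy test set reproducible, which matters when comparing models at the same SNR.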

Journal Article

Low-light image enhancement using generative adversarial networks

by Wang, Litian; Zhao, Liquan; Wu, Chunming in 639/705/258, 704/47, 704/844

2024

In low-light environments, the amount of light captured by the camera sensor is reduced, resulting in lower image brightness. As a result, details in the image become difficult to recognize or are lost entirely, which affects subsequent processing of low-light images. Low-light image enhancement methods can increase image brightness while better restoring color and detail information. A generative adversarial network is proposed to improve the quality of low-light images. This network consists of a generative network and an adversarial network. First, in the generative network, a multi-scale feature extraction module consisting of dilated convolutions, regular convolutions, max pooling, and average pooling is designed. This module extracts low-light image features at multiple scales, thereby obtaining richer feature information. Second, an illumination attention module is designed to reduce the interference of redundant features. This module assigns greater weight to important illumination features, enabling the network to extract illumination features more effectively. Finally, an encoder-decoder generative network is designed that uses the multi-scale feature extraction module, the illumination attention module, and other conventional modules to enhance low-light images and improve their quality. For the adversarial network, a dual-discriminator structure is designed, comprising a global adversarial network and a local adversarial network. They determine whether the input image is real or generated from global and local features, respectively, improving the performance of the generator. Additionally, an improved loss function is proposed by introducing color loss and perceptual loss into the conventional loss function. It better measures the color difference between the generated image and a normally illuminated image, thus reducing color distortion during enhancement.

The proposed method and other methods are tested on both synthesized and real low-light images. Experimental results show that, for synthetic low-light images, the images enhanced by the proposed method are closer to normally illuminated images than those produced by other methods. For real low-light images, the images enhanced by the proposed method retain more details, are clearer, and achieve higher performance metrics. Overall, the proposed method demonstrates better enhancement capability than the compared methods for both synthetic and real low-light images.
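
The abstract does not give the module's kernel sizes or dilation rates. As a hedged, generic illustration of why combining dilated and regular convolutions yields several scales: a k-tap kernel with dilation d spans k + (k − 1)(d − 1) input samples, so a 3-tap kernel with dilations 1, 2 and 3 covers spans of 3, 5 and 7 with the same number of weights. The 1D pure-Python sketch below (hypothetical names) is not the authors' module.

```python
def effective_kernel_size(k, dilation):
    """Number of input samples spanned by a k-tap kernel at this dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1D dilated convolution of list x with list kernel.

    Implemented as cross-correlation (no kernel flip), matching the
    convention of deep-learning frameworks.
    """
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]
```

Running parallel branches with different dilation rates (and pooling branches) over the same input and concatenating their outputs is one common way such modules gather context at several receptive-field sizes without adding parameters per branch.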

Journal Article

Remote sensing image Super-resolution reconstruction by fusing multi-scale receptive fields and hybrid transformer

by Li, Songyang; Liu, Denghui; Zhong, Lin in 639/166/984, 639/705/117, 639/705/258

2025

To enhance high-frequency perceptual information and texture details in remote sensing images and address the challenges of super-resolution reconstruction algorithms during training, particularly the issue of missing details, this paper proposes an improved remote sensing image super-resolution reconstruction model. The generator network of the model employs multi-scale convolutional kernels to extract image features and utilizes a multi-head self-attention mechanism to dynamically fuse these features, significantly improving the ability to capture both fine details and global information in remote sensing images. Additionally, the model introduces a multi-stage Hybrid Transformer structure, which processes features at different resolutions progressively, from low resolution to high resolution, substantially enhancing reconstruction quality and detail recovery. The discriminator combines multi-scale convolution, global Transformer, and hierarchical feature discriminators, providing a comprehensive and refined evaluation of image quality. Finally, the model incorporates a Charbonnier loss function and total variation (TV) loss function, which significantly improve training stability and accelerate convergence. Experimental results demonstrate that the proposed method, compared to the SRGAN algorithm, achieves average improvements of approximately 3.61 dB in Peak Signal-to-Noise Ratio (PSNR), 0.070 (8.2%) in Structural Similarity Index (SSIM), and 0.030 (3.1%) in Feature Similarity Index (FSIM) across multiple datasets, showing significant performance gains.
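
The Charbonnier loss, total variation (TV) loss and PSNR named in the abstract have standard definitions; the pure-Python sketch below implements common forms of them on 2D lists. The eps value, the unnormalized anisotropic TV variant and the function names are assumptions: implementations differ (some use squared differences or divide by the pixel count), and the abstract does not give the weighting between the losses.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((p - t)^2 + eps^2): a smooth, differentiable L1-like loss."""
    vals = [math.sqrt((p - t) ** 2 + eps ** 2)
            for prow, trow in zip(pred, target)
            for p, t in zip(prow, trow)]
    return sum(vals) / len(vals)


def tv_loss(img):
    """Anisotropic total variation: sum of absolute differences between
    vertically and horizontally adjacent pixels (unnormalized)."""
    h, w = len(img), len(img[0])
    vertical = sum(abs(img[i + 1][j] - img[i][j]) for i in range(h - 1) for j in range(w))
    horizontal = sum(abs(img[i][j + 1] - img[i][j]) for i in range(h) for j in range(w - 1))
    return vertical + horizontal


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diffs = [(p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)
```

Because PSNR is logarithmic in MSE, an average gain of about 3.61 dB corresponds roughly to a 10^0.361 ≈ 2.3× lower mean squared error at the same peak value (only roughly, since the gain is averaged across datasets).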

Journal Article

Hyperspectral Image Spectral–Spatial Classification Method Based on Deep Adaptive Feature Fusion

by Mu, Caihong; Liu, Yijin; Liu, Yi in adaptive feature fusion, data collection, hyperspectral image classification

2021

Convolutional neural networks (CNNs) have been widely used in hyperspectral image (HSI) classification. Many algorithms focus on the deep extraction of a single kind of feature to improve classification, and few studies have addressed the deep extraction of two or more kinds of fused features or the combination of spatial and spectral features for classification. The authors of this paper propose an HSI spectral–spatial classification method based on deep adaptive feature fusion (SSDF). This method first performs deep adaptive fusion of two hyperspectral features and then performs spectral–spatial classification on the fused features. In SSDF, a U-shaped deep network model, with the principal component features as the model input and the edge features as the model label, is designed to adaptively fuse two different kinds of features: the edge features of the HSIs extracted by the guided filter, and the principal component features obtained by reducing the dimensionality of the HSIs with principal component analysis. The fused features are input into a multi-scale and multi-level feature extraction model to extract deeper features, which are then combined with the spectral features extracted by a long short-term memory (LSTM) model for classification. Experimental results on three datasets demonstrate that SSDF outperforms several state-of-the-art methods. Additionally, SSDF performed best as the number of training samples decreased sharply, and it also achieved high classification accuracy for categories with few samples.
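
The abstract states that SSDF extracts edge features with a guided filter but gives no parameters. For reference, the standard guided filter fits a local linear model q = a·I + b of the filter input p in terms of the guide I within each window, with a = cov(I, p) / (var(I) + eps) and b = mean(p) − a·mean(I), then averages the coefficients of all windows covering each sample. The 1D pure-Python sketch below assumes a window radius r and regularizer eps chosen for illustration; how SSDF turns the filter output into edge features is not described in the abstract, so that step is omitted, and the function names are hypothetical.

```python
def box_mean(x, r):
    """Mean of x over the window [i - r, i + r], truncated at the borders."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, r, eps):
    """Guided filter on 1D signals: fit src ~ a * guide + b per window,
    then average the per-window coefficients at each position."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    mean_ii = box_mean([g * g for g in guide], r)
    mean_ip = box_mean([g * s for g, s in zip(guide, src)], r)
    # a = cov(I, p) / (var(I) + eps); b = mean(p) - a * mean(I)
    a = [(ip - mi * mp) / ((ii - mi * mi) + eps)
         for ip, mi, mp, ii in zip(mean_ip, mean_i, mean_p, mean_ii)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]
```

A small eps keeps strong edges almost untouched (a ≈ 1 where the guide varies), while a large eps pushes a toward 0 and the output toward a local mean, which is what makes the filter edge-preserving rather than a plain blur.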

Journal Article

DMAF-NET: Deep Multi-Scale Attention Fusion Network for Hyperspectral Image Classification with Limited Samples

by Guo, Hufeng; Liu, Wenyi in Accuracy, Classification, convolutional neural network (CNN)

2024

In recent years, deep learning methods have achieved remarkable success in hyperspectral image classification (HSIC), and convolutional neural networks (CNNs) have proven highly effective. However, several critical issues in the HSIC task remain, such as the lack of labeled training samples, which constrains the classification accuracy and generalization ability of CNNs. To address this problem, this paper proposes a deep multi-scale attention fusion network (DMAF-NET). The network is based on multi-scale features and fully exploits the deep features of samples from multiple levels and perspectives, with the aim of improving HSIC results with limited samples. The contributions of this article are threefold. First, a novel baseline network for multi-scale feature extraction is designed with a pyramid structure and a densely connected 3D octave convolutional network, enabling the extraction of deep-level information from features at different granularities. Second, a multi-scale spatial–spectral attention module and a pyramidal multi-scale channel attention module are designed. These modules allow the comprehensive dependencies of coordinates and directions, local and global, to be modeled in four dimensions. Finally, a multi-attention fusion module is designed to effectively combine the feature maps extracted from multiple branches. Extensive experiments on four popular datasets demonstrate that the proposed method achieves high classification accuracy even with fewer labeled samples.

Journal Article

Gearbox Fault Diagnosis Based on MSCNN-LSTM-CBAM-SE

by Yasenjiang, Jarula; Lv, Luhui; Xu, Lihua in Accuracy, convolutional block attention module, Deep learning

2024

Gearbox fault diagnosis is crucial for ensuring the safety of mechanical equipment and the stable operation of the whole system. However, existing diagnostic methods still have limitations, such as relying on single-scale features and insufficiently capturing global temporal dependencies. To address these issues, this article proposes a new gearbox fault diagnosis method based on MSCNN-LSTM-CBAM-SE. The output of the CBAM-SE module is deeply integrated with the multi-scale features from the MSCNN and the temporal features from the LSTM, constructing a comprehensive feature representation that provides richer and more precise information for fault diagnosis. The effectiveness of the method was validated on two gearbox datasets and through ablation studies. Experimental results show that the proposed model achieves excellent performance in accuracy, F1 score, and other metrics. Finally, a comparison with other relevant fault diagnosis methods further verifies the advantages of the proposed model. This research offers a new solution for accurate gearbox fault diagnosis.
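
The abstract names the SE (squeeze-and-excitation) block without configuration details. The standard SE operation has three steps: squeeze each channel to one number by global average pooling, pass these through a small FC, ReLU, FC, sigmoid bottleneck to get one weight per channel, and rescale each channel by its weight. The pure-Python sketch below applies it to a channels × length feature map such as a 1D CNN produces from vibration signals; the weights, the reduced width R and the function name are hypothetical, and how the paper combines SE with CBAM is not reproduced.

```python
import math


def squeeze_excitation(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a C x L feature map (list of C channels).

    w1: R x C and b1: R (reduction layer); w2: C x R and b2: C (expansion).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    scales = [1 / (1 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
              for row, b in zip(w2, b2)]
    # Scale: multiply every position of each channel by that channel's weight.
    return [[s * v for v in ch] for s, ch in zip(scales, feature_map)]
```

With all weights zero, every channel is scaled by sigmoid(0) = 0.5; learned weights instead emphasize channels that carry fault-related information and suppress the rest.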

Journal Article
