Catalogue Search | MBRL

Design of Generative Adversarial Network Super‐Resolution Reconstruction Algorithm for Intelligent Security Images

by Liu, Yuyao , Peng, Qiongyin in CBAM module , generative adversarial network , multi‐scale convolution kernel

2025

In the reconstruction of security images, current generative adversarial network super‐resolution reconstruction algorithms are prone to generating unrealistic artifacts under high noise and low contrast, and the details of small targets are blurred. This paper adopts an improved Real‐ESRGAN (Real‐Enhanced Super‐Resolution Generative Adversarial Network) combined with CBAM (Convolutional Block Attention Module) to perform super‐resolution reconstruction of security images and improve the quality of the reconstructed images. The study applies multi‐scale convolution kernels in the RRDB (residual in residual dense block) module of the model, using convolution kernels of different sizes to extract and fuse image features so as to capture both local and overall detail information in the image. Then, based on Real‐ESRGAN, the CBAM module is applied in the generator, and the channel attention and spatial attention mechanisms are used to adaptively focus on multi‐scale features to enhance the modeling of texture details in the target area. Finally, a multi‐loss fusion optimization strategy is adopted, in which color consistency loss and total variation loss are applied to suppress artifacts and color drift in the reconstruction process. The experiments use security images from a license plate recognition task as an example for super‐resolution reconstruction. The results show that the improved Real‐ESRGAN‐CBAM achieves the best PSNR (Peak Signal‐to‐Noise Ratio) and SSIM (Structural Similarity), reaching 20.1 dB and 0.635, respectively, under extremely high noise of 5 dB, and still reaching 26.1 dB and 0.765 under extremely low contrast of 0.2. The improved algorithm, which integrates multi‐scale convolution and the CBAM attention mechanism into Real‐ESRGAN, greatly improves the reconstruction quality of security images under high noise and low contrast, effectively suppresses artifact generation, enhances detail recovery, and better meets the dual needs of intelligent security systems for image quality and practicality.
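For orientation, the PSNR metric reported in this abstract and a total variation penalty of the kind it mentions can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `psnr` and `total_variation`, the 8-bit peak value of 255, the anisotropic (absolute-difference) form of total variation, and the toy 2 × 2 images are all assumptions.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are lists of rows of pixel values; `peak` (assumed 255 for 8-bit data)
    is the largest possible pixel value.
    """
    squared_errors = [
        (ref - rec) ** 2
        for ref_row, rec_row in zip(reference, reconstructed)
        for ref, rec in zip(ref_row, rec_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def total_variation(image):
    """Anisotropic total variation: sum of absolute differences between neighbours."""
    height, width = len(image), len(image[0])
    vertical = sum(
        abs(image[i + 1][j] - image[i][j])
        for i in range(height - 1)
        for j in range(width)
    )
    horizontal = sum(
        abs(image[i][j + 1] - image[i][j])
        for i in range(height)
        for j in range(width - 1)
    )
    return vertical + horizontal


# Hypothetical toy images: a flat patch and a noisy version of it.
flat = [[100, 100], [100, 100]]
noisy = [[100, 110], [90, 100]]

print(psnr(flat, noisy))       # finite value in dB, because the images differ
print(total_variation(flat))   # 0 for a constant image
print(total_variation(noisy))  # 40: noise raises the penalty
```

A loss that adds a weighted total variation term therefore penalises noisy, high-frequency output, which is one common way of discouraging artifacts.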

Journal Article

MEAC: A Multi-Scale Edge-Aware Convolution Module for Robust Infrared Small-Target Detection

by Zhao, Ming , Hu, Jinlong , Zhang, Tian in Artificial intelligence , attention mechanisms , Comparative analysis

2025

Infrared small-target detection remains a critical challenge in military reconnaissance, environmental monitoring, forest-fire prevention, and search-and-rescue operations, owing to the targets’ extremely small size, sparse texture, low signal-to-noise ratio, and complex background interference. Traditional convolutional neural networks (CNNs) struggle to detect such weak, low-contrast objects due to their limited receptive fields and insufficient feature extraction capabilities. To overcome these limitations, we propose a Multi-Scale Edge-Aware Convolution (MEAC) module that enhances feature representation for small infrared targets without increasing parameter count or computational cost. Specifically, MEAC fuses (1) original local features, (2) multi-scale context captured via dilated convolutions, and (3) high-contrast edge cues derived from differential Gaussian filters. After fusing these branches, channel and spatial attention mechanisms are applied to adaptively emphasize critical regions, further improving feature discrimination. The MEAC module is fully compatible with standard convolutional layers and can be seamlessly embedded into various network architectures. Extensive experiments on three public infrared small-target datasets (SIRSTD-UAVB, IRSTDv1, and IRSTD-1K) demonstrate that networks augmented with MEAC significantly outperform baseline models using standard convolutions. When compared to eleven mainstream convolution modules (ACmix, AKConv, DRConv, DSConv, LSKConv, MixConv, PConv, ODConv, GConv, and Involution), our method consistently achieves the highest detection accuracy and robustness. Experiments conducted across multiple versions, including YOLOv10, YOLOv11, and YOLOv12, as well as various network levels, demonstrate that the MEAC module achieves stable improvements in performance metrics while slightly increasing computational and parameter complexity. 
These results validate MEAC’s effectiveness in enhancing the detection of small, weak targets and suppressing interference from complex backgrounds, highlighting its strong generalization ability and practical application potential.
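The edge-cue branch described in this abstract relies on differential Gaussian filters. A difference-of-Gaussians (DoG) kernel can be sketched as follows; this is an illustrative sketch only, and the kernel size, the two sigma values, and the helper names `gaussian_kernel`, `dog_kernel`, and `response_at_centre` are assumptions rather than details taken from the paper.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalised size x size Gaussian kernel (weights sum to 1)."""
    half = size // 2
    raw = [
        [
            math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)
        ]
        for i in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def dog_kernel(size=5, sigma_narrow=1.0, sigma_wide=2.0):
    """Difference of a narrow and a wide Gaussian; the weights sum to about zero."""
    narrow = gaussian_kernel(size, sigma_narrow)
    wide = gaussian_kernel(size, sigma_wide)
    return [
        [n - w for n, w in zip(narrow_row, wide_row)]
        for narrow_row, wide_row in zip(narrow, wide)
    ]


def response_at_centre(patch, kernel):
    """Filter response for a patch that has the same size as the kernel."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


kernel = dog_kernel()

# Hypothetical 5 x 5 patches: uniform background, and background plus one bright pixel.
background = [[50.0] * 5 for _ in range(5)]
target = [row[:] for row in background]
target[2][2] = 200.0

print(response_at_centre(background, kernel))  # approximately 0: flat regions are suppressed
print(response_at_centre(target, kernel))      # positive: the small bright target stands out
```

Because the two Gaussians are each normalised, the kernel cancels uniform background and responds only to local contrast, which is the property that makes it useful for small, dim targets.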

Journal Article

A multi-scale convolutional neural network for bearing compound fault diagnosis under various noise conditions

by Liu, ChengLiang , Qin, ChengJin , Jin, YanRui in Accuracy , Artificial neural networks , Background noise

2022

Recently, with the urgent demand for data-driven approaches in practical industrial scenarios, deep learning diagnosis models for noisy environments have attracted increasing attention. However, existing research has two limitations: (1) environmental noise is complex and changeable, so high-performance diagnosis cannot be ensured across different noise domains, and (2) multiple faults may occur simultaneously, which poses challenges for model diagnosis. This paper presents a novel anti-noise multi-scale convolutional neural network (AM-CNN) for compound fault diagnosis under noise of different intensities. First, we propose a residual pre-processing block according to the principle of noise superposition to process the input information and present the residual loss to construct a new loss function. Additionally, considering the strong coupling of input information, we design a multi-scale convolution block to realize multi-scale feature extraction for enhancing the proposed model’s robustness and effectiveness. Finally, a multi-label classifier is utilized to simultaneously distinguish multiple bearing faults. The proposed AM-CNN is verified on our collected compound fault dataset. On average, compared with the existing methods, AM-CNN improves accuracy by 39.93% and F1-macro by 25.84% under the no-noise working condition, and accuracy by 45.67% and F1-macro by 27.72% under working conditions with noise of different intensities. Furthermore, the experimental results show that AM-CNN can achieve good cross-domain performance with 100% accuracy and 100% F1-macro. Thus, AM-CNN has the potential to be an accurate and stable fault diagnosis tool.
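The multi-label decision step and the F1-macro metric named in this abstract can be illustrated with a short sketch. It assumes one independent sigmoid output per fault type and a 0.5 decision threshold, which is a common multi-label setup but is not confirmed by the abstract; the three fault labels, the logits, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Turn per-label logits into independent 0/1 decisions (multi-label output)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    num_labels = len(y_true[0])
    scores = []
    for k in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / num_labels


# Hypothetical labels per sample: [inner-race fault, outer-race fault, ball fault].
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
logits = [[2.0, -1.5, -3.0], [1.2, 0.4, -0.7], [-0.3, 2.2, -0.1]]

y_pred = [predict_labels(sample) for sample in logits]
print(y_pred)                    # the third sample misses its ball fault
print(f1_macro(y_true, y_pred))  # two labels score 1.0 and one scores 0.0
```

Because each label is thresholded on its own, a single sample can carry several faults at once, which is what distinguishes this setup from ordinary single-label classification.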

Journal Article

MSLWENet: A Novel Deep Learning Network for Lake Water Body Extraction of Google Remote Sensing Images

by Wang, Zhaobin , Gao, Xiong , Zhang, Yaonan in Accuracy , Algorithms , artificial intelligence

2020

Lake water body extraction from remote sensing images is a key technique for spatial geographic analysis. It plays an important role in the prevention of natural disasters, resource utilization, and water quality monitoring. Inspired by the recent years of research in computer vision on fully convolutional neural networks (FCN), an end-to-end trainable model named the multi-scale lake water extraction network (MSLWENet) is proposed. We use ResNet-101 with depthwise separable convolution as an encoder to obtain the high-level feature information of the input image and design a multi-scale densely connected module to expand the receptive field of feature points by different dilation rates without increasing the computation. In the decoder, the residual convolution is used to abstract the features and fuse the features at different levels, which can obtain the final lake water body extraction map. Through visual interpretation of the experimental results and the calculation of the evaluation indicators, we can see that our model extracts the water bodies of small lakes well and solves the problem of large intra-class variance and small inter-class variance in the lakes’ water bodies. The overall accuracy of our model is up to 98.53% based on the evaluation indicators. Experimental results demonstrate that the MSLWENet, which benefits from the convolutional neural network, is an excellent lake water body extraction network.
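The claim that dilation enlarges the receptive field without adding computation follows from simple arithmetic, sketched below. The formula for the effective extent of a dilated kernel is standard; the 3 × 3 kernel and the dilation rates 1, 2, 4, and 8 are assumed purely for illustration and are not taken from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weights_per_filter(kernel_size):
    """Weights in one square single-channel filter; independent of the dilation."""
    return kernel_size * kernel_size


# Hypothetical dilation rates applied to a 3 x 3 kernel.
for rate in (1, 2, 4, 8):
    extent = effective_kernel_size(3, rate)
    print(f"dilation {rate}: covers {extent} x {extent}, uses {weights_per_filter(3)} weights")
```

The covered area grows from 3 × 3 to 17 × 17 while each filter keeps nine weights, so parallel branches with different rates see several scales at a fixed per-branch cost.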

Journal Article

Hyperspectral Image Classification Using a Multi-Scale CNN Architecture with Asymmetric Convolutions from Small to Large Kernels

by Liao, Xuejiao , Ren, Jinchang , Ge, Linlin in Artificial neural networks , asymmetric convolution , Asymmetry

2025

Deep learning-based hyperspectral image (HSI) classification methods, such as Transformers and Mambas, have attracted considerable attention. However, several challenges persist, e.g., (1) Transformers suffer from quadratic computational complexity due to the self-attention mechanism; and (2) both the local and global feature extraction capabilities of large kernel convolutional neural networks (LKCNNs) need to be enhanced. To address these limitations, we introduce a multi-scale large kernel asymmetric CNN (MSLKACNN) with the large kernel sizes as large as 1×17 and 17×1 for HSI classification. MSLKACNN comprises a spectral feature extraction module (SFEM) and a multi-scale large kernel asymmetric convolution (MSLKAC). Specifically, the SFEM is first utilized to suppress noise, reduce spectral bands, and capture spectral features. Then, MSLKAC, with a large receptive field, joins two parallel multi-scale asymmetric convolution components to extract both local and global spatial features: (C1) a multi-scale large kernel asymmetric depthwise convolution (MLKADC) is designed to capture short-range, middle-range, and long-range spatial features; and (C2) a multi-scale asymmetric dilated depthwise convolution (MADDC) is proposed to aggregate the spatial features between pixels across diverse distances. Extensive experimental results on four widely used HSI datasets show that the proposed MSLKACNN significantly outperforms ten state-of-the-art methods, with overall accuracy (OA) gains ranging from 4.93% to 17.80% on Indian Pines, 2.09% to 15.86% on Botswana, 0.67% to 13.33% on Houston 2013, and 2.20% to 24.33% on LongKou. These results validate the effectiveness of the proposed MSLKACNN.
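One reason asymmetric kernels such as 1×17 and 17×1 are attractive is their weight count relative to a square 17×17 kernel. The sketch below compares the two for depthwise convolution, ignoring biases; the channel count of 64 and the function name `depthwise_weights` are assumptions, and the snippet does not reproduce how the paper combines its branches.

```python
def depthwise_weights(kernel_height, kernel_width, channels):
    """Weights in a depthwise convolution: one kernel per channel, biases ignored."""
    return kernel_height * kernel_width * channels


channels = 64  # hypothetical channel count

square = depthwise_weights(17, 17, channels)
asymmetric_pair = depthwise_weights(1, 17, channels) + depthwise_weights(17, 1, channels)

print(square)                    # weights for a full 17 x 17 depthwise kernel
print(asymmetric_pair)           # weights for a 1 x 17 plus a 17 x 1 kernel
print(square / asymmetric_pair)  # reduction factor
```

The row and column kernels together still span 17 pixels in each direction, but with 34 weights per channel instead of 289.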

Journal Article

Research into the Applications of a Multi-Scale Feature Fusion Model in the Recognition of Abnormal Human Behavior

by Zhang, Yuting , Li, Congcong , Li, Yifan in abnormal behavior recognition , Accuracy , Algorithms

2024

Due to the increasing severity of population aging in modern society, the accurate and timely identification of, and response to, sudden abnormal behaviors of the elderly has become an urgent and important issue. In current research on computer vision-based abnormal behavior recognition, most algorithms have shown poor generalization and recognition abilities in practical applications, and most recognize only single actions. To address these problems, an MSCS–DenseNet–LSTM model based on a multi-scale attention mechanism is proposed. This model integrates the MSCS (Multi-Scale Convolutional Structure) module into the initial convolutional layer of the DenseNet model to form a multi-scale convolution structure. It introduces the improved Inception X module into the Dense Block to form an Inception Dense structure, and gradually performs feature fusion through each Dense Block module. The CBAM attention mechanism module is added to the dual-layer LSTM to enhance the model’s generalization ability while ensuring the accurate recognition of abnormal actions. Furthermore, to address the issue of single-action abnormal behavior datasets, an RGB image dataset (RIDS) and a contour image dataset (CIDS) containing various abnormal behaviors were constructed. The experimental results validate that the proposed MSCS–DenseNet–LSTM model achieved accuracy, sensitivity, and specificity of 98.80%, 98.75%, and 98.82% and of 98.30%, 98.28%, and 98.38%, respectively, on the two datasets.
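Accuracy, sensitivity, and specificity, the three figures reported in this abstract, are all derived from confusion-matrix counts. The sketch below shows the standard binary definitions; the counts are invented for illustration and are not the paper's data, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # share of abnormal samples that were detected
    specificity = tn / (tn + fp)  # share of normal samples that were left alone
    return accuracy, sensitivity, specificity


# Invented counts: 80 abnormal clips (79 detected) and 120 normal clips (118 passed).
accuracy, sensitivity, specificity = binary_metrics(tp=79, fp=2, tn=118, fn=1)
print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} specificity={specificity:.4f}")
```

Reporting all three matters for this kind of task, because a model can score high accuracy on an imbalanced dataset while still missing a large share of the rare abnormal events.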

Journal Article

CODON: On Orchestrating Cross-Domain Attentions for Depth Super-Resolution

by Zhang, Jing , Cao, Qi , Yang, Yuxiang in Color , Datasets , Domains

2022

The ready accessibility of high-resolution image sensors has stimulated interest in increasing depth resolution by leveraging paired color information as guidance. Nevertheless, how to effectively exploit the depth and color features to achieve a desired depth super-resolution effect remains challenging. In this paper, we propose a novel depth super-resolution method called CODON, which orchestrates cross-domain attentive features to address this problem. Specifically, we devise two essential modules: the recursive multi-scale convolutional module (RMC) and the cross-domain attention conciliation module (CAC). RMC discovers detailed color and depth features by sequentially stacking weight-shared multi-scale convolutional layers, in order to deepen and widen the network at low complexity. CAC calculates conciliated attention from both domains and uses it as shared guidance to enhance the edges in the depth features while suppressing textures in the color features. Then, the jointly conciliated attentive features are combined and fed into an RMC prediction branch to reconstruct the high-resolution depth image. Extensive experiments on several popular benchmark datasets, including Middlebury, New Tsukuba, Sintel, and NYU-V2, demonstrate the superiority of our proposed CODON over representative state-of-the-art methods.

Journal Article

Multi-scale Convolutional Feature Fusion Network Based on Attention Mechanism for IoT Traffic Classification

by Guan, Jiayu , Liao, Niandong in Artificial Intelligence , Attention mechanism , Computational Intelligence

2024

The Internet of Things (IoT) has been extensively utilized in domains such as smart homes, healthcare, and other industries. With the exponential growth in the number of IoT devices, they have become prime targets for malicious cyber-attacks. Effective classification of IoT traffic is, therefore, imperative to enable robust intrusion detection systems. However, IoT traffic data contain intricate spatial relationships and topological information, and traditional traffic identification methods cannot fully extract these features or capture their crucial characteristics. We propose a multi-scale convolutional feature fusion network augmented with a Convolutional Block Attention Module (MCF-CBAM) for accurate IoT traffic classification. The network incorporates three critical innovations: (1) Parallel convolution extracts multi-scale spatial features from traffic data, and a 1 × 1 convolution operation reduces the number of parameters and computations in the network, thereby improving efficiency. (2) The attention module suppresses less informative features while highlighting the most discriminative ones, enabling focused learning on decisive features. (3) Cross-scale connections with channel jumps reuse features from prior layers to enhance generalization. We evaluate the method extensively on three widely adopted public datasets. Quantitative results demonstrate that MCF-CBAM establishes new state-of-the-art performance benchmarks for IoT traffic classification, surpassing existing methods by a significant margin. Qualitative visualizations of the learned attention weights provide intuitive insights into how the network automatically discovers the most decisive spatial features for identification. With its strong empirical performance and interpretable attention mechanisms, this work presents a promising deep learning solution to augment real-world IoT intrusion detection systems against growing cybersecurity threats.
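The way an attention module can suppress some channels while highlighting others is easiest to see in the channel-attention half of CBAM, sketched here in plain Python. It follows the commonly described CBAM form (average-pooled and max-pooled channel descriptors passed through one shared two-layer MLP, summed, then squashed by a sigmoid) and omits the spatial-attention half. The tiny feature maps and the fixed weights `w_reduce` and `w_expand` are hypothetical; in a real network they would be learned, and nothing here is taken from the MCF-CBAM implementation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def channel_attention(feature_maps, w_reduce, w_expand):
    """Per-channel weights in (0, 1), in the style of CBAM channel attention.

    feature_maps: list of C channels, each a list of rows.
    w_reduce: hidden x C matrix; w_expand: C x hidden matrix (one shared MLP).
    """
    avg_pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    max_pooled = [max(max(row) for row in channel) for channel in feature_maps]

    def shared_mlp(descriptor):
        hidden = [max(0.0, h) for h in matvec(w_reduce, descriptor)]  # ReLU
        return matvec(w_expand, hidden)

    summed = [a + m for a, m in zip(shared_mlp(avg_pooled), shared_mlp(max_pooled))]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed]


# Hypothetical input: two 2 x 2 channels, one strongly activated and one nearly empty.
features = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
w_reduce = [[0.5, -0.5]]     # 1 hidden unit, 2 input channels (illustrative values)
w_expand = [[1.0], [-1.0]]   # 2 output channels, 1 hidden unit (illustrative values)

weights = channel_attention(features, w_reduce, w_expand)
rescaled = [
    [[value * weight for value in row] for row in channel]
    for channel, weight in zip(features, weights)
]
print(weights)   # the first channel is emphasised, the second suppressed
print(rescaled)  # feature maps after channel-wise rescaling
```

With these illustrative weights the first channel keeps almost all of its activation and the second is scaled close to zero, which is the "highlight and suppress" behaviour the abstract refers to.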

Journal Article

Bearing fault diagnosis method based on WSST and ISSA-MCNN-BIGRU

by Bai, Hongwei , Liu, Qiang , Dong, Shien in 639/166/987 , 639/166/988 , Adaptation

2025

Rolling bearings constitute essential components in large-scale rotating machinery; nonetheless, their fault diagnosis still encounters significant challenges, including the difficulty of extracting discriminative features, relatively low recognition rates, and heavy reliance on expert experience. To address these challenges, this paper proposes a hybrid diagnostic framework that integrates the Wavelet Synchrosqueezed Transform (WSST), an Improved Sparrow Search Algorithm (ISSA), a Multi-Scale Convolutional Neural Network (MCNN), and a Bidirectional Gated Recurrent Unit (BiGRU). First, WSST is employed to obtain high-resolution time-frequency representations that capture subtle transient characteristics of bearing vibration signals. Second, MCNN performs multi-scale spatial feature extraction on the WSST-generated images, enabling the simultaneous capture of fine-grained and coarse-grained fault patterns. Third, BiGRU is introduced to learn bidirectional temporal dependencies, thereby enhancing the model’s capability to represent sequential data. Crucially, ISSA, augmented with chaotic Tent mapping, a Gaussian mutation strategy, and a Levy flight mechanism, is applied to adaptively optimize key hyperparameters of the MCNN-BiGRU network (learning rate, convolutional kernel sizes, number of GRU units). Experimental results on both the Case Western Reserve University and Southeast University bearing datasets demonstrate that the proposed ISSA-MCNN-BiGRU model achieves a fault diagnosis accuracy of up to 99.75%, outperforming baseline models such as standalone GRU, BiGRU, MCNN-BiGRU, PSO-MCNN-BiGRU, and GA-MCNN-BiGRU in terms of accuracy, stability, and generalization. Additionally, in different noise environments, the proposed model’s accuracy is significantly higher than that of comparative models, demonstrating strong robustness.

Journal Article

MSATNet: multi-scale adaptive transformer network for motor imagery classification

by Hong, Weijie , Hu, Lingyan , Liu, Lingyu in Accuracy , Adaptation , Algorithms

2023

Motor imagery brain-computer interfaces (MI-BCIs) can parse a user's motor imagery to achieve wheelchair control or motion control for smart prostheses. However, existing models for motor imagery classification suffer from poor feature extraction and low cross-subject performance. To address these problems, we propose a multi-scale adaptive transformer network (MSATNet) for motor imagery classification. Therein, we design a multi-scale feature extraction (MSFE) module to extract multi-band highly-discriminative features. Through the adaptive temporal transformer (ATT) module, the temporal decoder and multi-head attention unit are used to adaptively extract temporal dependencies. Efficient transfer learning is achieved by fine-tuning target subject data through the subject adapter (SA) module. Within-subject and cross-subject experiments are performed to evaluate the classification performance of the model on the BCI Competition IV 2a and 2b datasets. The MSATNet outperforms benchmark models in classification performance, reaching accuracies of 81.75% and 89.34% in the within-subject experiments and 81.33% and 86.23% in the cross-subject experiments. The experimental results demonstrate that the proposed method can help build a more accurate MI-BCI system.

Journal Article
