Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
3,980 result(s) for "attention module"
RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images
2022
Classification of land use and land cover from remote sensing images is widely used in natural resources and urban information management. The variability and complex backgrounds of land use in high-resolution imagery pose greater challenges for remote sensing semantic segmentation. To obtain multi-scale semantic information and improve the classification accuracy of land-use types in remote sensing images, deep learning models have attracted wide attention. Inspired by the atrous spatial pyramid pooling (ASPP) framework, this paper constructs an improved deep learning model named RAANet (Residual ASPP with Attention Net), which builds a new residual ASPP by embedding an attention module and a residual structure into the ASPP. Its encoder contains five dilated attention convolution units and a residual unit: the former obtain important semantic information at more scales, while the residual unit reduces network complexity and prevents vanishing gradients. In practical applications, the attention unit can select different attention modules, such as the convolutional block attention module (CBAM), according to the characteristics of the dataset. Experimental results on the land-cover domain adaptive semantic segmentation (LoveDA) and ISPRS Vaihingen datasets show that this model enhances the classification accuracy of semantic segmentation compared to current deep learning models.
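The CBAM option mentioned above combines a channel gate with a spatial gate; as a rough, hypothetical sketch of the channel-gate half in plain NumPy (the weights, sizes, and reduction ratio are illustrative, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.

    Global average- and max-pooled descriptors pass through a shared
    two-layer MLP (w1, w2) and are summed before the sigmoid gate.
    """
    avg = feat.mean(axis=(1, 2))                    # (C,) average descriptor
    mx = feat.max(axis=(1, 2))                      # (C,) max descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # ReLU hidden layer
    gate = sigmoid(mlp(avg) + mlp(mx))              # (C,) per-channel weights
    return feat * gate[:, None, None]               # rescale each channel

rng = np.random.default_rng(0)
c, r = 8, 2                                         # channels, reduction ratio
feat = rng.standard_normal((c, 16, 16))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
out = channel_attention(feat, w1, w2)
assert out.shape == feat.shape
```

In a full CBAM block, a spatial gate computed from channel-pooled maps would follow this channel gate.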
Journal Article
A lightweight large receptive field network LrfSR for image super-resolution
2025
Deep convolutional neural networks have demonstrated excellent performance in single-image super-resolution (SISR) reconstruction. However, existing methods often suffer from a large number of parameters, intensive computation, and high latency, which limit their application on devices with low computational resources. To address these problems, this paper proposes a lightweight large receptive field network for image super-resolution (LrfSR). The innovations are mainly as follows. First, we design an information distillation module based on a large receptive field (LrfDM). The module enlarges the receptive field through dilated convolution, which helps the network capture more pixel-to-pixel relationships and fuse multi-scale information in the feature distillation stage. This design effectively extracts the high-frequency features of the image, as demonstrated by the feature maps. Second, more efficient attention mechanisms, ECCA and SESA, are introduced into the network, improving super-resolution image quality with fewer network parameters. Experiments on the Set5, Set14, B100, Urban100, and Manga109 datasets show that LrfSR achieves 4x super-resolution peak signal-to-noise ratio (PSNR) values of 32.23 dB, 28.65 dB, 27.59 dB, 26.36 dB, and 30.53 dB, outperforming existing models such as LKDN. Both qualitative and quantitative results show that LrfSR exploits the potential of large receptive fields in lightweight super-resolution networks and achieves a balance between high-quality reconstruction and limited resources. The code and models are available at https://github.com/wanqin557/LrfSR.
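The receptive-field arithmetic behind dilated convolution, which LrfDM-style designs rely on, can be checked in a few lines; the layer configuration below is illustrative, not the paper's actual network:

```python
# Effective kernel size of a dilated convolution: k_eff = d*(k-1) + 1.
# Stacked stride-1 layers grow the receptive field by k_eff - 1 each.
def effective_kernel(k, d):
    return d * (k - 1) + 1

def receptive_field(layers):
    """layers: list of (kernel_size, dilation) pairs, stride 1 throughout."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# A plain 3x3 conv sees 3 pixels per axis; with dilation 2 it sees 5,
# at the same parameter cost. Three dilated 3x3 layers already cover 15.
assert effective_kernel(3, 1) == 3
assert effective_kernel(3, 2) == 5
assert receptive_field([(3, 1), (3, 2), (3, 4)]) == 15
```

This is why dilation enlarges the receptive field "cheaply": parameters scale with k, while coverage scales with d*(k-1)+1.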
Journal Article
Fine-grained image classification method based on hybrid attention module
by Yang, Ying; Yang, Lei; Lu, Weixiang
in attention erasure module; channel attention module; fine-grained image classification
2024
To efficiently capture feature information in fine-grained image classification tasks, this study introduces a new network model that uses a hybrid attention approach. The model is built upon a hybrid attention module (MA) and, with the assistance of an attention erasure module (EA), can adaptively enhance the prominent areas in an image and capture more detailed image information. Specifically, this study designs an attention module that applies the attention mechanism to both the channel and spatial dimensions, highlighting the important regions and key feature channels in the image and allowing distinct local features to be extracted. Furthermore, the attention erasure module (EA) removes significant areas in the image based on the features identified, shifting focus to additional feature details and improving the diversity and completeness of the features. Moreover, the study enhances the pooling layer of ResNet50 to enlarge the perceptual region and strengthen feature extraction from the network's shallower layers. For fine-grained classification, a variety of features are extracted and merged effectively to create the final feature representation. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200-2011. The method achieved classification accuracies of 92.8%, 94.0%, and 88.2% on these datasets, respectively. Compared with existing approaches, it demonstrates significantly improved efficiency, with higher accuracy and robustness.
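As a loose illustration of the erasure idea described above (not the paper's EA module), one can mask the most-attended locations so that a second pass must attend to complementary details; the threshold and shapes here are arbitrary:

```python
import numpy as np

def erase_top_region(feat, attn, keep=0.9):
    """Attention-erasure sketch: zero out the most-attended locations.

    feat: (C, H, W) features; attn: (H, W) attention map.
    Locations with attention above the `keep` quantile are masked, so a
    second branch is forced to focus on complementary image details.
    """
    thresh = np.quantile(attn, keep)
    mask = (attn < thresh).astype(feat.dtype)   # 1 where features are kept
    return feat * mask[None, :, :]              # broadcast mask over channels

rng = np.random.default_rng(1)
feat = rng.random((4, 8, 8))
attn = rng.random((8, 8))
erased = erase_top_region(feat, attn, keep=0.9)
assert erased.shape == feat.shape
assert (erased <= feat).all()   # non-negative features only shrink or vanish
```

Features from the original and the erased branches would then be merged, which is how erasure improves feature diversity and completeness.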
Journal Article
Sketch recognition model based on improved CycleGAN network and dual attention mechanism
To improve sketch recognition accuracy, this study proposes an enhanced sketch recognition model based on an improved CycleGAN network and a dual attention mechanism. The proposed model first incorporates multi-directional convolution and brightness equalization modules into the CycleGAN network to extract edge and contour features. A dual attention mechanism is then implemented using channel attention and spatial attention modules, effectively addressing issues of sparse strokes and uneven spatial distribution in sketches while enhancing the representation of critical features. Finally, a hybrid architecture combining global average pooling and convolution layers serves as the classifier to produce sketch recognition results. Simulation results demonstrate that this model achieves 97.08% accuracy, 98.12% precision, 98.23% recall, and 97.45% F1 score for sketch recognition on the TU-Berlin dataset, and 98.65% accuracy, 98.12% precision, 98.76% recall, and 97.95% F1 score on the QuickDraw dataset. Compared with state-of-the-art sketch recognition models, this model exhibits superior performance in accuracy. These results indicate that the model can enhance sketch recognition precision and provide technical support for converting sketches into high-quality animated images.
Journal Article
Channel-spatial attention modules in convolutional neural networks for image classification
2025
Many studies in recent years have established that the attention mechanism has great potential for improving the performance of Convolutional Neural Networks (CNNs) on image classification problems. Combining channel and spatial attention modules is one kind of attention mechanism inspired by the visual perception of the human brain. So far, no paper has compared the parallel and sequential ways of combining channel-spatial attention modules comprehensively enough to say definitively which offers the better balance between efficiency and computational complexity. In this paper, we introduce two new channel-spatial attention modules, the Parallel Channel-Spatial Attention Module (PCSAM) and the Sequential Channel-Spatial Attention Module (SCSAM), which can be embedded in the architecture of any CNN. Each proposed module is composed of a channel and a spatial attention sub-module. The Channel Attention Module (CAM) and Spatial Attention Module (SAM) help the network extract the channels related to the Region of Interest (RoI) and its location in the input feature maps, respectively. We increase the representation power of the attention-based networks by extracting features with Global Average Pooling (GAP) and Global Maximum Pooling (GMP) in the CAM and SAM. Also, a dilated convolution (DC) layer is employed in the SAM instead of a standard convolution to better focus on the RoI in the feature maps. The PCSAM and SCSAM are implemented in the ResNet18 and MobileNetv4 architectures to produce ResNet18PCSAM, ResNet18SCSAM, MobileNetv4PCSAM, and MobileNetv4SCSAM. All networks are trained and evaluated on three general image classification datasets, CIFAR-10, CIFAR-100, and Tiny-ImageNet, under the same experimental conditions for 50 epochs. The classification results on the test set show that MobileNetv4SCSAM is more efficient than the other architectures on all datasets and outperforms previous channel-spatial attention modules.
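A stripped-down sketch of the parallel versus sequential composition the paper compares; the real PCSAM/SCSAM use GAP plus GMP descriptors and a dilated convolution, while the gates below are simplified stand-ins:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(feat):
    # Simplified channel attention: sigmoid over the global-average descriptor
    return sigmoid(feat.mean(axis=(1, 2)))[:, None, None]

def spatial_gate(feat):
    # Simplified spatial attention: sigmoid over the channel-averaged map
    return sigmoid(feat.mean(axis=0))[None, :, :]

def sequential(feat):
    # SCSAM-like ordering: channel gate first, spatial gate on the result
    x = feat * channel_gate(feat)
    return x * spatial_gate(x)

def parallel(feat):
    # PCSAM-like ordering: both gates computed from the same input, then applied
    return feat * channel_gate(feat) * spatial_gate(feat)

rng = np.random.default_rng(2)
feat = rng.standard_normal((4, 6, 6))
assert sequential(feat).shape == (4, 6, 6)
assert parallel(feat).shape == (4, 6, 6)
```

The outputs differ because the sequential spatial gate sees channel-reweighted features, which is exactly the design choice the paper's comparison isolates.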
Journal Article
SACA-fusion: a low-light fusion architecture of infrared and visible images based on self- and cross-attention
2024
Visible-infrared image fusion can not only reveal the respective features of multiband imaging but also combine complementary information. It thus highlights salient information that cannot be obtained directly from a single waveband and enhances scene detection and perception. However, low-light conditions in special scenarios, e.g., underground coal mines, impact the performance of visible-infrared image fusion, as they lower the contrast of visible-light images and cause loss of local details. We therefore propose an infrared and visible image fusion architecture for low-light conditions based on self- and cross-attention (SACA-Fusion). The architecture replaces traditional fusion approaches with a transformer-based fusion network, which better extracts long-range dependencies in images and improves spatial recovery in the fused images. Its attention mechanism comprises two modules: the self-attention module achieves global interaction and fusion of features and reduces the loss of local details, while the cross-attention module in the nest connection enhances features in low-light conditions and achieves low-contrast spatial recovery. Ablation experiments confirm that the transformer module, rather than RFN or direct connection, is the effective fusion strategy. Comparison experiments on the TNO and LLVIP datasets then show that the proposed architecture achieves better fusion performance on several evaluation indicators, with especially notable improvement in actual low-light conditions.
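The self- versus cross-attention distinction reduces to where the keys and values come from; a minimal NumPy sketch with identity projections (not the paper's transformer network, whose dimensions and projections are unknown here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    """Scaled dot-product attention: queries from one modality,
    keys/values from the other (identity projections for brevity)."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_tokens

rng = np.random.default_rng(3)
vis = rng.standard_normal((5, 16))   # visible-image tokens
ir = rng.standard_normal((7, 16))    # infrared tokens
fused = cross_attention(vis, ir)     # visible queries attend over infrared
assert fused.shape == (5, 16)
```

Self-attention is the special case `cross_attention(x, x)`, which is what gives the self-attention module its global within-modality interaction.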
Journal Article
MFATNet: Multi-Scale Feature Aggregation via Transformer for Remote Sensing Image Change Detection
by Tong, Xinyu; Mao, Zan; Luo, Ze
in Agglomeration; Artificial neural networks; Change detection
2022
In recent years, with the extensive application of deep learning to images, remote sensing image change detection has witnessed significant improvement. Several excellent methods based on Convolutional Neural Networks and emerging transformer-based methods have achieved impressive accuracy. However, Convolutional Neural Network-based approaches have difficulty capturing long-range dependencies because of their naturally limited effective receptive field, unless deeper networks are employed, which introduces other drawbacks such as an increased number of parameters and loss of shallow information. Transformer-based methods can effectively learn the relationships between different regions, but their computation is inefficient. Thus, this paper proposes multi-scale feature aggregation via transformer (MFATNet) for remote sensing image change detection. To obtain a more accurate change map after learning the intra-relationships of feature maps at different scales through the transformer, MFATNet aggregates the multi-scale features. Moreover, a Spatial Semantic Tokenizer (SST) is introduced to obtain refined semantic tokens before they are fed into the transformer structure, focusing it on learning the more crucial pixel relationships. To fuse low-level features (finer-grained localization information) with high-level features (more accurate semantic information), and to alleviate the localization and semantic gap between them, an Intra- and Inter-class Channel Attention Module (IICAM) is integrated to further produce more convincing change maps. Extensive experiments are conducted on the LEVIR-CD, WHU-CD, and DSIFN-CD datasets, achieving intersection over union (IoU) and F1 scores of 82.42/90.36, 79.08/88.31, and 77.98/87.62, respectively. The experimental results show promising performance compared to previous state-of-the-art change detection methods.
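Multi-scale aggregation in its simplest form upsamples every pyramid level to the finest resolution and concatenates along channels; this is a generic sketch of that step, not MFATNet's transformer-based aggregation, and the pyramid shapes are invented:

```python
import numpy as np

def upsample_nn(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def aggregate(pyramid):
    """Upsample each level to the finest resolution, then concatenate
    along channels: the simplest form of multi-scale feature aggregation."""
    h = max(f.shape[1] for f in pyramid)   # finest spatial resolution
    return np.concatenate(
        [upsample_nn(f, h // f.shape[1]) for f in pyramid], axis=0
    )

rng = np.random.default_rng(4)
pyramid = [
    rng.random((8, 32, 32)),    # shallow: fine localization, few channels
    rng.random((16, 16, 16)),   # intermediate
    rng.random((32, 8, 8)),     # deep: coarse but semantic, many channels
]
fused = aggregate(pyramid)
assert fused.shape == (56, 32, 32)   # 8 + 16 + 32 channels at full resolution
```

A module like IICAM would then reweight these concatenated channels to bridge the semantic gap between the shallow and deep contributions.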
Journal Article
Hyperspectral Image Classification with the Orthogonal Self-Attention ResNet and Two-Step Support Vector Machine
by Sun, Heting; Liu, Haitao; Wang, Liguo
in Algorithms; Artificial intelligence; channel attention module
2024
Hyperspectral image classification plays a crucial role in remote sensing image analysis by classifying pixels. However, existing methods need stronger spatial–global information interaction and feature extraction capabilities. To overcome these challenges, this paper proposes a novel model for hyperspectral image classification using an orthogonal self-attention ResNet and a two-step support vector machine (OSANet-TSSVM). The OSANet-TSSVM model comprises two essential components: a deep feature extraction network and an improved support vector machine (SVM) classification module. The deep feature extraction network incorporates an orthogonal self-attention module (OSM) and a channel attention module (CAM) to enhance spatial–spectral feature extraction. The OSM computes 2D self-attention weights over the orthogonal dimensions of an image, reducing the number of parameters while capturing comprehensive global contextual information. The CAM independently learns attention weights along the channel dimension, enabling the deep network to emphasise crucial channel information and enhance the spectral feature extraction capability. In addition to the feature extraction network, the OSANet-TSSVM model leverages an improved SVM classification module, the two-step support vector machine (TSSVM). This module preserves the discriminative outcomes of the first-level SVM subclassifier and remaps them as new features for TSSVM training. By integrating the results of the two classifiers, the deficiencies of the individual classifiers are effectively compensated, resulting in significantly enhanced classification accuracy. The performance of the proposed OSANet-TSSVM model was thoroughly evaluated on public datasets, and the experimental results demonstrate that it performs well on both subjective and objective evaluation metrics, highlighting its potential for advancing hyperspectral image classification in remote sensing applications.
Journal Article
An efficient epileptic seizure detection by classifying focal and non-focal EEG signals using optimized deep dual adaptive CNN-HMM classifier
by Desai, Sharmishta; Chavan, Puja A.
in Algorithms; Classifiers; Computer Communication Networks
2024
Seizures are brief episodes of unusually elevated brain electrical activity that can produce a variety of symptoms and behaviours, and they are the main sign of epilepsy. Because of the unexpected character of seizures and individual variation in symptoms, examining individuals experiencing epileptic seizures can be difficult, and recent research has achieved only low accuracy in epileptic seizure detection. To address these issues, this research develops a detection model to assist the health care sector: an improved deep dual adaptive CNN-HMM classifier that automatically detects epileptic seizures from focal and non-focal epileptic EEG signals. Inputs are collected from four datasets, and preprocessing converts the unstructured data into structured data. The preprocessed signal is divided into five separate sub-bands and subjected to wavelet decomposition to reduce noise. A human learning optimization (HLO) algorithm performs electrode selection to identify the best electrodes and helps reduce overfitting. Once the signals are selected optimally, feature extraction proceeds in three steps (TQWT, Hjorth, and statistical features) to support a deep analysis of the EEG data. Seizure detection is then performed by the deep dual adaptive CNN-HMM classifier, and the accuracy, sensitivity, specificity, precision, and F-measure of its outputs are evaluated. For dataset 1, it attains 99.46%, 98.48%, 99.46%, 99.90%, and 99.58% with the training-percentage split and 98.13%, 98.46%, 97.56%, 99.88%, and 99.56% with tenfold cross-validation. For dataset 2, it attains 94.53%, 92.37%, 99.94%, 93.11%, and 93.60% with the training-percentage split and 90.84%, 91.17%, 90.27%, 93.09%, and 93.58% with tenfold cross-validation. Similarly, for dataset 3 it attains 94.48%, 94.62%, 96.82%, 95.41%, and 96.40% with the training-percentage split and 94.54%, 94.68%, 96.87%, 95.46%, and 96.45% with tenfold cross-validation. For dataset 4, it attains 99.13%, 98.72%, 98.00%, 96.73%, and 97.72% with the training-percentage split and 99.28%, 99.32%, 99.22%, 98.85%, and 98.92% with tenfold cross-validation, which is more efficient than other existing methods.
Journal Article
Attention module-based fused deep CNN for learning disabilities identification using EEG signal
by Ahire, Nitin Kisan; Awale, R. N.; Wagh, Abhay
in Computer Communication Networks; Computer Science; Data Structures and Information Theory
2024
Learning disabilities (LDs) are diagnosed in children whose abilities in understanding, writing, or arithmetic are impaired and lag behind their age, schooling, and intelligence; their estimated prevalence in the pediatric population is 5 to 9 percent. Previous research on electroencephalography (EEG) signals has reported a delay in the growth of the alpha band in specific phenotypes of LD, which appears to offer a feasible explanation for differences in EEG maturation. Accordingly, the EEG signals of children with reading disorders (RDs) show higher theta and lower alpha activity than those of typically developing children. An attention module-based fused deep CNN is therefore developed to identify learning disabilities from EEG signals; the main aim of the proposed research is to predict children's learning disabilities from EEG. The accuracy of the proposed method reaches 97.60% and 95.12% in terms of training percentage and k-fold, respectively.
Journal Article