249 result(s) for "multi-scale fusion module"
Cloud Removal in the Tibetan Plateau Region Based on Self-Attention and Local-Attention Models
Optical remote sensing images have a wide range of applications but are often affected by cloud cover, which interferes with subsequent analysis. Cloud removal has therefore become indispensable in remote sensing data processing. The Tibetan Plateau, as a region sensitive to climate change, plays a crucial role in the East Asian water cycle and regional climate due to its snow cover. However, the rich ice and snow resources, rapid snow condition changes, and active atmospheric convection in the plateau and its surrounding mountainous areas make optical remote sensing prone to cloud interference. This is particularly significant when monitoring snow cover changes, where cloud removal becomes essential given the complex terrain and unique snow characteristics of the Tibetan Plateau. This paper proposes a novel Multi-Scale Attention-based Cloud Removal Model (MATT). The model integrates global and local information by incorporating multi-scale attention mechanisms and local interaction modules, enhancing contextual semantic relationships and improving the robustness of feature representation. To improve the segmentation accuracy of cloud- and snow-covered regions, a cloud mask is introduced in the local-attention module and combined with the local interaction module to modulate and reconstruct fine-grained details. This enables the simultaneous representation of both fine-grained and coarse-grained features at the same level. With the help of multi-scale fusion modules and selective attention modules, MATT demonstrates excellent performance on both the Sen2_MTC_New and XZ_Sen2_Dataset datasets. On the XZ_Sen2_Dataset in particular, it achieves outstanding results: PSNR = 29.095, SSIM = 0.897, FID = 125.328, and LPIPS = 0.356. The model shows strong cloud removal capabilities in cloud- and snow-covered mountainous areas while effectively preserving snow information, providing significant support for snow cover change studies.
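For readers who want a concrete starting point, a minimal PyTorch sketch of the multi-scale fusion idea this abstract describes is shown below. The module name, branch scales, and fusion-by-concatenation are illustrative assumptions, not the authors' published MATT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Parallel branches at several scales, fused by a 1x1 convolution."""

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AvgPool2d(s) if s > 1 else nn.Identity(),  # downsample by s
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for s in scales
        )
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for branch in self.branches:
            f = branch(x)
            if f.shape[-2:] != (h, w):  # restore the input resolution
                f = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
            feats.append(f)
        return self.fuse(torch.cat(feats, dim=1))

x = torch.randn(1, 64, 128, 128)
print(MultiScaleFusion(64)(x).shape)  # -> torch.Size([1, 64, 128, 128])
```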
CASM-AMFMNet: A Network Based on Coordinate Attention Shuffle Mechanism and Asymmetric Multi-Scale Fusion Module for Classification of Grape Leaf Diseases
Grape disease is a significant contributory factor to the decline in grape yield, typically affecting the leaves first. Efficient identification of grape leaf diseases remains a critical unmet need. To mitigate background interference in grape leaf feature extraction and improve the extraction of small disease spots, we combined the characteristic features of grape leaf diseases to develop a novel method for disease recognition and classification in this study. First, a combination of Gaussian filtering, Sobel smoothing, and a de-noising Laplace operator (GSSL) was employed to reduce image noise and enhance the texture of grape leaves. A novel network, designated the coordinate attention shuffle mechanism-asymmetric multi-scale fusion module network (CASM-AMFMNet), was subsequently applied for grape leaf disease identification. CoAtNet was employed as the network backbone to improve model learning and generalization capabilities, which alleviated the problem of gradient explosion to a certain extent. The CASM-AMFMNet was further utilized to capture and target grape leaf disease areas, thereby reducing background interference. Finally, an asymmetric multi-scale fusion module (AMFM) was employed to extract multi-scale features from small disease spots on grape leaves for accurate identification of small-target diseases. The experimental results based on our self-made grape leaf image dataset showed that, compared to existing methods, CASM-AMFMNet achieved an accuracy of 95.95%, an F1 score of 95.78%, and an mAP of 90.27%. Overall, the model and methods proposed in this report could successfully identify different diseases of grape leaves and provide a feasible scheme for deep learning to correctly recognize grape diseases during agricultural production, which may serve as a reference for other crop diseases.
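A hedged sketch of what an asymmetric multi-scale branch could look like in PyTorch follows: square k x k convolutions are replaced by cheaper 1 x k and k x 1 pairs at several kernel sizes and then fused. The kernel sizes and module name are assumptions, not the published AMFM.

```python
import torch
import torch.nn as nn

class AsymmetricMultiScale(nn.Module):
    """Factorised (1xk then kx1) convolutions at several scales, fused."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        # Each branch keeps spatial size, so concatenation is straightforward.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Factorising a k x k kernel into 1 x k and k x 1 reduces the parameter count per branch from k*k to 2k, which is one plausible motivation for the "asymmetric" design.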
Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical imaging analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns, while CNNs have difficulty capturing long-range dependencies due to their local receptive fields, which limits their ability to model spatial relationships across lung regions. Methods: In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, a CNN extracts local features, which are then passed through a ViT to capture global dependencies. We implemented a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) into the proposed framework to fuse features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
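The gated attention idea (channel-, spatial-, and element-wise attention combined under learned gates) might be prototyped as below; the softmax gating and layer shapes are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Three attention maps mixed by learned, softmax-normalised gates."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel = nn.Sequential(  # squeeze-and-excitation style, (B, C, 1, 1)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(  # one weight per location, (B, 1, H, W)
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid(),
        )
        self.element = nn.Sequential(  # full-resolution map, (B, C, H, W)
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),
        )
        self.gates = nn.Parameter(torch.ones(3))  # learned mixing weights

    def forward(self, x):
        g = torch.softmax(self.gates, dim=0)
        # The three maps broadcast to (B, C, H, W) when summed.
        att = g[0] * self.channel(x) + g[1] * self.spatial(x) + g[2] * self.element(x)
        return x * att
```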
Joint manipulation trace attention network and adaptive fusion mechanism for image splicing forgery localization
Splicing forgery, which manipulates images by copying regions from donor images and pasting them into host images, is one of the most common types of image forgery; the copied regions may be object regions or background regions. To accurately detect these forged regions, the mainstream approach is to use an encoder-decoder network architecture that extracts enough manipulation traces to determine whether each pixel of the input image has been spliced. However, due to the limited receptive field of such networks, only local manipulation traces can be learned, so some large object-area forgeries and background forgeries cannot be well localized. To address these issues, this paper proposes an end-to-end splicing detection framework comprising a localization network (L-Net), a manipulation trace attention network (MTA-Net), and an adaptive multi-scale fusion module. The localization network L-Net is designed as an encoder-decoder network that extracts local manipulation traces for each pixel and localizes spliced areas. MTA-Net uses the proposed content-remove convolutional layer (CRCL) to suppress image content information that would otherwise hinder the network from learning manipulation traces, and then uses subsequent convolutional layers to extract features that discriminate whether the input image is spliced. In this process, the regions of the convolutional feature maps with large activation values are those containing global manipulation traces. These global manipulation traces are fused with the local manipulation traces learned by L-Net through the proposed adaptive multi-scale fusion module (AMSFM), allowing L-Net to effectively handle object forgeries and background-region forgeries of various sizes. Ablation experiments showed increases of 4.6% and 3.9% in F1-score and MCC after the introduction of MTA-Net and AMSFM, respectively. The splicing-region detection performance on three standard datasets, CASIA, COLUMB, and CARVALHO, shows that the proposed method outperforms state-of-the-art methods for both object forgery and background forgery, and is more robust to post-processing such as JPEG compression and noise addition.
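The CRCL's content-suppression idea resembles classic high-pass residual filtering in image forensics: predict each pixel from its neighbours and keep only the prediction residual, so scene content is attenuated while manipulation traces stand out. The actual CRCL is presumably learned; the fixed kernel below is only an assumption used for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentSuppressConv(nn.Module):
    """Fixed high-pass residual filter applied depthwise to each channel."""

    def __init__(self):
        super().__init__()
        # Neighbourhood-average predictor minus the centre pixel (high-pass).
        k = torch.full((3, 3), 1.0 / 8.0)
        k[1, 1] = -1.0
        self.register_buffer("kernel", k.view(1, 1, 3, 3))

    def forward(self, x):  # x: (B, C, H, W), each channel filtered separately
        c = x.shape[1]
        weight = self.kernel.repeat(c, 1, 1, 1)  # depthwise weight (C, 1, 3, 3)
        return F.conv2d(x, weight, padding=1, groups=c)
```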
Water Body Extraction in Remote Sensing Imagery Using Domain Adaptation-Based Network Embedding Selective Self-Attention and Multi-Scale Feature Fusion
A water body is a common object in remote sensing images, and high-quality water body extraction is important for many downstream applications. With the development of deep learning (DL) in recent years, semantic segmentation based on deep convolutional neural networks (DCNNs) offers a new way to extract water bodies automatically and with high quality from remote sensing images. Although several methods have been proposed, two major problems remain in water body extraction, especially for high-resolution remote sensing images. One is that it is difficult for DCNN-based methods to effectively detect both large and small water bodies simultaneously and to accurately predict the edge positions of water bodies; the other is that DL methods need a large number of labeled samples, which are often insufficient in practical applications. In this paper, a novel SFnet-DA network based on domain adaptation (DA) embedding a selective self-attention (SSA) mechanism and a multi-scale feature fusion (MFF) module is proposed to address these problems. Specifically, the SSA mechanism is used to selectively increase or decrease the spatial detail and semantic information in the bottom-up branches of the network through selective feature enhancement; this improves the detection of water bodies with drastic scale changes and prevents the prediction from being affected by other factors, such as roads and green algae. Furthermore, the MFF module accurately acquires edge information by changing the number of channels of the advanced feature branches with a unique fusion method. To avoid additional labeling work, SFnet-DA reduces the difference in feature distribution between labeled and unlabeled datasets by building an adversarial relationship between the feature extractor and the domain classifier, so that parameters trained on the labeled datasets can be used directly to predict the unlabeled images. Experimental results demonstrate that the proposed SFnet-DA outperforms state-of-the-art methods on water body segmentation.
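One plausible, selective-kernel-style reading of the SSA mechanism is sketched below: a detail-oriented branch and a semantic-oriented branch compete through a softmax gate computed from globally pooled features. The branch designs and the gate are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SelectiveAttention(nn.Module):
    """Softmax gate selects per channel between a detail and a semantic branch."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.detail = nn.Conv2d(channels, channels, 3, padding=1)  # small receptive field
        self.semantic = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)  # larger
        hidden = max(channels // reduction, 8)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * channels, 1),  # two gate logits per channel
        )

    def forward(self, x):
        d, s = self.detail(x), self.semantic(x)
        w = self.gate(d + s).view(x.shape[0], 2, x.shape[1], 1, 1).softmax(dim=1)
        return w[:, 0] * d + w[:, 1] * s  # per-channel soft selection
```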
CenterLoc3D: monocular 3D vehicle localization network for roadside surveillance cameras
Monocular 3D vehicle localization is an important task for vehicle behaviour analysis, traffic flow parameter estimation, and autonomous driving in Intelligent Transportation Systems (ITS) and Cooperative Vehicle Infrastructure Systems (CVIS), and it is usually achieved by monocular 3D vehicle detection. However, monocular cameras cannot obtain depth information directly due to their inherent imaging mechanism, which makes monocular 3D tasks more challenging. Currently, most monocular 3D vehicle detection methods still rely on 2D detectors and additional geometric constraint modules to recover 3D vehicle information, which reduces efficiency. At the same time, most research is based on datasets of onboard scenes rather than the roadside perspective, which limits large-scale 3D perception. Therefore, we focus on 3D vehicle detection without 2D detectors in roadside scenes. We propose a 3D vehicle localization network, CenterLoc3D, for roadside monocular cameras, which directly predicts the centroid, the eight vertexes in image space, and the dimensions of 3D bounding boxes without 2D detectors. To improve the precision of 3D vehicle localization, we embed a multi-scale weighted-fusion module and a loss with spatial constraints in CenterLoc3D. Firstly, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Secondly, the vehicle type, centroid, eight vertexes, and dimensions of the 3D vehicle bounding box are obtained by CenterLoc3D. Finally, the centroid in 3D world space is obtained through camera calibration and CenterLoc3D for 3D vehicle localization. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras. Hence, we also propose a benchmark for this application, including a dataset (SVLD-3D), an annotation tool (LabelImg-3D), and evaluation metrics. Experimental validation shows that the proposed method achieves high accuracy, with an AP3D of 51.30%, an average 3D localization precision of 98%, an average 3D dimension precision of 85%, and real-time performance at 41.18 FPS.
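A minimal sketch of a multi-scale weighted fusion of the sort described, with learned normalized weights over resized backbone stages, is given below; the channel counts and the softmax weighting are assumptions, not the CenterLoc3D release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Project each stage to a common width, resize, and average with learned weights."""

    def __init__(self, in_channels=(64, 128, 256), out_channels: int = 64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.weights = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, feats):  # list of feature maps, highest resolution first
        size = feats[0].shape[-2:]
        w = torch.softmax(self.weights, dim=0)  # weights sum to 1
        out = 0
        for wi, proj, f in zip(w, self.proj, feats):
            out = out + wi * F.interpolate(proj(f), size=size,
                                           mode="bilinear", align_corners=False)
        return out
```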
A Road Crack Segmentation Method Based on Transformer and Multi-Scale Feature Fusion
To ensure the safety of vehicle travel, the maintenance of road infrastructure has become increasingly critical, with efficient and accurate detection techniques for road cracks emerging as a key research focus in the industry. The development of deep learning technologies has shown tremendous potential in improving the efficiency of road crack detection. While convolutional neural networks have proven effective in most semantic segmentation tasks, overcoming their limitations in road crack segmentation remains a challenge. To address this, this paper proposes a novel road crack segmentation network that leverages the powerful spatial feature modeling capabilities of Swin Transformer and the Encoder–Decoder architecture of DeepLabv3+. Additionally, the incorporation of a multi-scale coding module and attention mechanism enhances the network’s ability to densely fuse multi-scale features and expand the receptive field, thereby improving the integration of information from feature maps. Performance comparisons with current mainstream semantic segmentation models on crack datasets demonstrate that the proposed model achieves the best results, with an MIoU of 81.06%, Precision of 79.95%, and F1-score of 77.56%. The experimental results further highlight the model’s superior ability in identifying complex and irregular cracks and extracting contours, providing guidance for future applications in this field.
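The "multi-scale coding module" in a DeepLabv3+-style decoder is commonly realized as atrous spatial pyramid pooling (ASPP); a compact sketch with the usual (6, 12, 18) dilation rates follows. The rates are an assumption, since the abstract does not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCoding(nn.Module):
    """ASPP-style module: 1x1 conv, dilated 3x3 convs, and an image-pooling branch."""

    def __init__(self, in_ch: int, out_ch: int = 256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```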
AsymUNet: An Efficient Multi-Layer Perceptron Model Based on Asymmetric U-Net for Medical Image Noise Removal
With the continuous advancement of deep learning technology, U-Net-based algorithms for image denoising play a crucial role in medical image processing. However, most U-Net-based medical image denoising algorithms have large parameter sizes, which poses significant limitations in practical applications where computational resources are limited or large-scale patient data processing is required. In this paper, we propose a medical image denoising algorithm called AsymUNet, developed using an asymmetric U-Net framework and spatially rearranged multilayer perceptrons (MLPs). AsymUNet utilizes an asymmetric U-Net to reduce the computational burden, while a multiscale feature fusion module enhances feature interaction between the encoder and decoder. To better preserve image details, spatially rearranged MLP blocks serve as the core building blocks of AsymUNet. These blocks effectively extract both the local and global features of the image, reducing the model's reliance on prior knowledge of the image and further accelerating the training and inference processes. Experimental results demonstrate that AsymUNet achieves superior performance metrics and visual results compared with other state-of-the-art methods.
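A shift-MLP reading of the "spatially rearranged MLP" is sketched below: channel groups are rolled in four directions so that a per-pixel MLP mixes neighbouring positions. This interpretation and the block layout are assumptions about what the abstract describes.

```python
import torch
import torch.nn as nn

class ShiftMLPBlock(nn.Module):
    """Spatially shift channel groups, then apply a per-pixel MLP with a residual."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)  # normalise across channels and space
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1), nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    @staticmethod
    def spatial_shift(x):
        # Four channel groups shifted left/right/up/down by one pixel;
        # any leftover channels (when C % 4 != 0) stay unshifted.
        g = x.shape[1] // 4
        x = x.clone()
        x[:, 0 * g:1 * g] = torch.roll(x[:, 0 * g:1 * g], 1, dims=3)
        x[:, 1 * g:2 * g] = torch.roll(x[:, 1 * g:2 * g], -1, dims=3)
        x[:, 2 * g:3 * g] = torch.roll(x[:, 2 * g:3 * g], 1, dims=2)
        x[:, 3 * g:4 * g] = torch.roll(x[:, 3 * g:4 * g], -1, dims=2)
        return x

    def forward(self, x):
        return x + self.mlp(self.spatial_shift(self.norm(x)))
```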
Remaining Useful Life Prediction for Aero-Engines Based on Multi-Scale Dilated Fusion Attention Model
CNNs and RNNs have difficulty handling complex operating conditions, multi-scale degradation patterns, and long-term dependencies, and attention mechanisms often fail to highlight key degradation features. To address these limitations, this paper proposes a remaining useful life (RUL) prediction framework based on a multi-scale dilated fusion attention (MDFA) module. The MDFA leverages parallel dilated convolutions with varying dilation rates to expand receptive fields, while a global-pooling branch captures sequence-level degradation trends. Additionally, integrated channel and spatial attention mechanisms enhance the model's ability to emphasize informative features and suppress noise, thereby improving overall prediction robustness. The proposed method is evaluated on NASA's C-MAPSS and N-CMAPSS datasets, achieving MAE values of 0.018–0.026, RMSE values of 0.021–0.032, and R2 scores above 0.987, demonstrating superior accuracy and stability compared to existing baselines. Furthermore, to verify generalization across domains, experiments on the PHM2012 bearing dataset show similar performance (MAE: 0.023–0.026, RMSE: 0.031–0.032, R2: 0.987–0.995), confirming the model's effectiveness under diverse operating conditions and its adaptability to different degradation behaviors. This study provides a practical and interpretable deep-learning solution for RUL prediction, with broad applicability to aero-engine prognostics and other industrial health-monitoring tasks.
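Following the abstract's description directly, an MDFA-style block for sensor sequences might combine parallel dilated 1-D convolutions, a global-pooling branch, and channel plus temporal attention, as in the sketch below; all layer sizes and the dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MDFABlock(nn.Module):
    """Dilated multi-scale 1-D branches plus global pooling, then two attentions."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        self.global_branch = nn.Sequential(  # sequence-level trend
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(channels, channels, 1)
        )
        self.fuse = nn.Conv1d(channels * (len(dilations) + 1), channels, 1)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(channels, channels, 1), nn.Sigmoid()
        )
        self.temporal_att = nn.Sequential(nn.Conv1d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):  # x: (batch, channels, time)
        t = x.shape[-1]
        feats = [b(x) for b in self.branches]
        feats.append(self.global_branch(x).expand(-1, -1, t))  # broadcast trend over time
        y = self.fuse(torch.cat(feats, dim=1))
        return y * self.channel_att(y) * self.temporal_att(y)
```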
Remote Sensing Image Change Detection Based on Deep Multi-Scale Multi-Attention Siamese Transformer Network
Change detection is a technique for dynamically observing changes on the Earth's surface and is one of the most significant tasks in remote sensing image processing. In the past few years, deep learning techniques, with their ability to extract rich deep image features, have gained popularity in the field of change detection. In many deep learning-based methods, attention mechanisms are added at the decoder and output stages to obtain salient change information, but many of these approaches neglect to strengthen the ability of the encoders and feature extractors to learn representative features. To resolve this problem, this study proposes a deep multi-scale multi-attention siamese transformer network. A special contextual attention module combining convolution and self-attention is introduced into the siamese feature extractor to enhance its global representation ability, and a lightweight efficient channel attention block is added to capture the information interaction among different channels. Furthermore, a multi-scale feature fusion module is proposed to fuse the features from different stages of the siamese feature extractor, enabling the detection of objects of different sizes and irregular shapes. To increase accuracy, a transformer module is utilized to model the long-range context in the two-phase images. Experimental results on the LEVIR-CD and CCD datasets show the effectiveness of the proposed network.
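A plain reading of the multi-scale feature fusion module is sketched below: the per-stage features of the two image phases are differenced, projected, upsampled to a common size, and fused by concatenation. The channel counts and the absolute-difference choice are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseMultiScaleFusion(nn.Module):
    """Fuse per-stage change evidence from a siamese backbone's two phases."""

    def __init__(self, stage_channels=(64, 128, 256), out_ch: int = 64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in stage_channels)
        self.fuse = nn.Conv2d(out_ch * len(stage_channels), out_ch, 3, padding=1)

    def forward(self, feats_t1, feats_t2):  # stage lists, highest resolution first
        size = feats_t1[0].shape[-2:]
        diffs = []
        for proj, a, b in zip(self.proj, feats_t1, feats_t2):
            d = proj(torch.abs(a - b))  # per-stage change evidence
            diffs.append(F.interpolate(d, size=size, mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(diffs, dim=1))
```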