Catalogue Search | MBRL
Explore the vast range of titles available.
192 result(s) for "lightweight transformer"
LW-ViT: The Lightweight Vision Transformer Model Applied in Offline Handwritten Chinese Character Recognition
2023
In recent years, transformer models have been widely used in computer-vision tasks and have achieved impressive results. Unfortunately, transformer-based models share the drawback of having many parameters and a large memory footprint, making them difficult to deploy on mobile devices, unlike lightweight convolutional neural networks. To address these issues, a lightweight Vision Transformer (LW-ViT) model is proposed to reduce the complexity of transformer-based models, and it is applied to offline handwritten Chinese character recognition. The design of the LW-ViT model is inspired by MobileViT: starting from the overall MobileViT framework, it reduces the number of parameters and FLOPs by cutting the number of transformer blocks and MV2 layers. The LW-ViT model has 0.48 million parameters and 0.22 G FLOPs, and it ultimately achieves a recognition accuracy of 95.8% on the dataset. Compared with the MobileViT model, the number of parameters is reduced by 53.8% and the FLOPs by 18.5%. The experimental results show that the LW-ViT model has a low parameter count, demonstrating the feasibility of the proposed model.
Journal Article
MobileRaT: A Lightweight Radio Transformer Method for Automatic Modulation Classification in Drone Communication Systems
2023
Nowadays, automatic modulation classification (AMC) has become a key component of next-generation drone communication systems, which are crucial for improving communication efficiency in non-cooperative environments. The tension between the accuracy and efficiency of current methods hinders the practical application of AMC in drone communication systems. In this paper, we propose a real-time AMC method based on a lightweight mobile radio transformer (MobileRaT). The constructed radio transformer is trained iteratively while redundant weights are pruned based on information entropy, so it can learn robust modulation knowledge from multimodal signal representations for the AMC task. To the best of our knowledge, this is the first attempt to integrate a pruning technique with a lightweight transformer model for processing temporal signals, preserving AMC accuracy while improving inference efficiency. Finally, experiments comparing MobileRaT with a series of state-of-the-art methods on two public datasets verified its superiority. Two models, MobileRaT-A and MobileRaT-B, were applied to RadioML 2018.01A and RadioML 2016.10A, achieving average AMC accuracies of 65.9% and 62.3% and peak AMC accuracies of 98.4% and 99.2% at +18 dB and +14 dB, respectively. Ablation studies demonstrate the robustness of MobileRaT to hyper-parameters and signal representations. All the experimental results indicate that MobileRaT adapts to varied communication conditions and can be deployed on drone receivers to achieve air-to-air and air-to-ground cognitive communication in less demanding communication scenarios.
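The entropy-guided pruning step described in the abstract might be sketched roughly as follows (a hypothetical NumPy illustration; the row-histogram entropy score and the `prune_by_entropy` helper are assumptions for exposition, not the paper's actual procedure):

```python
import numpy as np

def row_entropy(row, bins=16, eps=1e-12):
    """Shannon entropy of a weight row's value histogram."""
    hist, _ = np.histogram(row, bins=bins)
    p = hist / hist.sum()
    return float(-(p * np.log2(p + eps)).sum())

def prune_by_entropy(weights, keep_ratio=0.5, bins=16):
    """Zero out the rows whose value distributions carry the least
    information (lowest entropy), keeping keep_ratio of the rows."""
    scores = np.array([row_entropy(r, bins) for r in weights])
    k = max(1, int(round(len(weights) * keep_ratio)))
    keep = np.argsort(scores)[-k:]            # highest-entropy rows survive
    mask = np.zeros(len(weights), dtype=bool)
    mask[keep] = True
    pruned = np.where(mask[:, None], weights, 0.0)
    return pruned, mask
```

In this simplified view, pruning alternates with training iterations: low-entropy rows are zeroed, and the surviving weights are fine-tuned before the next pruning round.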
Journal Article
A lightweight transformer based multi task learning model with dynamic weight allocation for improved vulnerability prediction
2025
Accurate vulnerability prediction is crucial for identifying potential security risks in software, especially in the context of imbalanced and complex real-world datasets. Traditional methods, such as single-task learning and ensemble approaches, often struggle with these challenges, particularly in detecting rare but critical vulnerabilities. To address this, we propose MTLPT (Multi-Task Learning with Position Encoding and Lightweight Transformer for Vulnerability Prediction), a novel multi-task learning framework that leverages custom lightweight Transformer blocks and position-encoding layers to effectively capture long-range dependencies and complex patterns in source code. The MTLPT model improves sensitivity to rare vulnerabilities and incorporates a dynamic weight loss function to compensate for imbalanced data. Our experiments on real-world vulnerability datasets demonstrate that MTLPT outperforms traditional methods on key performance metrics such as recall, F1-score, AUC, and MCC. Ablation studies further validate the contributions of the lightweight Transformer blocks, position-encoding layers, and dynamic weight loss function, confirming their role in the model's predictive accuracy and efficiency.
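The abstract does not detail MTLPT's custom position-encoding layers; as a point of reference, the standard sinusoidal encoding such layers typically build on can be sketched in a few lines (pure Python, illustrative only):

```python
import math

def positional_encoding(n_positions, d_model):
    """Standard sinusoidal position encoding: even dimensions get sin,
    odd dimensions get cos, with wavelengths forming a geometric
    progression so each position receives a unique, smooth code."""
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The encoding is added to token embeddings so the Transformer blocks can distinguish positions in the source-code sequence.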
Journal Article
A lightweight transformer framework for open set anomaly segmentation in smart city applications
2025
Open-set anomaly segmentation across diverse infrastructure faces substantial challenges in computational overhead and accuracy. Existing transformer-based methods, although effective, are limited by the trade-off between computational efficiency and accuracy. This paper presents LightMask, a lightweight transformer-based architecture designed for efficient, context-aware segmentation of anomalous regions in complex urban environments. The proposed framework makes five key contributions: an optimized EfficientNet-B0 backbone, an adaptive inference mechanism, separable self-attention (SSA) with linear complexity, a progressive multi-scale decoder with dynamic early termination, and a boundary-aware contrastive loss for open-set anomaly segmentation. LightMask prioritizes computational efficiency while preserving anomaly-detection performance. Evaluation shows that LightMask keeps a low parameter count of 4.29 million (16.35 MB), ensuring a lightweight structure, and requires only 8.72 GFLOPs. Trained and evaluated on the Cityscapes and RoadAnomaly datasets, the model proves robust, with 91.79% precision, 93% recall, a 77.66% F1 score, 88.28% AUC-ROC, and a false positive rate of 36.24% at 95% TPR. Based on these findings, LightMask balances computational cost with robust anomaly-detection capability.
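Separable self-attention with linear complexity can be sketched roughly as follows (a simplified NumPy illustration in the spirit of MobileViT-v2-style SSA; the exact formulation in LightMask may differ):

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def separable_self_attention(x, w_i, w_k, w_v):
    """Linear-complexity attention: a single learned query direction
    (w_i) scores each token, the scores pool the keys into one global
    context vector, and that context modulates the values. Cost is
    O(n*d) in the token count n rather than the O(n^2) of full
    pairwise attention."""
    scores = softmax(x @ w_i, axis=0)            # (n, 1) per-token scores
    context = (scores * (x @ w_k)).sum(axis=0)   # (d,) pooled global context
    return np.maximum(x @ w_v, 0.0) * context    # (n, d), broadcast per token
```

Because no n-by-n attention matrix is ever formed, memory also stays linear in the number of tokens, which is what makes this practical for dense segmentation maps.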
Journal Article
TE-TransReID: Towards Efficient Person Re-Identification via Local Feature Embedding and Lightweight Transformer
2025
Person re-identification aims to match images of the same individual across non-overlapping cameras by analyzing personal characteristics. Recently, Transformer-based models have demonstrated excellent capabilities and achieved breakthrough progress in this task. However, their high computational costs and inadequate capacity to capture fine-grained local features impose significant constraints on re-identification performance. To address these challenges, this paper proposes a novel Toward Efficient Transformer-based Person Re-identification (TE-TransReID) framework. Specifically, the proposed framework retains only the first L layers of a pretrained Vision Transformer (ViT) for global feature extraction while combining them with local features extracted by a pretrained CNN, achieving a trade-off between high accuracy and a lightweight network. Additionally, we propose a dual efficient feature-fusion strategy to integrate global and local features for accurate person re-identification: the Efficient Token-based Feature-Fusion Module (ETFFM) employs a gate-based network to learn fused token-wise features, while the Efficient Patch-based Feature-Fusion Module (EPFFM) utilizes a lightweight Transformer to aggregate patch-level features. Finally, TE-TransReID achieves rank-1 accuracies of 94.8%, 88.3%, and 85.7% on Market1501, DukeMTMC, and MSMT17, respectively, with only 27.5 M parameters. Compared to existing CNN-Transformer hybrid models, TE-TransReID maintains comparable recognition accuracy while drastically reducing model parameters, establishing an optimal balance between recognition accuracy and computational efficiency.
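A gate-based fusion of global and local features, as in ETFFM, might look roughly like this minimal NumPy sketch (the single-layer sigmoid gate and the `gated_fusion` helper are illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def gated_fusion(global_feat, local_feat, w_gate, b_gate=0.0):
    """A sigmoid gate computed from the concatenated features decides,
    per dimension, how much of the global (ViT) vs. local (CNN)
    feature to keep in the fused representation."""
    z = np.concatenate([global_feat, local_feat]) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-z))              # values in (0, 1)
    return gate * global_feat + (1.0 - gate) * local_feat
```

With zero gate weights the gate sits at 0.5 and the fusion is a plain average; training moves the gate toward whichever branch is more informative per dimension.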
Journal Article
Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification
by Huang, Xinyan; Chen, Puhua; Li, Pengfang
in Accuracy; Artificial intelligence; Artificial neural networks
2023
Remote sensing (RS) scene classification has received considerable attention due to its wide applications in the RS community. Many methods based on convolutional neural networks (CNNs) have been proposed to classify complex RS scenes, but they cannot fully capture the context in RS images because they lack long-range dependencies (the dependency relationship between two distant elements). Recently, some researchers have fine-tuned large pretrained vision transformers (ViTs) on small RS datasets to extract long-range dependencies effectively in RS scenes. However, fine-tuning a ViT usually takes more time on account of its high computational complexity, and the ViT's weak local feature representation limits further gains in classification performance. To this end, we propose a lightweight transformer network (LTNet) for RS scene classification. First, a multi-level group convolution (MLGC) module is presented: it enriches the diversity of local features at a lower computational cost by co-representing multi-level and multi-group features in a single module. Then, based on the MLGC module, a lightweight transformer block, LightFormer, is designed to capture global dependencies with fewer computing resources. Finally, the LTNet is built from the MLGC and LightFormer blocks. Experiments fine-tuning the LTNet on four RS scene classification datasets demonstrate that the proposed network achieves competitive classification performance with less training time.
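The parameter saving that group convolution gives the MLGC module is easy to verify with a small helper (illustrative only; `conv_params` is not from the paper):

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution layer (bias ignored).
    A grouped convolution splits the input channels across `groups`
    independent filter banks, cutting the weights by a factor of
    `groups` relative to a dense convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * k * k * c_out

dense = conv_params(64, 64, 3)               # 64 * 3*3 * 64 = 36864
grouped = conv_params(64, 64, 3, groups=4)   # 16 * 3*3 * 64 = 9216
```

Co-representing several group settings in one module, as MLGC does, trades a small amount of cross-group mixing for this multiplicative reduction in weights and FLOPs.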
Journal Article
Efficient Transformer Architectures for Diabetic Retinopathy Classification from Fundus Images: DR-MobileViT, DR-EfficientFormer, and DR-SwinTiny
2025
Diabetic retinopathy (DR) is a prevalent cause of vision loss, necessitating efficient diagnostic tools, particularly in resource-limited settings. This study presents three lightweight transformer-based models (DR-MobileViT, DR-EfficientFormer, and DR-SwinTiny) for automated DR classification from fundus images (APTOS 2019: 3,662 images; Messidor-2: 1,748 images). After preprocessing, including resizing to 224×224 pixels and CLAHE enhancement, these models, built on compact architectures (1.8–3.5M parameters), are trained with an AdamW optimizer and data augmentation. DR-MobileViT integrates convolutional and transformer layers, DR-EfficientFormer employs a dimension-consistent design, and DR-SwinTiny utilizes shifted-window attention. All models were initialized with ImageNet pretrained weights. Evaluated on the APTOS 2019 and Messidor-2 datasets, they achieve quadratic weighted kappa (QWK) scores up to 0.89 and areas under the ROC curve (AUC) up to 0.95. These models approach the performance of the top CNN ensembles from the APTOS 2019 challenge (which exceed 40M parameters) while reducing inference time to 10–15 ms/image (NVIDIA P100 GPU) and computational overhead by over 90%. These results indicate their potential for scalable, point-of-care DR screening, offering a viable solution for early detection in underserved regions.
Journal Article
Exploring the Ideas of Integrating the Teaching of Information Technology and English Translation
2024
The intelligent use of digital teaching resources and the deep penetration of information technology can help drive change in English teaching and improve the sharing, interactivity, engagement, and effectiveness of English translation teaching resources. This paper develops an intelligent classroom teaching model for English translation that combines scaffolding and multimodal discourse teaching on the U Campus teaching platform. To create an intelligent learning environment for English translation, this paper uses distributed crawler technology to collect English teaching resources, applies the MinHash algorithm to deduplicate and optimize the data, constructs an English translation corpus, and introduces a lightweight Transformer network to build a machine translation model. The paper quantitatively analyzes the teaching effect of the English translation smart classroom model using data from University C students as the research object. The results show that students in class A1 scored 95.02 points in topic comprehension, their translation theory and skills improved by 8.46 points compared with students in class A2, and the two classes' cross-cultural communication skills differed significantly at the 1% level. By integrating the information technology platform with English translation teaching, students' translation skills can be enhanced, they can access more English resources, and their intercultural knowledge and competence can be improved.
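The MinHash deduplication step mentioned above can be sketched with the standard library (a minimal illustration; the salted-MD5 hash family is an assumption for exposition, not the paper's implementation):

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    """For each of num_hashes salted hash functions, keep the minimum
    hash value over the token set; near-duplicate documents produce
    near-identical signatures."""
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens)
            for i in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots is an unbiased
    estimate of the Jaccard similarity of the two token sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Crawled pages whose estimated similarity exceeds a threshold (say 0.9) would be treated as duplicates and dropped before corpus construction.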
Journal Article
Slight Aware Enhancement Transformer and Multiple Matching Network for Real-Time UAV Tracking
2023
Owing to the versatility and effectiveness of siamese neural networks, unmanned aerial vehicle (UAV) visual object tracking has found widespread application in fields including military reconnaissance, intelligent transportation, and visual positioning. However, due to complex factors such as occlusions, viewpoint changes, and interference from similar objects during UAV tracking, most existing siamese trackers struggle to combine strong performance with efficiency. To tackle this challenge, this paper proposes SiamSTM, a novel tracker based on a Slight Aware Enhancement Transformer and multiple matching networks for real-time UAV tracking. SiamSTM leverages lightweight transformers to encode robust target appearance features while using the multiple matching networks to fully perceive response-map information and enhance the tracker's ability to distinguish target from background. Evaluations on three UAV tracking benchmarks show superior speed and precision, and SiamSTM achieves over 35 FPS on an NVIDIA Jetson AGX Xavier, satisfying real-time requirements in engineering.
Journal Article
Pear Fruit Detection Model in Natural Environment Based on Lightweight Transformer Architecture
2025
Aiming at the problems of low precision, slow speed, and difficult detection of small-target pear fruit in real environments, this paper designs a pear fruit detection model for natural environments with a lightweight Transformer architecture built on the RT-DETR model. A Xinli No. 7 fruit dataset covering different environmental conditions is also established. First, the backbone of the original model is replaced with a lightweight FasterNet network. Second, HiLo, an improved, efficient attention mechanism that extracts high- and low-frequency information, is used to keep the model lightweight and improve feature extraction for Xinli No. 7 in complex environments. The CCFM module is reconstructed based on the Slim-Neck method, and the original loss function is replaced with the Shape-NWD small-target detection loss to strengthen the network's feature extraction. Comparison tests against YOLOv5m, YOLOv7, YOLOv8m, YOLOv10m, and Deformable-DETR show that the improved RT-DETR achieves a good balance between model lightness and recognition accuracy, exceeds the detection accuracy of the current advanced YOLOv10 algorithm, and enables rapid detection of Xinli No. 7 fruit. The improved model's precision, recall, and average precision reach 93.7%, 91.9%, and 98%, respectively, while its parameter count, computation, and weight memory are reduced by 48.47%, 56.2%, and 48.31% compared with the original model. This model provides technical support for Xinli No. 7 fruit detection and model deployment in complex environments.
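The NWD component of such a loss rests on a normalized Wasserstein distance between boxes; a rough sketch under the published NWD formulation is below (the paper's Shape-NWD variant may differ, and `c` is a dataset-dependent scale constant, so treat this as illustrative):

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as
    (cx, cy, w, h). Each box is modeled as a 2-D Gaussian, so the
    measure stays smooth and informative for tiny or non-overlapping
    boxes, where IoU collapses to zero."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    w2 = math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2
                   + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-w2 / c)
```

A matching loss would then be `1 - nwd(pred, target)`, which, unlike IoU-based losses, still produces useful gradients when a small predicted box misses the target entirely.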
Journal Article