Catalogue Search | MBRL

by Wang, Junqian , Luo, Nan , Xu, Yong in Artificial neural networks , authors , B6135 Optical, image and video signal processing

2019

Owing to their flexible architectures, deep convolutional neural networks (CNNs) are successfully used for image denoising. However, they suffer from the following drawbacks: (i) deep network architectures are very difficult to train, and (ii) deeper networks face the challenge of performance saturation. In this study, the authors propose a novel method called enhanced convolutional neural denoising network (ECNDNet). Specifically, they use residual learning and batch normalisation techniques to address the problem of training difficulties and accelerate the convergence of the network. In addition, dilated convolutions are used in the proposed network to enlarge the context information and reduce the computational cost. Extensive experiments demonstrate that the ECNDNet outperforms the state-of-the-art methods for image denoising.

Journal Article

CBAM-Enhanced CNN-LSTM with Improved DBSCAN for High-Precision Radar-Based Gesture Recognition

by Yi, Shiwei , Zhao, Zhenyu , Wu, Tongning in Accuracy , Algorithms , Bandwidths

2026

In recent years, radar-based gesture recognition technology has been widely applied in industrial and daily life scenarios. However, increasingly complex application scenarios have imposed higher demands on the accuracy and robustness of gesture recognition algorithms, and challenges such as clutter interference, inter-gesture similarity, and spatial–temporal feature ambiguity limit recognition performance. To address these challenges, a novel framework named CECL, which incorporates the Convolutional Block Attention Module (CBAM) into a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architecture, is proposed for high-accuracy radar-based gesture recognition. The CBAM adaptively highlights discriminative spatial regions and suppresses irrelevant background, and the CNN-LSTM network captures temporal dynamics across gesture sequences. During gesture signal processing, the Blackman window is applied to suppress spectral leakage. Additionally, a combination of wavelet thresholding and dynamic energy nulling is employed to effectively suppress clutter and enhance feature representation. Furthermore, an improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm further eliminates isolated sparse noise while preserving dense and valid target signal regions. Experimental results demonstrate that the proposed algorithm achieves 98.33% average accuracy in gesture classification, outperforming other baseline models. It exhibits excellent recognition performance across various distances and angles, demonstrating significantly enhanced robustness.

Journal Article

AI-driven audio-to-video generation for dynamic content creation via stable diffusion and CNN-augmented transformers

by Kumari, Sneha , Dharrao, Madhuri , Pise, Priya in 4014/477 , 631/477 , 639/705

2026

Translating spoken words into emotionally and contextually aligned video content remains an open challenge in generative AI. Subtle vocal patterns, such as pauses and pitch modulations, often obscure emotional cues, resulting in visuals that feel emotionally disconnected or flat. While several models excel at text-to-image generation, they struggle with interpreting speech-based inputs, often misreading paralinguistic cues and contextual intent. To address these limitations, this research introduces EchoVid, an audio-to-video synthesis model designed to prioritize contextual fidelity and emotional alignment. A scalable web interface built with React.js and TypeScript connects to a Node.js backend with MongoDB Atlas for near-real-time interactive generation (≈ 1.2× input-duration latency at 512² frames on an RTX A4000). Using PyAudio input, EchoVid guides Hugging Face’s Stable Diffusion v2.1 via emotion-aware prompts, with CNN-enhanced diffusion transformers supporting the video generation process. Preliminary results show that EchoVid can generate visuals that reflect both emotional tone (e.g., joyful imagery for upbeat speech) and context. The proposed EchoVid model is compared with MoCoGAN and Stable Video Diffusion variants on metrics such as FVD, FID-VID, and CLIPScore. Furthermore, this research introduces two novel evaluation metrics, Temporal Semantic Stability (TSS) and Perceptual Flicker Index (PFI), which score the semantic consistency and frame-to-frame change in the generated video. The results show that EchoVid outperforms the other models and can generate relatively better videos.

Journal Article

Privacy-preserving federated learning with light-weight attention improved CNNs for automated leukemia detection across distributed medical imaging

by Khan, Nabeel Ahmed , Awan, Muhammad Zeerak , Strakos, Petr in 631/114 , 631/67 , 639/705

2026

This research work describes a lightweight, secure, and interpretable federated learning framework for automatic leukemia classification, which identifies and addresses various problems regarding clinical data security and collaborative model building among partnering healthcare organizations. This framework employs a distributed learning paradigm that allows a number of healthcare facilities to work together to build a classification model with high predictive performance without exchanging sensitive patient data, thus ensuring data privacy and methodological reproducibility. The proposed framework employs a lightweight attention-enhanced convolutional neural network (CNN) for the automated classification of leukemia cells into one of four categories (benign, early, pre-leukemic, and pro-leukemic) at only 0.14 s/batch. With a weighted aggregation method, the global model achieves 95.70% test accuracy with 3 clients, and 96.56% on the test set with 5 clients and increased training rounds. Additionally, explainable methods are used in this study to increase clinical interpretability and transparency.

Journal Article

An explainable deep learning framework for video violence detection using unsupervised keyframe selection and attention-based CNN

by Azim, Rashid , Alkahtani, Hend Khalid , Qahmash, Ayman in 639/166 , 639/705 , Accuracy

2026

The exponential growth of video data from surveillance and online platforms has heightened the demand for intelligent, explainable systems capable of detecting violence in real time. This study proposes a novel Explainable Attention-Enhanced Convolutional Neural Network (CNN) framework that integrates unsupervised keyframe selection, attention-driven feature learning, and Grad-CAM++-based interpretability to address redundancy, transparency, and generalization challenges in video violence detection. The proposed model automatically extracts representative keyframes using similarity-based clustering, reducing computational overhead while retaining essential temporal information. Attention modules are embedded within the CNN backbone to enhance spatial–temporal feature discrimination, while Grad-CAM++ provides interpretable visual insights into the model’s decision process. Comprehensive experiments on five benchmark datasets (RLVS, Hockey Fight, Violent Flow, ShanghaiTech, and UCF-Crime) demonstrate that the framework achieves superior performance, with an average accuracy of 94.6% and F1-score of 93.9%, outperforming state-of-the-art models such as C3D, I3D, ResNet-LSTM, and ViViT. The model also delivers near-real-time efficiency (≈ 62 FPS) with reduced memory utilization (6.8 GB), confirming its suitability for deployment in surveillance and public safety systems. Statistical analysis using ANOVA and Tukey’s HSD tests verified that keyframe selection and attention modules significantly improve performance (p < 0.05) with large effect sizes (η² = 0.76). The integration of interpretability further enhances reliability by localizing violence-relevant regions in frames. Overall, the proposed explainable framework establishes a robust, efficient, and transparent solution for automated violence detection in diverse real-world scenarios.

Journal Article

Machine Learning-Based Autism Spectrum Disorder Classification Using an Enhanced Convolutional Neural Network Algorithm

by Yugander, P. , Jagannath, M.

2026

Autism Spectrum Disorder (ASD) is a complex neurological developmental disability that appears during early childhood. Conventional ASD diagnostic techniques rely on behavioural observations, characteristics, and clinical interviews. To overcome these limitations, numerous machine learning (ML) and deep learning (DL) techniques have been used to assist physicians. For the past three decades, biomedical images have been employed to diagnose neurodevelopmental disorders. Functional Magnetic Resonance (MR) images are used in this study. This paper proposes a novel machine learning framework to distinguish ASD participants from healthy controls. The proposed framework consists of two stages. In the first stage, an enhanced Convolutional Neural Network (CNN) is proposed to extract features. In the second stage, the extracted features are given to the machine learning classifiers. The proposed method is tested on 1112 fMRI images, comprising 539 ASD participants and 573 healthy controls. A total of 17 datasets from the ABIDE website are used. These datasets were collected from various international medical laboratories. The proposed framework outperforms the existing methods. The proposed algorithm achieved 92.45% across the entire ABIDE dataset and 98.61% on the individual dataset.

Journal Article

Suspicious Actions Detection System Using Enhanced CNN and Surveillance Video

by Balamurugan, Nagaiah Mohanan , Adimoolam, Malaiyalathan , Selvi, Esakky in Alarm systems , Algorithms , Artificial neural networks

2022

Suspicious pre- and post-activity detection in crowded places is essential as many suspicious activities may be carried out by culprits. Usually, surveillance cameras are installed. These cameras capture videos or images that authorities investigate later, so suspicious activity is detected only after the event. This requires substantial human intervention to detect suspicious activity. However, there are no systems available to protect valuable things from such suspicious incidents. Nowadays, machine learning (ML)- and deep learning (DL)-based pre-incident warning alarm systems could be adapted to monitor suspicious activity. Suspicious activity prediction would be based on human gestures and unusual activity detection. Even though some methods based on ML or DL have been proposed, the need for a highly accurate, highly precise, low-false-positive and low-false-negative prediction system could be better met by hybrid or enhanced ML- or DL-based systems. This research work introduces an enhanced convolutional neural network (ECNN)-based suspicious activity detection system. An experiment was carried out and the results are reported. The results are analyzed with the Statistical Package for the Social Sciences (SPSS) tool. The results showed that the mean accuracy, mean precision, mean false-positive rate, and mean false-negative rate of suspicious activity detections were 97.050%, 96.743%, 2.957%, and 2.927%, respectively. This result was also compared with the convolutional neural network (CNN) algorithm. This research work can be applied to enhance the pre-suspicious activity alert security system to avoid risky situations.

Journal Article

AGBUNet: an enhanced CNN-UNET architecture for the prediction of above ground biomass using deep learning

by Arumai Shiney, S. , Geetha, R. in Artificial Intelligence , Artificial neural networks , Biomass

2025

Accurate prediction of above ground biomass (AGB) is critical for monitoring forest health and carbon cycling, and for understanding and managing forest ecosystems. In this paper, we propose an enhanced framework combining convolutional neural network (CNN) and UNet, termed AGBUNet, specifically designed for predicting AGB using remote sensing data. The framework consists of separate CNNs for processing each type of image, whose outputs are subsequently fused in a UNet architecture to enhance prediction accuracy. These modifications include customized convolutional layers, advanced preprocessing techniques, and a novel integration of data prompts from separate Sentinel-1 and Sentinel-2 image processing streams. The AGBUNet integrates Sentinel-1 and Sentinel-2 images to leverage complementary information from synthetic aperture radar (SAR) and optical sensors. This study underscores the potential of the AGBUNet model for enhancing biomass estimation from remote sensing data, contributing to better forest management and ecological monitoring. The performance measures are compared with those of other models: the reported values are an MSE of 298.25, an RMSE of 15.27, and an MAE of 12.21, which are satisfactory compared with earlier benchmarks.

Journal Article

A two-domain coordinated sentence similarity scheme for question-answering robots regarding unpredictable outliers and non-orthogonal categories

by Li, Boyang , Li, Jiaxin , Xu, Zhiyu in Accuracy , Algorithms , Coders

2021

It is crucial and challenging for the question-answering robot (Qabot) to match the customer-input questions with the priori identification questions due to highly diversified expressions, especially in the case of Chinese. This article proposes a coordinated scheme that analyzes the similarity between sentences in two independent domains instead of relying on a single deep learning model. In the structure domain, BLEU and data preprocessing are applied for binary analysis to separate the unpredictable outliers (illegal questions) from the existing library. In the semantics domain, the MC-BERT model, which integrates the BERT encoder and the multi-kernel convolutional top classifier, is developed to handle the non-orthogonality of class identification questions. The two-domain analyses run in parallel and the two similarity scores are coordinated for the final response. The linguistic features of Chinese are also taken into account. A realistic case of Qabot on energy trading service and finance is numerically studied. Computational results validate the effectiveness and accuracy of the proposed algorithm: Top-1 and Top-3 accuracies are 90.5% and 95.5%, respectively, which are significantly superior to the latest published results.

Journal Article

A Dual CNN for Image Super-Resolution

by Song, Jiagang , You, Lei , Xiao, Jingyu in Digital imaging , Feature extraction , Human factors

2022

High-quality images have an important effect on high-level tasks. However, due to human factors and camera hardware, digital devices collect low-resolution images. Deep networks can effectively restore these damaged images via their strong learning abilities. However, most of these networks depend on deeper architectures to enhance the clarity of predicted images, where single features cannot deal well with complex scenes. In this paper, we propose a dual super-resolution CNN (DSRCNN) to obtain high-quality images. DSRCNN relies on two sub-networks to extract complementary low-frequency features to enhance the learning ability of the SR network. To prevent a long-term dependency problem, a combination of convolutions and residual learning operations is embedded into the dual sub-networks. To prevent information loss from the original image, an enhanced block is used to gather the original information and the high-frequency information obtained from a deeper layer via sub-pixel convolutions. To obtain more high-frequency features, a feature learning block is used to learn more details of high-frequency information. The proposed method is well suited to complex scenes in image super-resolution. Experimental results show that the proposed DSRCNN is superior to other popular SR networks. For instance, our DSRCNN obtains an improvement of 0.08 dB over MemNet on Set5 for ×3.

Journal Article
