Catalogue Search | MBRL

AMFF-net: adaptive multi-modal feature fusion network for image classification

by Liu, Wei , Lu, Xiaobo , Wei, Yun in Artificial neural networks , Computer Communication Networks , Computer Science

2024

Convolutional neural networks (CNNs) have been applied in recent years to computer vision tasks such as image classification and recognition, object detection, and segmentation, owing to their excellent feature extraction capability and strong generalization ability. However, CNNs mainly represent the semantic information of images by aggregating local features. It has been shown that some global features, such as histograms of oriented gradients, color information, and local binary pattern features, are useful for image recognition. Nonetheless, some researchers simply concatenate these features, overlooking the differences between them, which leads to performance that falls short of expectations or is even worse. To better integrate multi-modal features, this paper proposes a novel feature fusion module, named AMFF Network, which can adaptively fuse CNNs’ local-global features and traditional global features. That is, the high-level semantic characteristics of objects and the low-level detailed information and appearance features can be combined dynamically by this network. The network is convenient to embed in various architectures and generalizes effectively across datasets. Furthermore, we show that the AMFF module brings obvious performance improvements for current state-of-the-art methods at some additional computational cost. Experiments performed on multiple benchmark datasets, such as Fashion-MNIST, CIFAR10, CIFAR100, Tiny-Imagenet-200, and Market1501, demonstrate that the proposed AMFF-Net module brings significant improvements in image classification across datasets.
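
The contrast the abstract draws between plain concatenation and adaptive fusion can be illustrated with a small sketch. The abstract does not describe the internals of the AMFF module, so this is not the authors' method: it is a generic softmax-weighted fusion, and the function names and the `gate_scores` input (which a real network would learn rather than receive as a fixed argument) are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(cnn_feat, hand_feat):
    """Baseline: join the two feature vectors end to end."""
    return list(cnn_feat) + list(hand_feat)


def adaptive_fusion(cnn_feat, hand_feat, gate_scores):
    """Weighted sum of two equal-length feature vectors.

    gate_scores holds one raw score per modality (assumption: in a
    trained network these would be predicted from the features).
    """
    if len(cnn_feat) != len(hand_feat):
        raise ValueError("feature vectors must have the same length")
    w_cnn, w_hand = softmax(gate_scores)
    return [w_cnn * c + w_hand * h for c, h in zip(cnn_feat, hand_feat)]


cnn_features = [1.0, 2.0]          # stand-in for CNN features
handcrafted_features = [3.0, 4.0]  # stand-in for HOG/LBP/color features

print(concat_fusion(cnn_features, handcrafted_features))
print(adaptive_fusion(cnn_features, handcrafted_features, [0.0, 0.0]))
```

With equal gate scores the two modalities contribute equally; raising one score shifts the fused vector toward that modality, which is the kind of per-input weighting that plain concatenation cannot express.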

Journal Article

Unsupervised Deep-Embedding Global Feature Descriptor for Image Retrieval

by He, Qiaoping in Circuits and Systems , Deep learning , Electrical Engineering

2024

Image representations based on deep learning models can provide exciting performance for image retrieval, but only using deep learning models cannot exploit global topological properties appropriately. The topological perception theory claims that the visual perception process is from global to local: global topological perception occurs earlier than other local patterns. Simulating the visual perception mechanism together with deep learning models to provide a compact yet discriminative representation remains challenging. Toward this end, we propose a novel image representation method called deep-embedding global feature descriptor. The main highlights include: (1) A frequency statistics ranking method is proposed to yield global topology features by combining global visual features and deep convolutional features. (2) An embedding method is proposed to embed the global topology feature spatially and channel-wise into deep convolutional features. It can reasonably integrate the global topological characteristic with local patterns by simulating the visual perceptual process from global to local. (3) A compact yet discriminative representation is provided by leveraging the advantages of global visual and deep features. Exhaustive experiments on five well-known benchmark datasets show that the proposed method outperforms some recent unsupervised state-of-the-art methods.

Journal Article

SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion

by Li, Hui , Xiao, Yongbiao , Song, Xiaoning in Algorithms , Deep learning , detail features

2023

The infrared and visible image fusion task aims to generate a single image that preserves complementary features and reduces redundant information from different modalities. Although convolutional neural networks (CNNs) can effectively extract local features and obtain better fusion performance, the size of the receptive field limits its feature extraction ability. Thus, the Transformer architecture has gradually become mainstream to extract global features. However, current Transformer-based fusion methods ignore the enhancement of details, which is important to image fusion tasks and other downstream vision tasks. To this end, a new super feature attention mechanism and the wavelet-guided pooling operation are applied to the fusion network to form a novel fusion network, termed SFPFusion. Specifically, super feature attention is able to establish long-range dependencies of images and to fully extract global features. The extracted global features are processed by wavelet-guided pooling to fully extract multi-scale base information and to enhance the detail features. With the powerful representation ability, only simple fusion strategies are utilized to achieve better fusion performance. The superiority of our method compared with other state-of-the-art methods is demonstrated in qualitative and quantitative experiments on multiple image fusion benchmarks.

Journal Article

Groundwater Prediction Using Machine-Learning Tools

by Hussein, Eslam A. , Bagula, Antoine , Ghaziasgar, Mehrdad in Creeks & streams , Datasets , Deep learning

2020

Predicting groundwater availability is important to water sustainability and drought mitigation. Machine-learning tools have the potential to improve groundwater prediction, thus enabling resource planners to: (1) anticipate water quality in unsampled areas or depth zones; (2) design targeted monitoring programs; (3) inform groundwater protection strategies; and (4) evaluate the sustainability of groundwater sources of drinking water. This paper proposes a machine-learning approach to groundwater prediction with the following characteristics: (i) the use of a regression-based approach to predict full groundwater images based on sequences of monthly groundwater maps; (ii) strategic automatic feature selection (both local and global features) using extreme gradient boosting; and (iii) the use of a multiplicity of machine-learning techniques (extreme gradient boosting, multivariate linear regression, random forests, multilayer perceptron and support vector regression). Of these techniques, support vector regression consistently performed best in terms of minimizing root mean square error and mean absolute error. Furthermore, including a global feature obtained from a Gaussian Mixture Model produced models with lower error than the best which could be obtained with local geographical features.
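
The two error measures used to rank the techniques, root mean square error (RMSE) and mean absolute error (MAE), have standard definitions that can be sketched as follows. The function names and the sample values are illustrative only and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up values standing in for observed and predicted groundwater levels.
observed = [2.0, 4.0, 6.0, 8.0]
forecast = [1.0, 4.0, 7.0, 10.0]

print("RMSE:", rmse(observed, forecast))
print("MAE:", mae(observed, forecast))
```

Because RMSE squares each difference before averaging, it penalises a few large misses more heavily than MAE does, which is one reason the two are often reported together.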

Journal Article

A Review: Point Cloud-Based 3D Human Joints Estimation

by Yue, Yang , Xu, Tianxu , Jia, Yuetong in Cloud Computing , computer vision , Computers

2021

Joint estimation of the human body is suitable for many fields such as human–computer interaction, autonomous driving, video analysis and virtual reality. Although many depth-based studies have been classified and generalized in previous review or survey papers, point cloud-based pose estimation of the human body is still difficult due to the disorder and rotation invariance of the point cloud. In this review, we summarize recent developments in point cloud-based pose estimation of the human body. The existing works are divided into three categories based on their working principles: template-based methods, feature-based methods and machine learning-based methods. In particular, the significant works are highlighted with a detailed introduction that analyzes their characteristics and limitations. The widely used datasets in the field are summarized, and quantitative comparisons are provided for the representative methods. Moreover, this review helps further understand the pertinent applications in many frontier research directions. Finally, we summarize the challenges involved and the problems to be solved in future research.

Journal Article

LHC constraints on a B − L gauge model using Contur

by Liu, W. , Deppisch, F. F. , Amrith, S. in Beyond Standard Model , Classical and Quantum Gravitation , Constraint modelling

2019

The large and growing library of measurements from the Large Hadron Collider has significant power to constrain extensions of the Standard Model. We consider such constraints on a well-motivated model involving a gauged and spontaneously-broken B − L symmetry, within the Contur framework. The model contains an extra Higgs boson, a gauge boson, and right-handed neutrinos with Majorana masses. This new particle content implies a varied phenomenology highly dependent on the parameters of the model, very well-suited to a general study of this kind. We find that existing LHC measurements significantly constrain the model in interesting regions of parameter space. Other regions remain open, some of which are within reach of future LHC data.

Journal Article

A fine-grained human facial key feature extraction and fusion method for emotion recognition

by Wang, Jisen , Huang, Yan , Wang, Jianqiang in 639/705/1042 , 639/705/1046 , 639/705/258

2025

Emotion, a fundamental mapping of human responses to external stimuli, has been extensively studied in human–computer interaction, particularly in areas such as intelligent cockpits and systems. However, accurately recognizing emotions from facial expressions remains a significant challenge due to lighting conditions, posture, and micro-expressions. Emotion recognition using global or local facial features is a key research direction. However, relying solely on global or local features often results in models that exhibit uneven attention across facial features, neglecting key variations critical for detecting emotional changes. This paper proposes a method for modeling and extracting key facial features by integrating global and local facial data. First, we construct a comprehensive image preprocessing model that includes super-resolution processing, lighting and shading processing, and texture enhancement. This preprocessing step significantly enriches the expression of image features. Second, a global facial feature recognition model is developed using an encoder-decoder architecture, which effectively eliminates environmental noise and generates a comprehensive global feature dataset for facial analysis. Simultaneously, the Haar cascade classifier is employed to extract refined features from key facial regions, including the eyes, mouth, and overall face, resulting in a corresponding local feature dataset. Finally, a two-branch convolutional neural network is designed to integrate both global and local facial feature datasets, enhancing the model’s ability to recognize facial characteristics accurately. The global feature branch fully characterizes the global features of the face, while the local feature branch focuses on the local features. An adaptive fusion module integrates the global and local features, enhancing the model’s ability to differentiate subtle emotional changes. To evaluate the accuracy and robustness of the model, we train and test it on the FER-2013 and JAFFE emotion datasets, achieving average accuracies of 80.59% and 97.61%, respectively. Compared to existing state-of-the-art models, our refined face feature extraction and fusion model demonstrates superior performance in emotion recognition. Additionally, the comparative analysis shows that emotional features across different faces show similarities. Building on psychological research, we categorize the dataset into three emotion classes: positive, neutral, and negative. The accuracy of emotion recognition is significantly improved under the new classification criteria. Additionally, the self-built dataset is used to validate further that this classification approach has important implications for practical applications.

Journal Article

A novel Swin transformer based framework for speech recognition for dysarthria

by Hassan, Haseeb , Ganiyu, Ismaila , El-Sherbeeny, Ahmed M. in 631/378/1689 , 692/699 , AI in healthcare

2025

Dysarthria frequently occurs in individuals with disorders such as stroke, Parkinson’s disease, cerebral palsy, and other neurological disorders. Timely detection and management of dysarthria in these patients is imperative for efficiently handling the development of their condition. Several previous studies have concentrated on detecting dysarthric speech using machine learning-based methods. However, the false positive rate is high due to the varying nature of speech and environmental factors such as background noise. Therefore, in this work, we employ a model based on the Swin transformer (ST), namely DSR-Swinoid. Firstly, the speech is converted into mel-spectrograms to reflect the maximum patterns of voice signals. Although the ST was initially designed to extract both local and global visual features effectively, it still prioritizes global features. Meanwhile, in mel-spectrograms, the specific gaps due to slurred speech are considered. Therefore, our objective is to improve the ST’s capacity for learning local features by introducing 4 modules: network for local feature capturing (NLF), convolutional patch concatenation, multi-path (MP), and multi-view block (MVB). The NLF module enriches the existing Swin transformer by enhancing its capability to capture local features effectively. MP integrates features from different Swin phases to emphasize local information. In the meantime, the MVB-ST block surpasses classical Swin blocks by integrating diverse receptive fields, focusing on a more comprehensive extraction of local features. Experimental results show that DSR-Swinoid attains a best accuracy of 98.66%, exceeding the results of existing methods.

Journal Article

An enhanced denoising system for mammogram images using deep transformer model with fusion of local and global features

by Alshetewi, Sameer , Athisayamani, Suganya , Ibrahim, Ahmed Zohair in 639/766/259 , 692/700/1421 , Algorithms

2025

Image denoising is a critical problem in low-level computer vision, where the aim is to reconstruct a clean, noise-free image from a noisy input, such as a mammogram image. In recent years, deep learning, particularly convolutional neural networks (CNNs), has shown great success in various image processing tasks, including denoising, image compression, and enhancement. While CNN-based approaches dominate, Transformer models have recently gained popularity for computer vision tasks. However, there have been fewer applications of Transformer-based models to low-level vision problems like image denoising. In this study, a novel denoising network architecture called DeepTFormer is proposed, which leverages Transformer models for the task. The DeepTFormer architecture consists of three main components: a preprocessing module, a local-global feature extraction module, and a reconstruction module. The local-global feature extraction module is the core of DeepTFormer, comprising several groups of ITransformer layers. Each group includes a series of Transformer layers, convolutional layers, and residual connections. These groups are tightly coupled with residual connections, which allow the model to capture both local and global information from the noisy images effectively. The design of these groups ensures that the model can utilize both local features for fine details and global features for larger context, leading to more accurate denoising. To validate the performance of the DeepTFormer model, extensive experiments were conducted using both synthetic and real noise data. Objective and subjective evaluations demonstrated that DeepTFormer outperforms leading denoising methods. The model achieved impressive results, surpassing state-of-the-art techniques in terms of key metrics like PSNR, FSIM, EPI, and SSIM, with values of 0.41, 0.93, 0.96, and 0.94, respectively. These results demonstrate that DeepTFormer is a highly effective solution for image denoising, combining the power of Transformer architecture with convolutional layers to enhance both local and global feature extraction.
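
Of the metrics named above, PSNR (peak signal-to-noise ratio) has a short standard definition, 10·log10(MAX² / MSE), which the sketch below computes for two flattened images. The function names, the 8-bit peak value of 255, and the pixel values are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce the reported figures.

```python
import math


def mean_squared_error(reference, candidate):
    """Mean squared pixel difference between two flattened images."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in decibels.

    max_value is the largest possible pixel value (assumed 255 for
    8-bit images). Identical images give infinite PSNR.
    """
    err = mean_squared_error(reference, candidate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Tiny made-up 2x2 images, flattened row by row.
clean = [0, 255, 128, 64]
denoised = [10, 245, 128, 64]

print("PSNR (dB):", psnr(clean, denoised))
```

A higher PSNR means the candidate image is closer to the reference, so a denoiser is judged by how much it raises PSNR relative to the noisy input.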

Journal Article

Color–Texture Pattern Classification Using Global–Local Feature Extraction, an SVM Classifier, with Bagging Ensemble Post-Processing

by Navarro, Carlos F. , Perez, Claudio A. in bagging post-processing , BQMP and Haralick global–local feature integration , Classification

2019

Many applications in image analysis require the accurate classification of complex patterns including both color and texture, e.g., in content image retrieval, biometrics, and the inspection of fabrics, wood, steel, ceramics, and fruits, among others. A new method for pattern classification using both color and texture information is proposed in this paper. The proposed method includes the following steps: division of each image into global and local samples, texture and color feature extraction from the samples using Haralick statistics and the binary quaternion-moment-preserving method, a classification stage using a support vector machine, and a final post-processing stage employing a bagging ensemble. One of the main contributions of this method is the image partition, which allows the image to be represented by global and local features. This partition captures most of the information present in the image for colored texture classification, allowing improved results. The proposed method was tested on four databases extensively used in color–texture classification: the Brodatz, VisTex, Outex, and KTH-TIPS2b databases, yielding correct classification rates of 97.63%, 97.13%, 90.78%, and 92.90%, respectively. The use of the post-processing stage improved those results to 99.88%, 100%, 98.97%, and 95.75%, respectively. We compared our results to the best previously published results on the same databases, finding significant improvements in all cases.

Journal Article
