Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
54 result(s) for "deep fusion matching network"
Sort by:
A Deep Fusion Matching Network Semantic Reasoning Model
As the vital technology of natural language understanding, sentence representation reasoning technology mainly focuses on sentence representation methods and reasoning models. Although the performance has been improved, there are still some problems, such as incomplete sentence semantic expression, lack of depth of reasoning model, and lack of interpretability of the reasoning process. Given the reasoning model’s lack of reasoning depth and interpretability, a deep fusion matching network is designed in this paper, which mainly includes a coding layer, matching layer, dependency convolution layer, information aggregation layer, and inference prediction layer. Based on a deep matching network, the matching layer is improved. Furthermore, the heuristic matching algorithm replaces the bidirectional long-short memory neural network to simplify the interactive fusion. As a result, it improves the reasoning depth and reduces the complexity of the model; the dependency convolution layer uses the tree-type convolution network to extract the sentence structure information along with the sentence dependency tree structure, which improves the interpretability of the reasoning process. Finally, the performance of the model is verified on several datasets. The results show that the reasoning effect of the model is better than that of the shallow reasoning model, and the accuracy rate on the SNLI test set reaches 89.0%. At the same time, the semantic correlation analysis results show that the dependency convolution layer is beneficial in improving the interpretability of the reasoning process.
Characterization inference based on joint-optimization of multi-layer semantics and deep fusion matching network
The whole sentence representation reasoning process simultaneously comprises a sentence representation module and a semantic reasoning module. This paper combines the multi-layer semantic representation network with the deep fusion matching network to solve the limitations of only considering a sentence representation module or a reasoning model. It proposes a joint optimization method based on multi-layer semantics called the Semantic Fusion Deep Matching Network (SCF-DMN) to explore the influence of sentence representation and reasoning models on reasoning performance. Experiments on text entailment recognition tasks show that the joint optimization representation reasoning method performs better than the existing methods. The sentence representation optimization module and the improved optimization reasoning model can promote reasoning performance when used individually. However, the optimization of the reasoning model has a more significant impact on the final reasoning results. Furthermore, after comparing each module’s performance, there is a mutual constraint between the sentence representation module and the reasoning model. This condition restricts overall performance, resulting in no linear superposition of reasoning performance. Overall, by comparing the proposed methods with other existed methods that are tested using the same database, the proposed method solves the lack of in-depth interactive information and interpretability in the model design which would be inspirational for future improving and studying of natural language reasoning.
Deep Learning on Image Stitching With Multi-viewpoint Images: A Survey
Multi-viewpoint image stitching aims to stitch images taken from different viewpoints into pictures with a broader field of view. The stitched images are subject to artifacts, geometric distortion, and blur distortion due to the mismatch of feature points, inaccurate homography estimation, and improper fusion of the unstitched images. Deep learning has recently been increasingly applied to multi-viewpoint image stitching to overcome these problems. However, there has thus far been little related research to summarize the different deep learning techniques used for multi-viewpoint image stitching. Therefore, this review aims to explore the application of deep learning to multi-viewpoint image stitching. To better illustrate this topic, we first summarize the acquisition methods for multi-viewpoint images and the main challenges of image stitching. After which, deep learning techniques for multi-view image stitching with a single camera are sorted out. Subsequently, deep learning techniques for multi-view image stitching with camera arrays, including parallel-view multi-view image stitching and cross-view multi-view image stitching, are presented. Next, we summarize image stitching datasets, evaluation metrics, and experimental data of several leading stitching algorithms on public datasets. Finally, we discuss potential issues and future work on image stitching with multi-viewpoint images.
Fusion of Visible and Infrared Aerial Images from Uncalibrated Sensors Using Wavelet Decomposition and Deep Learning
Multi-modal systems extract information about the environment using specialized sensors that are optimized based on the wavelength of the phenomenology and material interactions. To maximize the entropy, complementary systems operating in regions of non-overlapping wavelengths are optimal. VIS-IR (Visible-Infrared) systems have been at the forefront of multi-modal fusion research and are used extensively to represent information in all-day all-weather applications. Prior to image fusion, the image pairs have to be properly registered and mapped to a common resolution palette. However, due to differences in the device physics of image capture, information from VIS-IR sensors cannot be directly correlated, which is a major bottleneck for this area of research. In the absence of camera metadata, image registration is performed manually, which is not practical for large datasets. Most of the work published in this area assumes calibrated sensors and the availability of camera metadata providing registered image pairs, which limits the generalization capability of these systems. In this work, we propose a novel end-to-end pipeline termed DeepFusion for image registration and fusion. Firstly, we design a recursive crop and scale wavelet spectral decomposition (WSD) algorithm for automatically extracting the patch of visible data representing the thermal information. After data extraction, both the images are registered to a common resolution palette and forwarded to the DNN for image fusion. The fusion performance of the proposed pipeline is compared and quantified with state-of-the-art classical and DNN architectures for open-source and custom datasets demonstrating the efficacy of the pipeline. Furthermore, we also propose a novel keypoint-based metric for quantifying the quality of fused output.
SA-Net: Scene-Aware Network for Cross-domain Stereo Matching
Although the recent stereo matching methods based on deep learning achieve unprecedented state-of-the-art performance, the accuracy of these approaches suffers a drastic drop when dealing with environments much different in context from those observed at training time. In this paper, we propose a novel Scene-Aware Network (SA-Net) that integrates scene information to achieve cross-domain stereo matching. Specifically, we design a Scene-Aware Module (SAM) to extract rich scene details, which can make the network with it have better generalization ability between different domain. In order to use rich scene information to perfectly guide shallow features to realize cost aggregation, we introduce a new Multi-element Feature Fusion Strategy (MFFS). Extensive quantitative and qualitative evaluations on different domain illustrate that our SA-Net achieves competitive performance and in particular obtains better ability of domain generalization.
Uncertainty-aware correspondence identification for collaborative perception
Correspondence identification is essential for multi-robot collaborative perception, which aims to identify the same objects in order to ensure consistent references of the objects by a group of robots/agents in their own fields of view. Although recent deep learning methods have shown encouraging performance on correspondence identification, they suffer from two shortcomings, including the inability to address non-covisibility and the inability to quantify and reduce uncertainty to improve correspondence identification. To address both issues, we propose a novel uncertainty-aware deep graph matching method for correspondence identification in collaborative perception. Our new approach formulates correspondence identification as a deep graph matching problem, which identifies correspondences based on deep graph neural network-based features and explicitly quantify uncertainties in the identified correspondences under the Bayesian framework. In addition, we design a novel loss function that explicitly reduces correspondence uncertainty and perceptual non-covisibility during learning. Finally, we design a novel multi-robot sensor fusion method that integrates the multi-robot observations given the identified correspondences to perform collaborative object localization. We evaluate our approach in the robotics applications of collaborative assembly, multi-robot coordination and connected autonomous driving using high-fidelity simulations and physical robots. Experiments have shown that, our approach achieves the state-of-the-art performance of correspondence identification. Furthermore, the identified correspondences of objects can be well integrated into multi-robot collaboration for object localization.
A novel method for image captioning using multimodal feature fusion employing mask RNN and LSTM models
Image captioning is a technique that allows us to use computers to interpret the information in photographs and make written text. The use of deep learning to interpret image information and create descriptive text has become a widely researched issue since its establishment. Nevertheless, these strategies do not identify all samples that depict conceptual ideas. In reality, the vast majority of them seem to be irrelevant to the matching tasks. The degree of similarity is determined only by a few relevant semantic occurrences. This duplicate instance can also be thought of as noise, as it obstructs the matching process of a few meaningful instances and adds to the model's computational effort. In the existing scheme, traditional convolutional neural networks (CNN) are presented. For that reason, captioning is not effective due to its structure. Furthermore, present approaches frequently require the deliberate use of additional target recognition algorithms or costly human labeling when extracting information is required. For image captioning, this research presents a multimodal feature fusion-based deep learning model. The coding layer uses mask recurrent neural networks (Faster RCNN), the long short-term memory has been used to decode, and the descriptive text is constructed. In deep learning, the model parameters are optimized through the method of gradient optimization. In the decoding layer, dense attention mechanisms can assist in minimizing non-salient data interruption and preferentially input the appropriate data for the decryption stage. Input images are used to train a model that, when given the opportunity, will provide captions that are very close to accurately describing the images. Various datasets are used to evaluate the model's precision and the fluency, or mastery, of the language it acquires by analyzing picture descriptions. Results from these tests demonstrate that the model consistently provides correct descriptions of input images. This model has been taught to provide captions or words describing an input picture. To measure the effectiveness of the model, the system is given categorization scores. With a batch size of 512 and 100 training epochs, the suggested system shows a 95% increase in performance. The model's capacity to comprehend images and generate text is validated by the experimental data in the domain of generic images. This paper is implemented using Python frameworks and also evaluated using performance metrics such as PSNR, RMSE, SSIM, accuracy, recall, F1-score, and precision.
An Audiovisual Correlation Matching Method Based on Fine-Grained Emotion and Feature Fusion
Most existing intelligent editing tools for music and video rely on the cross-modal matching technology of the affective consistency or the similarity of feature representations. However, these methods are not fully applicable to complex audiovisual matching scenarios, resulting in low matching accuracy and suboptimal audience perceptual effects due to ambiguous matching rules and associated factors. To address these limitations, this paper focuses on both the similarity and integration of affective distribution for the artistic audiovisual works of movie and television video and music. Based on the rich emotional perception elements, we propose a hybrid matching model based on feature canonical correlation analysis (CCA) and fine-grained affective similarity. The model refines KCCA fusion features by analyzing both matched and unmatched music–video pairs. Subsequently, the model employs XGBoost to predict relevance and to compute similarity by considering fine-grained affective semantic distance as well as affective factor distance. Ultimately, the matching prediction values are obtained through weight allocation. Experimental results on a self-built dataset demonstrate that the proposed affective matching model balances feature parameters and affective semantic cognitions, yielding relatively high prediction accuracy and better subjective experience of audiovisual association. This paper is crucial for exploring the affective association mechanisms of audiovisual objects from a sensory perspective and improving related intelligent tools, thereby offering a novel technical approach to retrieval and matching in music–video editing.
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of MLP in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViT and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance the nonlinear expression ability of complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021.
Self-Supervised Keypoint Detection and Cross-Fusion Matching Networks for Multimodal Remote Sensing Image Registration
Remote sensing image matching is the basis upon which to obtain integrated observations and complementary information representation of the same scene from multiple source sensors, which is a prerequisite for remote sensing tasks such as remote sensing image fusion and change detection. However, the intricate geometric and radiometric differences between the multimodal images render the registration quite challenging. Although multimodal remote sensing image matching methods have been developed in recent decades, most classical and deep learning based techniques cannot effectively extract high repeatable keypoints and discriminative descriptors for multimodal images. Therefore, we propose a two-step “detection + matching” framework in this paper, where each step consists of a deep neural network. A self-supervised detection network is first designed to generate similar keypoint feature maps between multimodal images, which is used to detect highly repeatable keypoints. We then propose a cross-fusion matching network, which aims to exploit global optimization and fusion information for cross-modal feature descriptors and matching. The experiments show that the proposed method has superior feature detection and matching performance compared with current state-of-the-art methods. Specifically, the keypoint repetition rate of the detection network and the NN mAP of the matching network are 0.435 and 0.712 on test datasets, respectively. The proposed whole pipeline framework is evaluated, which achieves an average M.S. and RMSE of 0.298 and 3.41, respectively. This provides a novel solution for the joint use of multimodal remote sensing images for observation and localization.