276 results for "multi-view recognition"
Exploiting Multi-View SAR Images for Robust Target Recognition
The exploitation of multi-view synthetic aperture radar (SAR) images can effectively improve the performance of target recognition. However, due to the various extended operating conditions (EOCs) in practical applications, some of the collected views may not be discriminative enough for target recognition. Therefore, each input view should be examined before being passed on to multi-view recognition. This paper proposes a novel structure for multi-view SAR target recognition. The multi-view images are first classified by sparse representation-based classification (SRC). Based on the output residuals, a reliability level is calculated to evaluate the effectiveness of a given view for multi-view recognition. Meanwhile, the support samples for each view selected by SRC collaborate to construct an enhanced local dictionary. Then, the selected views are classified by joint sparse representation (JSR) based on the enhanced local dictionary for target recognition. The proposed method can eliminate invalid views while enhancing the representation capability of JSR. Therefore, the individual discriminability of each valid view, as well as the inner correlation among all of the selected views, can be exploited for robust target recognition. Experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset to demonstrate the validity of the proposed method.
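The SRC stage described above is easy to illustrate. Below is a minimal sketch of sparse representation-based classification with a residual-based reliability level, assuming toy random data, scikit-learn's orthogonal matching pursuit as the sparse solver, and a margin-based reliability measure; all three are illustrative assumptions, not the authors' implementation.

```python
# Sketch: SRC over a multi-class dictionary, plus a reliability level derived
# from the class residuals (views with a small margin would be dropped before
# the JSR stage). Toy data; not the paper's implementation.
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 20, 64
labels = np.repeat(np.arange(n_classes), per_class)
D = rng.normal(size=(dim, n_classes * per_class))        # training dictionary
D /= np.linalg.norm(D, axis=0)                           # unit-norm atoms

def src_residuals(D, labels, y, n_nonzero=10):
    """Class-wise reconstruction residuals of one view y."""
    x = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)   # sparse code
    return np.array([np.linalg.norm(y - D @ np.where(labels == c, x, 0.0))
                     for c in range(labels.max() + 1)])

y = rng.normal(size=dim)                                 # one input view (toy)
res = src_residuals(D, labels, y)
r0, r1 = np.sort(res)[:2]
reliability = (r1 - r0) / r0                             # hypothetical margin measure
print("predicted class:", res.argmin(), "reliability:", round(reliability, 3))
```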
Deep models for multi-view 3D object recognition: a review
This review paper focuses on the progress of deep learning-based methods for multi-view 3D object recognition. It covers the state-of-the-art techniques in this field, specifically those that utilize 3D multi-view data as the input representation. The paper provides a comprehensive analysis of the pipeline for deep learning-based multi-view 3D object recognition, including the various techniques employed at each stage. It also presents the latest developments in CNN-based and transformer-based models for multi-view 3D object recognition. The review discusses existing models in detail, including the datasets, camera configurations, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance. Additionally, it examines various computer vision applications that use multi-view classification. Finally, it highlights future directions, factors impacting recognition performance, and trends for the development of multi-view 3D object recognition methods.
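The per-view-CNN-plus-fusion pipeline the review analyzes can be sketched in a few lines. The toy PyTorch model below uses a stand-in backbone and max-pooling over views in the style of the original MVCNN; the layer sizes and backbone are illustrative assumptions, not any specific reviewed model.

```python
# Sketch: a shared CNN encodes each rendered view, max view-pooling fuses
# them, and a linear head classifies. Stand-in backbone; sizes illustrative.
import torch
import torch.nn as nn

class MVCNN(nn.Module):
    def __init__(self, n_classes=40, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(                   # stand-in for a pre-trained CNN
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, views):                            # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        feats = self.backbone(views.flatten(0, 1))       # per-view features
        fused = feats.view(b, v, -1).max(dim=1).values   # view pooling
        return self.head(fused)

print(MVCNN()(torch.randn(2, 12, 3, 64, 64)).shape)      # torch.Size([2, 40])
```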
A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context
Due to progress and demographic change, society is facing a crucial challenge related to increased life expectancy and a higher number of people in situations of dependency. As a consequence, there exists a significant demand for support systems for personal autonomy. This article outlines the vision@home project, whose goal is to extend independent living at home for elderly and impaired people, providing care and safety services by means of vision-based monitoring. Different kinds of ambient-assisted living services are supported, from the detection of home accidents to telecare services. In this contribution, the specification of the system is presented, and novel contributions are made regarding human behaviour analysis and privacy protection. By means of a multi-view setup of cameras, people's behaviour is recognised based on human action recognition. For this purpose, a weighted feature fusion scheme is proposed to learn from multiple views. In order to protect the right to privacy of the inhabitants when a remote connection occurs, a privacy-by-context method is proposed. The experimental results of the behaviour recognition method show an outstanding performance, as well as support for multi-view scenarios and real-time execution, which are required in order to provide the proposed services.
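The weighted feature fusion idea, combining per-view descriptors with view weights before classification, can be illustrated with a toy example. The random feature vectors and hand-set weights below are assumptions for illustration; the paper learns its weighting from data.

```python
# Sketch: fuse per-view action descriptors with normalized view weights.
import numpy as np

def fuse_views(features, weights):
    """features: (n_views, dim); weights: (n_views,), renormalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return (w[:, None] * np.asarray(features)).sum(axis=0)

views = np.random.default_rng(1).normal(size=(3, 16))    # 3 cameras, 16-d descriptors
fused = fuse_views(views, weights=[0.5, 0.3, 0.2])       # e.g., trust the frontal view most
print(fused.shape)                                       # (16,)
```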
Multi-View Object Recognition and Pose Sequence Estimation Using HMMs
This work proposes the integration of a vision system for a service robot whose gripper holds an object. Based on the particular conditions of the problem, the solution is modular and allows various options for feature extraction and data classification. Since the robot can move the object and has information about its position, the proposed solution takes advantage of this by applying preprocessing techniques to improve the performance of classifiers that could otherwise be considered weak. In addition to classifying the object, it is possible to infer the sequence of movements it carries out using hidden Markov models (HMMs). The system was tested using a public dataset, COIL-100, as well as with a dataset of real objects in common use collected with the human support robot (HSR). The results show that the proposed vision system is able to work with a low number of shots per class. Two HMM architectures are tested. In order to enhance classification by adding information from multiple perspectives, various combination criteria were analyzed. A simple model is built to integrate information and infer object movements. The system also includes a next-best-view algorithm, in which different parameters are tested to improve accuracy in the classification of both the object and its pose, especially for objects that may look similar in several of their views. In general, using relatively few shots of each class on an ordinary computer, consistent results were obtained, requiring only 8.192×10⁻³ MFLOPs for sequence processing with concatenated HMMs, compared to 404.34 MFLOPs for a CNN+LSTM.
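The HMM-based movement inference rests on standard sequence scoring: the model whose forward likelihood is highest best explains the observed pose sequence. Below is a minimal scaled forward-algorithm sketch; the two-state transition and emission matrices are toy values, not the paper's learned models.

```python
# Sketch: scaled forward algorithm for the log-likelihood of a discrete
# observation sequence under an HMM. Toy 2-state model.
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """obs: observation indices; pi: (S,) initial; A: (S,S) transitions; B: (S,O) emissions."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()                                  # rescale to avoid underflow
        log_p += np.log(c)
        alpha /= c
    return log_p

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_log_likelihood([0, 0, 1, 1], pi, A, B))
```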
Research on an Optimal Path Planning Method Based on an A* Algorithm for Multi-View Recognition
In order to obtain the optimal perspectives of the recognition target, this paper combines the motion path of the manipulator arm and the camera. A path planning method for finding the optimal perspectives based on an A* algorithm is proposed. The quality of the perspectives is assessed by means of multi-view recognition. A binary multi-view 2D kernel principal component analysis network (BM2DKPCANet) is built to extract features, and a multi-view angle classifier based on BM2DKPCANet + Softmax is established, whose output category posterior probabilities serve as the perspective recognition performance function. The path planning problem is transformed into a multi-objective optimization problem by taking optimal view recognition and the shortest path distance as the objective functions. To reduce computation, the multi-objective problem is reduced to a single-objective one by fusing the objective functions over the established perspective observation directed graph model. An A* algorithm is then used to solve the single-source shortest path problem on the fused directed graph. Path planning experiments with different numbers of view angles and different starting points demonstrate that the method can guide the camera to viewpoints with higher recognition accuracy and complete the optimal observation path planning.
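The core of the method, running A* over a directed viewpoint graph whose edge weights fuse travel cost and recognition quality, is easy to sketch. The tiny graph, the already-fused costs, and the zero heuristic below are illustrative assumptions.

```python
# Sketch: A* over a directed graph; each edge weight is assumed to already
# fuse travel distance and a recognition-quality penalty, per the paper's idea.
import heapq

def a_star(graph, start, goal, h=lambda n: 0.0):
    """graph: {node: [(neighbor, cost), ...]}; h: admissible heuristic."""
    frontier = [(h(start), 0.0, start, [start])]
    best = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if best.get(node, float("inf")) <= g:
            continue
        best[node] = g
        for nxt, cost in graph.get(node, []):
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None, float("inf")

graph = {"v0": [("v1", 1.2), ("v2", 0.9)], "v1": [("v3", 0.5)],
         "v2": [("v3", 1.1)], "v3": []}
print(a_star(graph, "v0", "v3"))                         # (['v0', 'v1', 'v3'], 1.7)
```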
Gait recognition based on orthogonal view feature extraction
Gait is used for personal identification but is sensitive to covariates such as view and walking conditions. To reduce the influence of view on the accuracy of gait recognition, this paper proposes an Orthogonal-view Feature Decomposition Network based on GaitSet (OFD-GaitSet). The algorithm decomposes gait recognition into the recognition of two orthogonal view components. Firstly, it improves the setting of the gait gallery so that each sample in the gallery contains gait information from two views, 0° and 90°. Secondly, it designs two Feature Extraction Networks, which extract the gait sub-features of the gait silhouette sequence from the two views. At the same time, a View Identification Network and a Distance Block are used to weight the Euclidean distances between the gait sub-features and those of the gallery, and the recognition results are obtained through comparison. The algorithm is trained with Cross Entropy Loss and an improved Triplet Loss. Experiments on the CASIA-B dataset show that the average Rank-1 accuracy reaches 99.8% under normal walking (NM) conditions, 99.1% under walking with a bag (BG) conditions, and 88.2% under wearing a coat or jacket (CL) conditions, improvements over GaitSet of 4.8%, 11.9%, and 17.8%, respectively. Experiments on the OU-MVLP dataset achieve a Rank-1 accuracy of 89.8%, which is 2.7% higher than GaitSet.
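The distance-weighting step can be illustrated directly: per-view sub-features are compared with the gallery's, and the Euclidean distances are combined with view-dependent weights before ranking. The feature vectors and weights below are toy assumptions; in OFD-GaitSet the weighting comes from the View Identification Network and Distance Block.

```python
# Sketch: weighted combination of per-view Euclidean distances between a
# probe and a gallery sample. Toy features; weights stand in for learned ones.
import numpy as np

def weighted_gait_distance(probe, gallery, view_weights):
    """probe/gallery: {view: feature vector}; view_weights: {view: weight}."""
    return sum(w * np.linalg.norm(probe[v] - gallery[v])
               for v, w in view_weights.items())

rng = np.random.default_rng(2)
probe   = {"0deg": rng.normal(size=32), "90deg": rng.normal(size=32)}
gallery = {"0deg": rng.normal(size=32), "90deg": rng.normal(size=32)}
print(weighted_gait_distance(probe, gallery, {"0deg": 0.3, "90deg": 0.7}))
```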
MORE: simultaneous multi-view 3D object recognition and pose estimation
Simultaneous object recognition and pose estimation are two key functionalities for robots to safely interact with humans as well as with their environments. Although both object recognition and pose estimation use visual input, most state-of-the-art methods tackle them as two separate problems, since the former needs a view-invariant representation while object pose estimation necessitates a view-dependent description. Nowadays, multi-view convolutional neural network (MVCNN) approaches show state-of-the-art classification performance. Although MVCNN object recognition has been widely explored, there has been very little research on multi-view object pose estimation methods, and even less on addressing these two problems simultaneously. The pose of the virtual cameras in MVCNN methods is often pre-defined in advance, which limits the applicability of such approaches. In this paper, we propose an approach capable of handling object recognition and pose estimation simultaneously. In particular, we develop a deep object-agnostic entropy estimation model capable of predicting the best viewpoints of a given 3D object. The obtained views of the object are then fed to the network to simultaneously predict the pose and category label of the target object. Experimental results showed that the views obtained from such positions are descriptive enough to achieve good accuracy. Furthermore, we designed a real-life drink-serving scenario to demonstrate how well the proposed approach works in real robot tasks. Code is available online at: https://github.com/SubhadityaMukherjee/more_mvcnn .
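The viewpoint selection idea can be illustrated with a simple entropy score. MORE predicts entropy with a learned, object-agnostic model; the sketch below instead scores toy renders directly by the Shannon entropy of their intensity histograms, an illustrative proxy for that model.

```python
# Sketch: pick the most informative viewpoint by histogram entropy (a proxy
# for the paper's learned entropy estimator). Toy "renders".
import numpy as np

def view_entropy(img, bins=32):
    """Shannon entropy of an image's intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(3)
views = [rng.random((64, 64)) ** k for k in (1, 2, 4)]   # toy renders of one object
scores = [view_entropy(v) for v in views]
print("selected view:", int(np.argmax(scores)), "entropies:", np.round(scores, 2))
```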
Enhancing Place Emotion Analysis with Multi-View Emotion Recognition from Geo-Tagged Photos: A Global Tourist Attraction Perspective
User-generated geo-tagged photos (UGPs) have emerged as a valuable tool for analyzing large-scale tourist place emotions in unprecedented detail. This process involves extracting and analyzing human emotions associated with specific locations. However, previous studies have been limited to analyzing individual faces in the UGPs. This approach falls short of representing contextual scene characteristics, such as environmental elements and overall scene context, which may contain implicit emotional knowledge. To address this issue, we propose an innovative computational framework for global tourist place emotion analysis leveraging UGPs. Specifically, we first introduce a Multi-view Graph Fusion Network (M-GFN) to effectively recognize multi-view emotions from UGPs, considering crowd emotions and implicit scene sentiment. We then design an attraction-specific emotion index (AEI) to quantitatively measure place emotions based on the identified multi-view emotions at various tourist attractions, taking place types into account. Complementing the AEI, we employ the emotion intensity index (EII) and the Pearson correlation coefficient (PCC) to deepen the exploration of the association between attraction types and place emotions. The synergy of AEI, EII, and PCC allows comprehensive attraction-specific place emotion extraction, enhancing the overall quality of tourist place emotion analysis. Extensive experiments demonstrate that our framework enhances existing place emotion analysis methods, and the M-GFN outperforms state-of-the-art emotion recognition methods. Our framework can be adapted to various geo-emotion analysis tasks, such as recognizing and regulating workplace emotions, underscoring the intrinsic link between emotions and geographic contexts.
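The index arithmetic is straightforward to illustrate. In the toy sketch below, AEI is simplified to a mean of per-photo emotion scores and the PCC is computed with NumPy; the scores, the place-type encoding, and the aggregation are all fabricated illustrative assumptions, not the paper's definitions.

```python
# Sketch: attraction-level emotion index (here: a plain mean of photo scores)
# and the Pearson correlation between attraction types and place emotions.
import numpy as np

photo_scores = {"museum": [0.80, 0.60, 0.70],            # toy per-photo emotion scores
                "beach":  [0.90, 0.95, 0.85],
                "market": [0.50, 0.40, 0.65]}
aei = {place: float(np.mean(s)) for place, s in photo_scores.items()}

type_code = np.array([0, 1, 2])                          # hypothetical place-type encoding
aei_vals = np.array([aei[p] for p in ("museum", "beach", "market")])
pcc = np.corrcoef(type_code, aei_vals)[0, 1]             # Pearson correlation coefficient
print(aei, round(float(pcc), 3))
```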
PolarFormer: A Registration-Free Fusion Transformer with Polar Coordinate Position Encoding for Multi-View SAR Target Recognition
Multi-view Synthetic Aperture Radar (SAR) provides rich information for target recognition. However, fusing features from unaligned multi-view images presents challenges for existing methods. Conventional early fusion methods often rely on image registration, a process that is computationally intensive and can introduce feature distortions. More recent registration-free approaches based on the Transformer architecture are constrained by standard position encodings, which were not designed to represent the rotational relationships among multi-view SAR data and thus can cause spatial ambiguity. To address this specific limitation of position encodings, we propose a registration-free fusion framework based on a spatially aware Transformer. The framework includes two key components: (1) a multi-view polar coordinate position encoding that models the geometric relationships of patches both within and across views in a unified coordinate system; and (2) a spatially aware self-attention mechanism that injects this geometric information as a learnable inductive bias. Experiments were conducted on our self-developed FAST-Vehicle dataset, which provides full 360° azimuthal coverage. The results show that our method outperforms both registration-based strategies and Transformer baselines that use conventional position encodings. This work indicates that for multi-view SAR fusion, explicitly modeling the underlying geometric relationships with a suitable position encoding is an effective alternative to physical image registration or the use of generic, single-image position encodings.
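The central idea, a position encoding expressed in polar coordinates so that patches from different views share one rotational frame, can be sketched as follows. The sinusoidal embedding, grid size, and the way the view azimuth is folded into the angle are illustrative assumptions, not PolarFormer's exact formulation.

```python
# Sketch: polar position encoding for a patch grid; the view's azimuth shifts
# the angular coordinate so all views live in one coordinate system.
import numpy as np

def polar_position_encoding(grid, view_azimuth_rad, dim=16):
    """Returns (grid*grid, 2*dim) encodings for one view's patch grid."""
    ys, xs = np.mgrid[0:grid, 0:grid] - (grid - 1) / 2.0
    r = np.hypot(xs, ys).ravel()                         # radius from image centre
    theta = np.arctan2(ys, xs).ravel() + view_azimuth_rad
    freqs = 1.0 / (10000 ** (np.arange(dim) / dim))
    return np.concatenate([np.sin(r[:, None] * freqs),   # radial channels
                           np.cos(theta[:, None] * freqs)], axis=1)

pe = polar_position_encoding(grid=8, view_azimuth_rad=np.deg2rad(45.0))
print(pe.shape)                                          # (64, 32)
```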
Multi-View Gait Recognition Based on a Siamese Vision Transformer
Although the vision transformer has been used in gait recognition, its application in multi-view gait recognition remains limited. Different views significantly affect the accuracy with which the characteristics of the gait contour are extracted and identified. To address this issue, this paper proposes a Siamese mobile vision transformer (SMViT). This model not only focuses on the local characteristics of the human gait space but also considers long-distance attention associations, allowing it to extract multi-dimensional gait characteristics. In addition, it describes how different perspectives affect the gait characteristics and generates reliable perspective-relationship factor features. The average recognition rate of SMViT on the CASIA B dataset reached 96.4%. The experimental results show that SMViT attains state-of-the-art performance compared to advanced gait recognition models such as GaitGAN, Multi_view GAN and Posegait.
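The Siamese arrangement is simple to sketch: two inputs share one encoder and are compared by embedding distance, which the triplet loss shapes during training. The small MLP below stands in for the mobile vision transformer; all sizes are illustrative assumptions.

```python
# Sketch: shared-weight encoder, triplet loss, distance-based comparison.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(),
                        nn.Linear(128, 64))              # stand-in for the MViT branch

anchor, positive, negative = (torch.randn(4, 1, 64, 64) for _ in range(3))
emb = [encoder(x) for x in (anchor, positive, negative)] # same encoder for every input
loss = nn.TripletMarginLoss(margin=1.0)(*emb)            # stand-in for the improved triplet loss
dist = torch.norm(emb[0] - emb[1], dim=1)                # recognition by feature distance
print(loss.item(), dist.shape)
```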