1,608 result(s) for "multi-view"
Incomplete multi-view clustering with multiple imputation and ensemble clustering
Multi-view clustering is an important and challenging task in machine learning and data mining. Over the past decade, the topic has attracted much attention and considerable progress has been made. In practice, however, multi-view data are mostly incomplete because of factors such as machine error and sensor failure, so handling incompleteness becomes a challenge. Existing work mainly addresses the view-missing case, in which all features of some samples are lost in a given view. In fact, a missing value can occur at any position; in this any-value-missing case, values are missing from any view in a purely random way. We propose a two-stage algorithm that combines multiple imputation and ensemble clustering to handle multi-view clustering in the any-value-missing case. Multiple imputation handles the missing values, and weighted ensemble clustering performs the multi-view clustering. Experimental comparisons on several data sets verify the effectiveness of the proposed method.
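The two-stage idea in this abstract (impute several times, cluster each completed copy, then combine the partitions) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the paper's algorithm: the views are simply concatenated into one feature row, missing entries are filled by random draws from the observed values of the same column, k-means with farthest-point initialisation stands in for the base clusterer, the ensemble weights are all equal, and the consensus is taken by thresholding a co-association matrix. All function names are hypothetical.

```python
import random


def impute_once(rows, rng):
    """Fill each missing entry (None) with a random observed value from its column."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(row)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(partitions, weights):
    """Weighted fraction of runs in which each pair of items shares a cluster."""
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Link items whose co-association exceeds the threshold; return component ids."""
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def impute_and_ensemble(rows, k, n_imputations=10, seed=0):
    """Stage 1: multiple imputation. Stage 2: ensemble of k-means partitions."""
    rng = random.Random(seed)
    partitions = [kmeans(impute_once(rows, rng), k) for _ in range(n_imputations)]
    weights = [1.0] * n_imputations  # equal weights: an assumption, not the paper's scheme
    return consensus_labels(co_association(partitions, weights))


# Toy data: two well-separated groups, each row missing at most one value.
toy_rows = [
    [0.1, 0.2, 0.0, 0.1],
    [0.0, None, 0.3, 0.2],
    [0.2, 0.1, None, 0.0],
    [10.0, 9.8, 10.1, 9.9],
    [None, 10.1, 9.9, 10.0],
    [9.9, 10.0, 10.2, None],
]
toy_labels = impute_and_ensemble(toy_rows, k=2)
```

On this toy input the first three rows end up in one consensus cluster and the last three in another, because a single imputed value is not enough to pull a row across the gap between the groups. A weighted variant would replace the equal `weights` list with a per-partition quality score.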
On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization
In this paper, we address the multi-view subspace clustering problem. Our method utilizes the circulant algebra for tensor, which is constructed by stacking the subspace representation matrices of different views and then rotating, to capture the low rank tensor subspace so that the refinement of the view-specific subspaces can be achieved, as well as the high order correlations underlying multi-view data can be explored. By introducing a recently proposed tensor factorization, namely tensor-Singular Value Decomposition (t-SVD) (Kilmer et al. in SIAM J Matrix Anal Appl 34(1):148–172, 2013), we can impose a new type of low-rank tensor constraint on the rotated tensor to ensure the consensus among multiple views. Different from traditional unfolding based tensor norm, this low-rank tensor constraint has optimality properties similar to that of matrix rank derived from SVD, so the complementary information can be explored and propagated among all the views more thoroughly and effectively. The established model, called t-SVD based Multi-view Subspace Clustering (t-SVD-MSC), falls into the applicable scope of augmented Lagrangian method, and its minimization problem can be efficiently solved with theoretical convergence guarantee and relatively low computational complexity. Extensive experimental testing on eight challenging image datasets shows that the proposed method has achieved highly competent objective performance compared to several state-of-the-art multi-view clustering methods.
Deep models for multi-view 3D object recognition: a review
This review paper focuses on the progress of deep learning-based methods for multi-view 3D object recognition. It covers the state-of-the-art techniques in this field, specifically those that utilize 3D multi-view data as input representation. The paper provides a comprehensive analysis of the pipeline for deep learning-based multi-view 3D object recognition, including the various techniques employed at each stage. It also presents the latest developments in CNN-based and transformer-based models for multi-view 3D object recognition. The review discusses existing models in detail, including the datasets, camera configurations, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance. Additionally, it examines various computer vision applications that use multi-view classification. Finally, it highlights future directions, factors impacting recognition performance, and trends for the development of multi-view 3D object recognition methods.
A novel semi-supervised consensus fuzzy clustering method for multi-view relational data
Multi-view data is widely employed in various domains, highlighting the need for advanced clustering methodologies to efficiently extract knowledge from these datasets. Consequently, multi-view clustering has emerged as a prominent research topic in recent years. In this paper, we propose a novel approach: the semi-supervised consensus fuzzy clustering method for multi-view relational data (SSCFMC). This method combines the advantages of fuzzy clustering and consensus clustering to address the challenges posed by multi-view data. By leveraging available labeled information and the relational structure among views, our method aims to enhance clustering performance. Extensive experiments on benchmark datasets demonstrate that our method surpasses existing single-view and multi-view relational clustering algorithms in terms of accuracy and stability. Specifically, the SSCFMC algorithm exhibits superior clustering performance across various datasets, achieving an adjusted rand index (ARI) of 0.68 on the multiple features dataset and an F-measure of 0.91 on the internet dataset, highlighting its robustness and efficiency. Overall, this study advances multi-view clustering techniques for relational data and provides valuable insights for researchers in this field.
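For readers unfamiliar with the adjusted Rand index (ARI) quoted in this abstract, the following is a minimal sketch of the standard pair-counting formula, written with the Python standard library only. It is not taken from the paper; the function name and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement (assumption)

    # Contingency counts: items in (true cluster i, predicted cluster j).
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Numbers of item pairs that are together in both, in true, and in predicted.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

The index is invariant to how clusters are named, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, while a labeling that agrees with the reference no better than chance scores around zero or below.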
RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems
Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional- and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications.
EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the output is composed of a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, so that a paradigm shift is needed. We introduce the problem of event-based multi-view stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (1) its ability to respond to scene edges—which naturally provide semi-dense geometric information without any pre-processing operation—and (2) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps, without requiring any explicit data association or intensity estimation. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a CPU.
The Impact of the Calibration Method on the Accuracy of Point Clouds Derived Using Unmanned Aerial Vehicle Multi-View Stereopsis
In unmanned aerial vehicle (UAV) photogrammetric surveys, the camera can be pre-calibrated or can be calibrated "on-the-job" using structure-from-motion and a self-calibrating bundle adjustment. This study investigates the impact on mapping accuracy of UAV photogrammetric survey blocks, the bundle adjustment and the 3D reconstruction process under a range of typical operating scenarios for centimetre-scale natural landform mapping (in this case, a coastal cliff). We demonstrate the sensitivity of the process to calibration procedures and the need for careful accuracy assessment. For this investigation, vertical (nadir or near-nadir) and oblique photography were collected with 80%–90% overlap and with accurately-surveyed (σ ≤ 2 mm) and densely-distributed ground control. This allowed various scenarios to be tested and the impact on mapping accuracy to be assessed. This paper presents the results of that investigation and provides guidelines that will assist with operational decisions regarding camera calibration and ground control for UAV photogrammetry. The results indicate that the use of either a robust pre-calibration or a robust self-calibration results in accurate model creation from vertical-only photography, and additional oblique photography may improve the results. The results indicate that if a dense array of high accuracy ground control points are deployed and the UAV photography includes both vertical and oblique images, then either a pre-calibration or an on-the-job self-calibration will yield reliable models (pre-calibration RMSE_XY = 7.1 mm and on-the-job self-calibration RMSE_XY = 3.2 mm). When oblique photography was excluded from the on-the-job self-calibration solution, the accuracy of the model deteriorated (by 3.3 mm horizontally and 4.7 mm vertically). When the accuracy of the ground control was then degraded to replicate typical operational practice (σ = 22 mm), the accuracy of the model further deteriorated (e.g., on-the-job self-calibration RMSE_XY went from 3.2–7.0 mm). Additionally, when the density of the ground control was reduced, the model accuracy also further deteriorated (e.g., on-the-job self-calibration RMSE_XY went from 7.0–7.3 mm). However, our results do indicate that loss of accuracy due to sparse ground control can be mitigated by including oblique imagery.
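The horizontal and vertical accuracy figures in this abstract are root-mean-square errors over check points. A minimal sketch of how such figures could be computed is given below. It assumes that the horizontal RMSE combines the X and Y residuals of each point into one planimetric error, which the abstract does not spell out; the function names and the residual values in the usage note are hypothetical, not data from the study.

```python
from math import sqrt


def horizontal_rmse(residuals_xy):
    """RMSE of planimetric error, given (dx, dy) residuals per check point."""
    if not residuals_xy:
        raise ValueError("at least one residual is required")
    squared = [dx * dx + dy * dy for dx, dy in residuals_xy]
    return sqrt(sum(squared) / len(squared))


def vertical_rmse(residuals_z):
    """RMSE of height error, given dz residuals per check point."""
    if not residuals_z:
        raise ValueError("at least one residual is required")
    return sqrt(sum(dz * dz for dz in residuals_z) / len(residuals_z))
```

For example, `horizontal_rmse([(3.0, 4.0), (3.0, 4.0)])` returns `5.0`, since each point is 5 units from its surveyed position.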
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer
Sign language serves as a critical communication medium for the deaf community, yet existing single-view recognition systems are limited in interpreting complex three-dimensional manual movements from monocular video sequences. Although multi-view analysis holds potential for improved spatial understanding, current methods lack effective mechanisms for cross-view feature correlation and adaptive multi-stream fusion. To address these challenges, we propose the Cross-view and Multi-level Transformer (CMTformer), a novel framework for isolated sign language recognition that hierarchically models spatiotemporal dependencies across viewpoints. The architecture integrates transformer-based modules to simultaneously capture dense cross-view correlations and distill high-level semantic relationships through multi-scale feature abstraction. Complementing this methodological advancement, we establish the Multi-View Chinese Sign Language (MVCSL) dataset under real-world conditions, addressing the critical shortage of multi-view benchmarking resources. Experimental evaluations demonstrate that CMTformer significantly outperforms conventional approaches in recognition robustness, particularly in processing intricate gesture dynamics through coordinated multi-view analysis. This study advances sign language recognition via interpretable cross-view modeling while providing an essential dataset for developing viewpoint-agnostic gesture understanding systems.
Multi-view reinforcement learning for sequential decision-making with insufficient state information
Most reinforcement learning methods describe sequential decision-making as a Markov decision process where the effect of an action is decided only by the current state. But this is reasonable only if the state is correctly defined and the state information is sufficiently observed. Thus the learning efficiency of reinforcement learning methods based on the Markov decision process is limited when the state information is insufficient. The partially observable Markov decision process and the history-based decision process have each been proposed to describe sequential decision-making with insufficient state information. However, these two processes tend to ignore important information from the currently observed state. Therefore, the learning efficiency of reinforcement learning methods based on these two processes is also limited when the state information is insufficient. In this paper, we propose a multi-view reinforcement learning method to solve this problem. The motivation is that the interaction information between the agent and its environment should be considered from the views of history, present, and future to overcome the insufficiency of state information. Based on these views, we construct a multi-view decision process to describe sequential decision-making with insufficient state information. A multi-view reinforcement learning method is proposed by combining the multi-view decision process and the actor-critic framework. In the proposed method, multi-view clustering is performed to ensure that each type of sample can be sufficiently exploited. Experiments illustrate that the proposed method is more effective than the compared state-of-the-art methods. The source code can be downloaded from https://github.com/jamieliuestc/MVRL.
Attention‐based model for dynamic IR drop prediction with multi‐view features
Dynamic IR drop prediction based on machine learning has been studied in recent years. However, most proposed models use as inputs either all features extracted from circuits or manually selected subsets of the raw features, and so fail to prioritise input features flexibly. In this paper, QuantumForest is applied to vector‐based dynamic IR drop prediction. With the sparse attention mechanism brought by QuantumForest, important attributes of circuits are weighted more heavily than others. A new multi‐view feature creation method is also proposed, and a novel regional distance feature is subsequently built. The performance is evaluated on two chip designs with real simulation vectors. The experimental results indicate that the method outperforms other prominent machine learning based methods for IR drop analysis, reaching an average MAE of only 1.457 mV on two designs. In this letter, the sparse attention mechanism of QuantumForest is introduced to vector‐based dynamic IR drop prediction to weigh important attributes of circuits more heavily than others. A new multi‐view feature creation method is proposed and a novel regional distance feature is subsequently built. The evaluation results show that QuantumForest reaches the best accuracy on two example chip designs using the multi‐view features.