Catalogue Search | MBRL
116 result(s) for "Phan Xuan Tan"
IRDC-Net: Lightweight Semantic Segmentation Network Based on Monocular Camera for Mobile Robot Navigation
by Thai-Viet Dang, Dinh-Manh-Cuong Tran, Phan Xuan Tan
in Algorithms, Cameras, Chemical technology
2023
Computer vision plays a significant role in mobile robot navigation due to the wealth of information extracted from digital images. Mobile robots localize and move to the intended destination based on the captured images. Due to the complexity of the environment, obstacle avoidance still requires a complex sensor system with a high computational efficiency requirement. This study offers a real-time solution to the problem of extracting corridor scenes from a single image using a lightweight semantic segmentation model integrated with a quantization technique to reduce the number of training parameters and the computational cost. The proposed model consists of an FCN as the decoder and MobilenetV2 as the encoder (with multi-scale fusion). This combination allows us to significantly minimize computation time while achieving high precision. Moreover, in this study, we also propose to use the Balance Cross-Entropy loss function to handle diverse datasets, especially those with class imbalances, and to integrate a number of techniques, for example, the Adam optimizer and Gaussian filters, to enhance segmentation performance. The results demonstrate that our model can outperform baselines across different datasets. Moreover, when applied to practical experiments with a real mobile robot, the proposed model’s performance is still consistent, supporting optimal path planning and allowing the mobile robot to efficiently and effectively avoid obstacles.
Journal Article
Efficient industrial point cloud anomaly detection via spatial context aggregation and selective anomalous feature generation
2026
Automated detection of surface defects on three-dimensional (3D) parts is vital for ensuring product quality and safety in manufacturing. However, three key challenges hinder reliable detection: geometric context ambiguity across complex part shapes, domain mismatch between generic pretrained features and industrial scans (with their unique noise and reflectivity), and the scarcity of diverse defect examples for training. To overcome these issues, we propose a novel single-forward-pass framework for point cloud anomaly detection, comprising three new modules: (1) Spatial Context Aggregation, which grounds each local patch in a set of learned global prototypes via an optimal-transport alignment to resolve context ambiguity; (2) Feature Adaptor, a lightweight two-layer multilayer perceptron (MLP) that fine-tunes self-supervised Point-MAE embeddings to the specific characteristics of industrial scans; and (3) Selective Anomalous Feature Generator, which synthesizes realistic hard negatives by corrupting random subsets of feature tokens, thus mitigating the need for extensive defect labels. An attention-based discriminator trained with patch-wise supervision learns to distinguish these hard negatives from genuine defect-free patterns. At inference, our pipeline delivers dense per-point anomaly scores in a single pass at up to 13.5 frames per second (FPS). On the Real3D-AD benchmark, we observe point-level improvements of 2.8% in area under the receiver operating characteristic curve (AUROC) and 5.7% in area under the precision-recall curve (AUPR), with object-level gains of 3.0% (AUROC) and 3.5% (AUPR). Evaluated on our newly released Industrial3D-AD dataset, which captures realistic sensor noise and reflective materials, we see similar enhancements (2.9%/5.3% point-level, 2.8%/3.3% object-level).
Journal Article
Domain Adaptation for Imitation Learning Using Generative Adversarial Network
2021
Imitation learning is an effective approach for an autonomous agent to learn control policies when an explicit reward function is unavailable, using demonstrations provided by an expert. However, standard imitation learning methods assume that the agents and the demonstrations provided by the expert are in the same domain configuration. Such an assumption has made the learned policies difficult to apply in another distinct domain. The problem is formalized as domain adaptive imitation learning, which is the process of learning how to perform a task optimally in a learner domain, given demonstrations of the task in a distinct expert domain. We address the problem by proposing a model based on Generative Adversarial Network. The model aims to learn both domain-shared and domain-specific features and utilizes them to find an optimal policy across domains. The experimental results show the effectiveness of our model in a number of tasks ranging from low-dimensional to complex high-dimensional.
Journal Article
Novel Projection Schemes for Graph-Based Light Field Coding
2022
In light field compression, graph-based coding is powerful to exploit signal redundancy along irregular shapes and obtains good energy compaction. However, apart from high time complexity to process high dimensional graphs, their graph construction method is highly sensitive to the accuracy of disparity information between viewpoints. In real-world light field or synthetic light field generated by computer software, the use of disparity information for super-rays projection might suffer from inaccuracy due to vignetting effect and large disparity between views in the two types of light fields, respectively. This paper introduces two novel projection schemes resulting in less error in disparity information, in which one projection scheme can also significantly reduce computation time for both encoder and decoder. Experimental results show projection quality of super-pixels across views can be considerably enhanced using the proposals, along with rate-distortion performance when compared against original projection scheme and HEVC-based or JPEG Pleno-based coding approaches.
Journal Article
Repetition-Based Approach for Task Adaptation in Imitation Learning
2022
Transfer learning is an effective approach for adapting an autonomous agent to a new target task by transferring knowledge learned from the previously learned source task. The major problem with traditional transfer learning is that it only focuses on optimizing learning performance on the target task. Thus, the performance on the target task may be improved in exchange for the deterioration of the source task’s performance, resulting in an agent that is not able to revisit the earlier task. Therefore, transfer learning methods are still far from being comparable with the learning capability of humans, as humans can perform well on both source and new target tasks. In order to address this limitation, a task adaptation method for imitation learning is proposed in this paper. Being inspired by the idea of repetition learning in neuroscience, the proposed adaptation method enables the agent to repeatedly review the learned knowledge of the source task, while learning the new knowledge of the target task. This ensures that the learning performance on the target task is high, while the deterioration of the learning performance on the source task is small. A comprehensive evaluation over several simulated tasks with varying difficulty levels shows that the proposed method can provide high and consistent performance on both source and target tasks, outperforming existing transfer learning methods.
Journal Article
ELDE-Net: Efficient Light-Weight Depth Estimation Network for Deep Reinforcement Learning-Based Mobile Robot Path Planning
by Tan, Phan Xuan, Dang, Thai-Viet, Tran, Dinh-Manh-Cuong
in Artificial neural networks, Cellular telephones, Color imagery
2025
Precise and robust three-dimensional object detection (3DOD) presents a promising opportunity in the field of mobile robot (MR) navigation. Monocular 3DOD techniques typically involve extending existing two-dimensional object detection (2DOD) frameworks to predict the three-dimensional bounding box (3DBB) of objects captured in 2D RGB images. However, these methods often require multiple images, making them less feasible for various real-time scenarios. To address these challenges, the emergence of agile convolutional neural networks (CNNs) capable of inferring depth from a single image opens a new avenue for investigation. The paper proposes a novel ELDE-Net network designed to produce cost-effective 3D Bounding Box Estimation (3D-BBE) from a single image. This novel framework comprises the PP-LCNet as the encoder and a fast convolutional decoder. Additionally, this integration includes a Squeeze-Exploit (SE) module utilizing the Math Kernel Library for Deep Neural Networks (MKLDNN) optimizer to enhance convolutional efficiency and streamline model size during effective training. Meanwhile, the proposed multi-scale sub-pixel decoder generates high-quality depth maps while maintaining a compact structure. Furthermore, the generated depth maps provide a clear perspective with distance details of objects in the environment. These depth insights are combined with 2DOD for precise evaluation of 3D Bounding Boxes (3DBB), facilitating scene understanding and optimal route planning for mobile robots. Based on the estimated object center of the 3DBB, the Deep Reinforcement Learning (DRL)-based obstacle avoidance strategy for MRs is developed. Experimental results demonstrate that our model achieves state-of-the-art performance across three datasets: NYU-V2, KITTI, and Cityscapes. Overall, this framework shows significant potential for adaptation in intelligent mechatronic systems, particularly in developing knowledge-driven systems for mobile robot navigation.
Journal Article
Unsupervised industrial anomaly detection using paired well-lit and low-light images
2025
Unsupervised industrial anomaly detection trains models solely on anomaly-free images to detect unseen defects. While embedding-based methods have recently achieved state-of-the-art results, their use of memory banks substantially increases memory usage and inference times, limiting their practicality in industrial settings. In this work, we propose a lightweight and efficient framework for anomaly detection and localization using paired well-lit and low-light images. Our network learns to reconstruct well-lit features from low-light features on nominal (anomaly-free) samples, detecting anomalies by identifying inconsistencies between the reconstructed and extracted features. Experimental results demonstrate that our method outperforms existing state-of-the-art approaches across multiple industrial datasets. Specifically, our model achieves an Image-level Area Under the Receiver Operating Characteristic (I-AUROC) of 0.854 and an Area Under the Per-Region Overlap (AUPRO) of 0.823 on low-light industrial anomaly detection (LL-IAD), significantly surpassing existing methods. Furthermore, it attains I-AUROC scores of 0.864 and 0.858 on the Insulator and Clutch datasets, respectively, outperforming all prior approaches in these industrial settings. Notably, even when well-lit images are unavailable, our model maintains high performance using Retinexformer-enhanced low-light images, demonstrating its adaptability to real-world low-light scenarios. Additionally, we introduce a new industrial anomaly detection dataset featuring paired well-lit and low-light images. To our knowledge, this is the first dataset for LL-IAD.
Graphical abstract: Industrial anomaly detection using paired well-lit and low-light images.
Journal Article
Design of Electrode Placement for Presenting Phosphenes in the Lower Visual Field Based on Electric Field Simulation
2021
Presenting visual information, called phosphenes, is a critical method for providing information on the position of obstacles for users of walking support tools for the visually impaired. A previous study has established a method for presenting phosphenes to the right, center, and left of the visual field. However, a method for presenting information on the position of obstacles around the feet using phosphenes, which is essential for the visually impaired, has not been clarified. Therefore, in this study, a method for presenting phosphenes in the lower visual field is presented, towards the aim of realizing a safe walking support tool. Electrode placement is proposed in this paper for the presentation of phosphenes to the right, center, and left of the lower visual field based on the electrode placement method used in the previous study, which presents the phosphene in three locations of the visual field. In addition, electric field simulation is performed, focusing on the electric field value on the eyeball surface, in order to observe whether the proposed electrode placement is able to stimulate the intended region. As a result, it is shown that the intended region on the eyeball surface can be stimulated locally with each of the proposed electrode placements.
Journal Article
Simulation-Based Designing of Suitable Stimulation Factors for Presenting Two Phosphenes Simultaneously to Lower Side of Field of View
2022
Using a phosphene has been discussed as a means of informing the visually impaired of the position of an obstacle. Obstacles underfoot pose a risk, so the visually impaired need to be informed of them. A previous study clarified a method of presenting phosphenes in three directions in the lower vision; however, the simultaneous presentation of these phosphenes has not been discussed. Another study discussing the effect of electrical interference when stimulating the eyeball with multiple electrodes indicated that it is important to select appropriate stimulation factors to avoid this effect. However, when the stimulation electrodes are arranged remarkably close, there is a high possibility that the stimulation factors presented in the previous study will not apply. In this study, a method for simultaneously presenting phosphenes in the lower vision is presented. The electrode arrangements reported in the previous study to present phosphenes in the lower field of vision are used, and the difficulty in the simultaneous presentation of multiple phosphenes in the lower vision is the focus. In this paper, the method of designing the stimulation factors is discussed numerically when the electrodes are arranged remarkably close. As a result, it is shown that stimulation factors different from those of the previous research were appropriate depending on the distance between the electrodes.
Journal Article
Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models
by Manami Kanamaru, Chanh Minh Tran, Phan Xuan Tan
in Analysis, Audio equipment, Computational linguistics
2024
Digital audio–tactile maps (DATMs) on touchscreen devices provide valuable opportunities for people who are visually impaired (PVIs) to explore the spatial environment for engaging in travel activities. Existing solutions for DATMs usually require extensive training for the PVIs to understand the feedback mechanism. Due to the shortage of human resources for training specialists, as well as PVIs’ desire for frequent practice to maintain their usage skills, it has become challenging to widely adopt DATMs in real life. This paper discusses the use of large language models (LLMs) to provide a verbal evaluation of the PVIs’ perception, which is crucial for the independent practice of DATM usage. A smartphone-based prototype providing DATMs of simple floor plans was developed for a preliminary investigation. The evaluation results have proven that the interaction with the LLM could help the participants better understand the DATMs’ content and could vividly replicate them by drawings.
Journal Article