Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
120 result(s) for "computer vision subsystem"
Machine-learning-based system for multi-sensor 3D localisation of stationary objects
by Berz, Everton L.; Hessel, Fabiano P.; Tesch, Deivid A.
in Accuracy; ANN models; Artificial neural networks
2018
Localisation of objects and people in indoor environments has been widely studied, both because of security issues and because of the benefits a localisation system can provide. Indoor positioning systems (IPSs) based on more than one technology can improve localisation performance by leveraging the advantages of distinct technologies. This study proposes a multi-sensor IPS able to estimate the three-dimensional (3D) location of stationary objects using off-the-shelf equipment. Using radio-frequency identification (RFID) technology, machine-learning models based on support vector regression (SVR) and artificial neural networks (ANNs) are proposed, and a k-means technique is applied to further improve accuracy. A computer vision (CV) subsystem detects visual markers in the scenario to enhance RFID localisation, and a fusion method based on the region of interest is proposed to combine the RFID and CV subsystems. The system was implemented and evaluated in real experiments. In two-dimensional scenarios, the localisation error is between 9 and 29 cm at ranges between 1 and 2.2 m. In a comparison of the machine-learning approaches, the ANN performed 31% better than the SVR. In 3D scenarios, localisation errors in dense environments are 80.7 and 73.7 cm for the ANN and SVR models, respectively.
Journal Article
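The regression stage of the system above lends itself to a compact illustration. Below is a minimal Python sketch of SVR-based 3D localisation, assuming a toy log-distance path-loss model, four invented reader positions, and illustrative hyperparameters; the paper's full system additionally uses ANN models, k-means, and CV fusion.

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
readers = np.array([[0.0, 0, 0], [2.2, 0, 0], [0, 2.2, 0], [2.2, 2.2, 2.2]])  # assumed layout
pos = rng.uniform(0.0, 2.2, size=(500, 3))                  # synthetic tag positions (m)
dist = np.linalg.norm(pos[:, None, :] - readers[None], axis=2)
rssi = -40 - 20 * np.log10(dist + 0.1) + rng.normal(0, 1, dist.shape)  # toy path-loss model

# One RBF-kernel SVR per output coordinate; SVR itself is single-output.
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(rssi[:400], pos[:400])

err_cm = np.linalg.norm(model.predict(rssi[400:]) - pos[400:], axis=1) * 100
print(f"median 3D localisation error: {np.median(err_cm):.1f} cm")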
An oscillator-based smooth real-time estimate of gait phase for wearable robotics
by Parri, Andrea; Ronsse, Renaud; Ruiz Garate, Virginia
in Algorithms; Error compensation; Error detection
2017
This paper presents a novel methodology for estimating the gait phase of human walking with a simple sensory apparatus. Three subsystems are combined: a primary phase estimator based on adaptive oscillators, a desired gait event detector, and a phase error compensator. The estimated gait phase is expected to increase linearly from 0 to 2π rad over one stride and to remain continuous when transitioning to the next stride. We designed two experimental scenarios to validate this gait phase estimator, namely treadmill walking at different speeds and free walking. For treadmill walking, the maximum phase error at the desired gait events was 0.155 rad, and the maximum phase difference between the end of the previous stride and the beginning of the current stride was 0.020 rad. In the free-walking trials, the phase error at the desired gait events was never larger than 0.278 rad. Our algorithm outperformed two other benchmark methods. The good performance of our gait phase estimator could provide consistent and finely tuned assistance for an exoskeleton designed to augment the mobility of patients.
Journal Article
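The primary phase estimator above can be sketched as a phase-locked adaptive oscillator driven by a periodic gait signal. In the minimal Python sketch below, the gains, the synthetic sinusoidal input, and the update rule are illustrative stand-ins; the paper's event detector and phase error compensator are omitted.

import numpy as np

dt, T = 0.001, 20.0
t = np.arange(0.0, T, dt)
f_stride = 1.0                              # Hz, synthetic walking cadence
F = np.sin(2 * np.pi * f_stride * t)        # stand-in for a measured periodic gait signal

phi, omega = 0.0, 2 * np.pi * 0.7           # deliberately wrong initial frequency
nu_phi, nu_omega = 20.0, 10.0               # coupling gains (assumed values)

phase = np.empty_like(t)
for i, f in enumerate(F):
    coupling = f * np.cos(phi)              # averages to ~sin(true phase - phi)
    phi += (omega + nu_phi * coupling) * dt
    omega += nu_omega * coupling * dt
    phase[i] = phi % (2 * np.pi)            # estimated gait phase in [0, 2*pi)

print(f"adapted stride frequency: {omega / (2 * np.pi):.3f} Hz (true: {f_stride} Hz)")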
Binocular vision and priori data based intelligent pose measurement method of large aerospace cylindrical components
by Cao, Yansheng; Fan, Wei; Zhang, Jieru
in Advanced manufacturing technologies; Algorithms; Assembly
2024
In the robotic finishing of the assembly interface of large aerospace cylindrical components (hereafter, the assembly interface), an intelligent pose measurement method based on binocular vision and priori data is proposed to achieve high-precision, high-efficiency pose perception of the large component. In this method, the global coordinate system of the robot finishing system is first established using a laser tracking measurement method and customized reference plates, providing a unified coordinate transformation datum for the interoperation of the finishing system's subsystems. Then, utilizing deep learning and digital image processing technologies, an algorithm for recognizing and locating key features of the large component is developed, which identifies key feature types and accurately localises feature centroids. Next, the global coordinate of each key feature centroid is determined using the proposed binocular vision three-dimensional (3D) coordinate reconstruction method. Meanwhile, by matching the priori processing data of the large component to the 3D reconstructed coordinates of the key feature centroids, the spatial pose of the large component can be calculated with high precision. Finally, the proposed method is experimentally validated in a case study of a large aerospace cylindrical component. Experimental results show that the method achieves high-precision pose measurement of the large component and can provide pose data for adjusting or modifying the robot's cutting path, which is generated from the as-designed model of the large component. This ensures the correctness of the robotic machining of the assembly interface, so the proposed method can meet the robot finishing needs of the large component.
Journal Article
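The binocular 3D coordinate reconstruction step above amounts to triangulating a feature centroid seen in two calibrated views. Below is a minimal NumPy sketch using standard linear (DLT) triangulation; the stereo rig, intrinsics, and test point are synthetic assumptions, not the paper's laser-tracked calibration.

import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point from two views."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                     # dehomogenise to metric coordinates

# Synthetic stereo rig: identical intrinsics, 0.5 m horizontal baseline.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 480], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0], [0]])])

X_true = np.array([0.3, 0.1, 4.0])          # feature centroid in metres
uv1 = P1 @ np.append(X_true, 1)
uv2 = P2 @ np.append(X_true, 1)
uv1, uv2 = uv1[:2] / uv1[2], uv2[:2] / uv2[2]   # project to pixel coordinates

print("reconstructed centroid:", triangulate(P1, P2, uv1, uv2).round(4))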
Real-Time Smart Parking Systems Integration in Distributed ITS for Smart Cities
2018
Intelligent Transportation Systems (ITS) have evolved into a key research topic in recent years, revolutionizing the overall traffic and travel experience by providing a set of advanced services and applications. These data-driven services help mitigate major problems arising from the ever-growing need for transport in our daily lives. Despite this progress, there is still a need for an enhanced, distributed solution that can exploit data from the available systems and react appropriately and in real time. In this paper, we therefore present a new architecture in which the intelligence is distributed and the decisions are decentralized. The proposed architecture is scalable, since the incremental addition of new peripheral subsystems is supported through gateways and requires no reengineering of the communication infrastructure. The architecture is deployed to tackle the inefficiency of traffic management in urban areas, where the traffic load is substantially increased by vehicles moving around unnecessarily in search of a free parking space. This can be significantly reduced by making local information about vacant parking slots available to drivers in a given area. Two types of parking systems, magnetic-sensor-based and vision-sensor-based, have been introduced, deployed, and tested in different scenarios. The effectiveness of the proposed architecture, together with the proposed algorithms, is assessed in field trials.
Journal Article
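The gateway idea above can be sketched in a few lines: heterogeneous parking subsystems report through one common message type, so adding a subsystem requires no change to the core infrastructure. All names and fields in this Python sketch are hypothetical illustrations, not the paper's interfaces.

from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class OccupancyEvent:
    zone: str          # city area served by a gateway
    slot_id: str       # parking slot identifier
    occupied: bool
    source: str        # "magnetic" or "vision" sensor subsystem

class ParkingGateway:
    """Aggregates per-slot events and answers local availability queries."""
    def __init__(self):
        self._slots = defaultdict(dict)       # zone -> {slot_id: occupied}
    def ingest(self, event):
        self._slots[event.zone][event.slot_id] = event.occupied
    def vacant(self, zone):
        return [s for s, occ in self._slots[zone].items() if not occ]

gw = ParkingGateway()
gw.ingest(OccupancyEvent("centre", "A-01", True, "magnetic"))
gw.ingest(OccupancyEvent("centre", "A-02", False, "vision"))
print(gw.vacant("centre"))                    # ['A-02']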
Enhanced descriptive captioning model for histopathological patches
by Elbedwehy, Samar; Medhat, T.; Alrahmawy, Mohammed F.
in 1230: Sentient Multimedia Systems and Visual Intelligence; Artificial intelligence; Computer Communication Networks
2024
The interpretation of medical images into natural language is a developing field of artificial intelligence (AI) called image captioning. This field integrates two branches of AI: computer vision and natural language processing. It is a challenging topic that goes beyond object recognition, segmentation, and classification, since it demands an understanding of the relationships between the various components in an image and of how these objects function as visual representations. Content-based image retrieval (CBIR) uses an image captioning model to generate captions for the user's query image. The common architecture of medical image captioning systems consists mainly of an image feature extractor subsystem followed by a lingual caption generation subsystem. In this paper we aim to build an optimized model for histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens. For the image feature extraction subsystem, we performed two evaluations: first, we tested five different vision models (VGG, ResNet, PVT, SWIN-Large, and ConvNEXT-Large) with LSTM, RNN, and bidirectional-RNN decoders, and then compared the vision models under LSTM without augmentation, LSTM with augmentation, and BioLinkBERT-Large as an embedding layer with augmentation, to find the most accurate combination. Second, we tested three different concatenations of pairs of vision models (SWIN-Large, PVT_v2_b5, and ConvNEXT-Large) to find the most expressive extracted feature vector of the image. For the lingual caption generation subsystem, we tested the pre-trained language embedding model BioLinkBERT-Large against LSTM in both evaluations, to select the more accurate model. Our experiments showed that a captioning system using the concatenation of ConvNEXT-Large and PVT_v2_b5 as the image feature extractor, combined with the BioLinkBERT-Large language embedding model, produces the best results among the tested combinations.
Journal Article
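The feature-concatenation architecture above can be sketched in PyTorch with placeholder encoders standing in for the pretrained backbones (the paper's best pairing is ConvNEXT-Large with PVT_v2_b5, decoded with BioLinkBERT-Large); dimensions and the LSTM decoder below are simplified assumptions.

import torch
import torch.nn as nn

class StubBackbone(nn.Module):
    """Placeholder for a pretrained vision encoder returning a feature vector."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
    def forward(self, img):
        return self.pool(self.proj(img)).flatten(1)    # (B, dim)

class ConcatCaptioner(nn.Module):
    def __init__(self, dims=(1536, 512), hidden=512, vocab=30000):
        super().__init__()
        self.enc_a, self.enc_b = StubBackbone(dims[0]), StubBackbone(dims[1])
        self.fuse = nn.Linear(sum(dims), hidden)        # fuse the concatenated features
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)
    def forward(self, img, steps=16):
        feats = torch.cat([self.enc_a(img), self.enc_b(img)], dim=1)
        ctx = self.fuse(feats).unsqueeze(1).repeat(1, steps, 1)
        out, _ = self.decoder(ctx)
        return self.head(out)                           # (B, steps, vocab) logits

logits = ConcatCaptioner()(torch.randn(2, 3, 224, 224))
print(logits.shape)                                     # torch.Size([2, 16, 30000])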
Fault detection and classification in automated assembly machines using machine vision
2017
Automated assembly machines operate continuously to achieve high production rates. Continuous operation increases the potential for faults such as jams, missing parts, and electromechanical failures of subsystems. The goal of this research project was to develop and validate a machine vision inspection (MVI) system to detect and classify multiple faults using a single camera as a sensor. An industrial automated O-ring assembly machine that places O-rings onto continuously moving plastic carriers at a rate of over 100 assemblies per minute was modified to serve as the test apparatus. An industrial camera with LED panel lights for illumination was used to acquire videos of the machine's operation. A programmable logic controller (PLC) with a human-machine interface (HMI) allowed faults to be generated in a controlled fashion. Three MVI methods, based on computer vision techniques available in the literature, were developed for this application. The methods used features extracted from the videos to classify the machine's condition. The first method was based on Gaussian mixture models (GMMs); the second used an optical flow approach; and the third was based on running-average and morphological image processing operations. To provide a single metric quantifying relative performance, a machine vision performance index (MVPI) was developed from five measures of performance: accuracy, processing time, speed of response, robustness against noise, and ease of tuning. The MVPI for the three MVI methods is reported along with the significance of the results.
Journal Article
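The core step of the first (GMM-based) method can be sketched with OpenCV's Gaussian-mixture background subtractor followed by a morphological opening. The video filename and the foreground-activity feature below are hypothetical; the paper derives fault classes from features of such masks.

import cv2
import numpy as np

cap = cv2.VideoCapture("assembly_machine.avi")        # hypothetical recording
mog = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

activity = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog.apply(frame)                           # per-pixel foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    activity.append(cv2.countNonZero(mask) / mask.size)

cap.release()
# A jam would show up as abnormally low (stalled) or high (misplaced part) activity.
if activity:
    print(f"mean foreground fraction: {np.mean(activity):.4f}")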
Vision-based drone control for autonomous UAV cinematography
by Pitas, Ioannis; Symeonidis, Charalampos; Tefas, Anastasios
in Cameras; Cinematography; Computer Communication Networks
2024
One of the most important aesthetic concepts in autonomous Unmanned Aerial Vehicle (UAV) cinematography is the UAV/Camera Motion Type (CMT), describing the desired UAV trajectory relative to a (still or moving) physical target/subject being filmed. Usually, for the drone to execute such a CMT autonomously and capture the desired shot, the 3D states (positions/poses within the world) of both the UAV/camera and the target are required as input. However, the target's 3D state is typically unknown in non-staged settings. This paper proposes a novel framework that reformulates each desired CMT as a set of requirements interrelating 2D visual information, the UAV trajectory, and the camera orientation. A set of CMT-specific, vision-driven Proportional-Integral-Derivative (PID) UAV controllers can then be implemented by exploiting these requirements to form suitable error signals. Such signals drive continuous adjustments to instantaneous UAV motion parameters, separately at each captured video frame/time instance. The only inputs required for computing each error value are the current 2D pixel coordinates of the target's on-frame bounding box, detectable by an independent, off-the-shelf, real-time, deep neural 2D object detector/tracker vision subsystem. Importantly, neither the UAV's nor the target's 3D state ever needs to be known or estimated, and no depth maps, target 3D models, or camera intrinsic parameters are necessary. The method was implemented and successfully evaluated in a robotics simulator by properly reformulating a set of standard, formalized UAV CMTs.
Journal Article
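One such vision-driven controller can be sketched directly: a PID loop converting the horizontal offset of the target's bounding-box centre into a yaw-rate command. Gains, frame size, and output units in this Python sketch are illustrative assumptions; the paper defines one error signal per CMT requirement.

class PID:
    """Textbook discrete PID; the derivative term is skipped on the first call."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev = 0.0, None
    def step(self, error):
        self.integral += error * self.dt
        deriv = 0.0 if self.prev is None else (error - self.prev) / self.dt
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

FRAME_W = 1280                                           # assumed frame width in pixels
yaw_pid = PID(kp=0.004, ki=0.0005, kd=0.001, dt=1 / 30)  # 30 fps, assumed gains

def yaw_command(bbox):
    """bbox = (x, y, w, h) pixel box from the 2D detector/tracker."""
    x, _, w, _ = bbox
    error = (x + w / 2) - FRAME_W / 2                    # horizontal offset from centre
    return yaw_pid.step(error)                           # yaw-rate command (assumed units)

print(yaw_command((900, 400, 80, 160)))                  # target right of centre -> positive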
Improved Arabic image captioning model using feature concatenation with pre-trained word embedding
2023
Automatic captioning of images contributes to identifying features of multimedia content and helps in the detection of interesting patterns, trends, and occurrences. English image captioning has recently made incredible progress; however, Arabic image captioning is still lagging. In the field of machine learning, Arabic image-caption generation is generally a very difficult problem. This paper presents a more accurate model for Arabic image captioning that uses transformer models in both the encoder and decoder phases: as feature extractors from images in the encoder phase, and as a pre-trained word embedding model in the decoder phase. The models are demonstrated, implemented, trained, and tested on the Arabic Flickr8k dataset. For the image feature extraction subsystem, we compared three different individual vision models (SWIN, XCIT, and ConvNexT) and their concatenations to find the most expressive extracted feature vector of the image; for the lingual caption generation subsystem, we tested four different pre-trained language embedding models (ARABERT, ARAELECTRA, MARBERTv2, and CamelBERT) to select the most accurate one. Our experiments showed that an Arabic image captioning system using the concatenation of the three transformer-based models ConvNexT, SWIN, and XCIT as the image feature extractor, combined with the CamelBERT language embedding model, produces the best results among the tested combinations, with a BLEU-1 score of 0.5980, while ConvNexT combined with SWIN and the ARAELECTRA language embedding model achieved a BLEU-4 score of 0.1664; both are higher than the previously reported values of 0.443 and 0.157, respectively.
Journal Article
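The BLEU-1 and BLEU-4 figures quoted above are n-gram precision scores of generated captions against reference captions. A minimal sketch of how such scores are computed with NLTK follows, using toy English tokens rather than the Arabic Flickr8k data.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "man", "rides", "a", "brown", "horse"]]   # toy reference caption
candidate = ["a", "man", "rides", "a", "horse"]               # toy generated caption

smooth = SmoothingFunction().method1
bleu1 = sentence_bleu(references, candidate, weights=(1, 0, 0, 0))
bleu4 = sentence_bleu(references, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.4f}  BLEU-4: {bleu4:.4f}")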
A comparative evaluation of convolutional neural networks, training image sizes, and deep learning optimizers for weed detection in alfalfa
2022
In this research, the deep-learning optimizers Adagrad, AdaDelta, Adaptive Moment Estimation (Adam), and Stochastic Gradient Descent (SGD) were applied to the deep convolutional neural networks AlexNet, GoogLeNet, VGGNet, and ResNet, which were trained to recognize weeds among alfalfa using photographic images taken at 200×200, 400×400, 600×600, and 800×800 pixels. Increasing the image size reduced the classification accuracy of all neural networks; the networks trained with 200×200-pixel images achieved better classification accuracy than those trained with the other image sizes investigated here. AlexNet and GoogLeNet trained with AdaDelta and SGD outperformed the same networks trained with the Adagrad and Adam optimizers; VGGNet trained with AdaDelta outperformed Adagrad, Adam, and SGD; and ResNet trained with AdaDelta and Adagrad outperformed the Adam and SGD optimizers. When the neural networks were trained with the best-performing input image size (200×200 pixels) and the best-performing deep-learning optimizer, VGGNet was the most effective neural network, with high precision and recall values (≥0.99) on the validation and testing datasets. Conversely, ResNet was the least effective neural network at classifying images containing weeds. However, there was no difference among the neural networks in their ability to differentiate between broadleaf and grass weeds. The neural networks discussed herein may be used for scouting weed infestations in alfalfa and may be further integrated into the machine vision subsystem of smart sprayers for site-specific weed control. Nomenclature: alfalfa, Medicago sativa L.
Journal Article
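The comparison methodology above reduces to retraining the same network under each optimizer and comparing the outcomes. A minimal PyTorch sketch follows, with a toy network and random data standing in for the CNNs and weed images; the learning rates are conventional defaults, not the study's settings.

import torch
import torch.nn as nn

def make_model():
    torch.manual_seed(0)                      # same initial weights for every optimizer
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 2))     # 2 classes: weed vs. alfalfa

x = torch.randn(32, 3, 64, 64)                # random stand-in images
y = torch.randint(0, 2, (32,))
loss_fn = nn.CrossEntropyLoss()

optimizers = {
    "Adagrad":  lambda p: torch.optim.Adagrad(p, lr=0.01),
    "AdaDelta": lambda p: torch.optim.Adadelta(p, lr=1.0),
    "Adam":     lambda p: torch.optim.Adam(p, lr=0.001),
    "SGD":      lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
}
for name, make_opt in optimizers.items():
    model = make_model()
    opt = make_opt(model.parameters())
    for _ in range(20):                       # a few steps on the toy batch
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name:8s} final training loss: {loss.item():.4f}")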
FAformer: parallel Fourier-attention architectures benefits EEG-based affective computing with enhanced spatial information
by Chen, Jianhui; Zhou, Haiyan; Gao, Ziheng
in Affective computing; Artificial Intelligence; Channels
2024
The balance of brain functional segregation (i.e., processing in specialized local subsystems) and integration (i.e., processing through the global cooperation of those subsystems) is crucial for human cognition, and many deep learning models have been used to evaluate spatial information in EEG-based affective computing. However, acquiring the intrinsic spatial representation in the topology of EEG channels remains challenging. To address this issue, we propose FAformer, which enhances spatial information in EEG signals through parallel-branch architectures based on a vision transformer (ViT). In the encoder, one branch utilizes Adaptive Neural Fourier Operators (AFNO) to model global spatial patterns by applying the Fourier transform along the electrode-channel dimension. The other branch utilizes multi-head self-attention (MSA) to explore the dependence of emotion on different channels, which is conducive to building key local networks. Additionally, a self-supervised learning (SSL) task of adaptive feature dissociation (AdaptiveFD) is developed to improve the distinctiveness of the spatial features generated by the parallel branches and to guarantee robustness across subjects. FAformer achieves superior performance over competitive models on the DREAMER and DEAP datasets. Moreover, rationality and hyperparameter analyses are conducted to demonstrate the effectiveness of FAformer. Finally, visualization of the features reveals the spatial global connections and key local patterns learned by FAformer, which benefits EEG-based affective computing.
Journal Article
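The parallel-branch encoder idea can be sketched in PyTorch: one branch mixes information across electrode channels in the Fourier domain, the other with multi-head self-attention, and the two are fused. This sketch simplifies AFNO to a learned complex pointwise product, and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class ParallelFourierAttention(nn.Module):
    def __init__(self, channels=32, dim=64, heads=4):
        super().__init__()
        n_freq = channels // 2 + 1
        # Learned complex weights in the channel-frequency domain (AFNO simplified).
        self.freq_weight = nn.Parameter(torch.randn(n_freq, dim, 2) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):                       # x: (batch, channels, dim)
        # Fourier branch: FFT along the electrode-channel axis.
        f = torch.fft.rfft(x, dim=1)
        f = f * torch.view_as_complex(self.freq_weight)
        fourier = torch.fft.irfft(f, n=x.size(1), dim=1)
        # Attention branch: channel-to-channel dependencies.
        attn, _ = self.attn(x, x, x)
        return self.fuse(torch.cat([fourier, attn], dim=-1))

x = torch.randn(8, 32, 64)                      # 8 trials, 32 electrodes, 64-d features
print(ParallelFourierAttention()(x).shape)      # torch.Size([8, 32, 64])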