Search Results

23,406 results for "Pixels"
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
Recently, heatmap regression models have become popular due to their superior performance in locating facial landmarks. However, three major problems remain: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shapes; (3) domain gaps are common. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which performs score and offset predictions simultaneously on low-resolution feature maps. Repeated upsampling layers are therefore no longer necessary, greatly reducing inference time without sacrificing accuracy. In addition, a simple yet effective neighbor regression module enforces local constraints by fusing predictions from neighboring landmarks, enhancing the robustness of the new detection head. To further improve the cross-domain generalization of PIPNet, we propose self-training with curriculum, a strategy that mines more reliable pseudo-labels from unlabeled data across domains by starting with an easier task and gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains new state-of-the-art results on three of six popular benchmarks under the supervised setting. Results on two cross-domain test sets also improve consistently over the baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS on CPU and 200 FPS on GPU while maintaining accuracy competitive with state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
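The head design described above, joint score and offset prediction on a low-resolution grid, can be sketched in a few lines of PyTorch. The layer shapes, 1×1 convolutions, and argmax decoding below are illustrative assumptions, not the authors' exact architecture:

```python
# Hypothetical sketch of a PIP-style detection head: score and offset maps
# are predicted jointly on a low-resolution feature map, so no repeated
# upsampling layers are needed.
import torch
import torch.nn as nn

class PIPStyleHead(nn.Module):
    def __init__(self, in_channels: int, num_landmarks: int):
        super().__init__()
        # One score map per landmark on the low-resolution grid.
        self.score = nn.Conv2d(in_channels, num_landmarks, kernel_size=1)
        # One x-offset and one y-offset map per landmark, refining within a cell.
        self.offset_x = nn.Conv2d(in_channels, num_landmarks, kernel_size=1)
        self.offset_y = nn.Conv2d(in_channels, num_landmarks, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        scores = self.score(feats)                      # (B, L, H, W)
        off_x = self.offset_x(feats)                    # (B, L, H, W)
        off_y = self.offset_y(feats)                    # (B, L, H, W)
        B, L, H, W = scores.shape
        # Pick the highest-scoring cell per landmark, then add its offsets.
        idx = scores.view(B, L, -1).argmax(dim=2)       # (B, L)
        ys = torch.div(idx, W, rounding_mode="floor").float()
        xs = (idx % W).float()
        dx = off_x.view(B, L, -1).gather(2, idx.unsqueeze(2)).squeeze(2)
        dy = off_y.view(B, L, -1).gather(2, idx.unsqueeze(2)).squeeze(2)
        # Landmark coordinates in low-resolution grid units, shape (B, L, 2).
        return torch.stack([xs + dx, ys + dy], dim=2)
```

Because decoding happens directly on the low-resolution map, no upsampling layers are required, which is the source of the speedup the abstract reports.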
ORB Feature Uniform Distribution Algorithm Integrating Quadtree and Adaptive Thresholding
To enhance the uniformity of ORB feature extraction and exploit the global information of the image for improved positioning accuracy in visual SLAM, a novel ORB feature extraction method combining a quadtree with an adaptive threshold is introduced. The method builds a multi-resolution image pyramid and divides each image into 30 × 30 pixel grids to determine the number of feature points needed at each layer. The adaptive threshold for detecting feature points is derived from the image's gray-level mean and variance. Feature points and their descriptors are extracted using this adaptive threshold, then refined through quadtree management and non-maximum suppression. Experimental results show that the proposed method improves both the balanced distribution of ORB features and the reliability of their tracking.
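As a rough sketch of the adaptive-threshold idea, the snippet below derives a per-cell FAST threshold from the local gray-level mean and standard deviation using OpenCV; the scaling constant alpha and the grid handling are our assumptions, and the quadtree management and non-maximum suppression steps are omitted:

```python
# A minimal sketch of statistics-driven adaptive FAST detection per grid cell.
import cv2
import numpy as np

def detect_adaptive_fast(gray: np.ndarray, cell: int = 30, alpha: float = 0.15):
    """Detect FAST keypoints in each cell x cell grid with a local threshold."""
    keypoints = []
    h, w = gray.shape
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            patch = gray[y:y + cell, x:x + cell]
            # Threshold grows with local brightness and contrast.
            thresh = max(5, int(alpha * (patch.mean() + patch.std())))
            fast = cv2.FastFeatureDetector_create(threshold=thresh)
            for kp in fast.detect(patch, None):
                # Shift patch coordinates back into full-image coordinates.
                keypoints.append(cv2.KeyPoint(kp.pt[0] + x, kp.pt[1] + y, kp.size))
    return keypoints

# Usage: kps = detect_adaptive_fast(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
```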
Identification of Burned Areas Based on Texture Features of Different Windows
Choosing an appropriate window size is vital, as it determines the distribution of texture features in the gray-level co-occurrence matrix (GLCM). Window sizes from 9×9 to 15×15 were tested to preserve the richness of texture information. In this study, an importance evaluation method based on the Gini index was used to determine the appropriate window for identifying burned areas, which were then classified with a random forest based on texture features. The results showed that 15×15 was the most appropriate window size, and the best texture features for identifying burned areas were the contrast of the visible blue band and the texture mean of the near-infrared band. Burned areas could be fully identified by the random forest classifier, with an overall classification accuracy of 83.87%, although interference pixels remained in reservoirs, pits, ponds, marshes, and irrigated dry fields.
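The kind of texture feature the study ranks with the Gini index can be illustrated with scikit-image: the sketch below computes GLCM contrast in a sliding 15×15 window over a single band. The quantization to 32 gray levels and the single offset and angle are illustrative defaults, not the study's exact settings:

```python
# Per-window GLCM contrast over one band (unoptimized, for illustration only).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def window_contrast(band: np.ndarray, win: int = 15, levels: int = 32) -> np.ndarray:
    """Compute GLCM contrast in a sliding win x win window."""
    # Quantize to a small number of gray levels so the GLCM stays dense.
    q = (band.astype(float) - band.min()) / (np.ptp(band) + 1e-9)
    q = (q * (levels - 1)).astype(np.uint8)
    half = win // 2
    out = np.zeros_like(band, dtype=float)
    for y in range(half, band.shape[0] - half):
        for x in range(half, band.shape[1] - half):
            patch = q[y - half:y + half + 1, x - half:x + half + 1]
            glcm = graycomatrix(patch, distances=[1], angles=[0],
                                levels=levels, symmetric=True, normed=True)
            out[y, x] = graycoprops(glcm, "contrast")[0, 0]
    return out
```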
TCANet: Three-dimensional cross-attention mechanism for stereo-matching
Effective disparity estimation is an active topic in stereo vision research, and cost aggregation is an important part of disparity prediction: performing it more effectively is the core step in improving disparity accuracy. Previous studies often use a 3D CNN with a stacked hourglass structure for cost aggregation. In this research, we propose an effective 3D cross-attention stereo network (TCANet) that uses an attention mechanism to obtain contextual information for cost aggregation more efficiently. Specifically, the 3D cross-attention module in TCANet acquires the geometric information of all pixels on the 3D cross path; by repeating this operation twice, each pixel eventually obtains global dependencies from all other pixels. Using the 3D cross-attention module in a stacked hourglass 3D CNN increases the number of parameters only slightly while effectively improving model performance. Experimental results show that TCANet performs well on the synthetic Scene Flow dataset and the real-world KITTI datasets.
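A simplified sketch of the cross-path idea, under our own assumptions about channel reduction and residual fusion, restricts attention at each cost-volume voxel to the voxels sharing its disparity, height, or width line; applying the module twice then propagates information globally, as the abstract describes:

```python
# Simplified 3D cross-path attention on a stereo cost volume (B, C, D, H, W).
# Not TCANet's exact definition; an illustrative sketch only.
import torch
import torch.nn as nn

class CrossPath3DAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        reduced = max(channels // 8, 1)
        self.q = nn.Conv3d(channels, reduced, kernel_size=1)
        self.k = nn.Conv3d(channels, reduced, kernel_size=1)
        self.v = nn.Conv3d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    @staticmethod
    def _axis_attention(q, k, v, dim):
        # Restrict attention to the line through each voxel along one axis.
        q, k, v = (t.movedim(dim, -1) for t in (q, k, v))
        attn = torch.einsum("bcxyl,bcxym->bxylm", q, k).softmax(dim=-1)
        out = torch.einsum("bxylm,bcxym->bcxyl", attn, v)
        return out.movedim(-1, dim)

    def forward(self, cost: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(cost), self.k(cost), self.v(cost)
        # Sum contributions from the disparity (2), height (3), width (4) axes.
        out = sum(self._axis_attention(q, k, v, dim) for dim in (2, 3, 4))
        return cost + self.gamma * out  # residual connection
```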
Active pixel sensor matrix based on monolayer MoS2 phototransistor array
In-sensor processing, which can reduce the energy and hardware burden of many machine vision applications, is currently lacking in state-of-the-art active pixel sensor (APS) technology. Photosensitive, semiconducting two-dimensional (2D) materials can bridge this technology gap by integrating image capture (sense) and image processing (compute) capabilities in a single device. Here, we introduce a 2D APS technology based on a monolayer MoS₂ phototransistor array, where each pixel uses a single programmable phototransistor, leading to a substantial reduction in footprint (900 pixels in ∼0.09 cm²) and energy consumption (hundreds of fJ per pixel). By exploiting gate-tunable persistent photoconductivity, we achieve a responsivity of ∼3.6 × 10⁷ A W⁻¹, a specific detectivity of ∼5.6 × 10¹³ Jones, spectral uniformity, a high dynamic range of ∼80 dB, and in-sensor de-noising capabilities. Furthermore, we demonstrate near-ideal yield and uniformity in photoresponse across the 2D APS array. Low-power and compact APS matrices are desired for resource-limited edge devices; here, the authors report a small-footprint APS matrix based on monolayer MoS₂ phototransistor arrays exhibiting spectral uniformity, reconfigurable photoresponsivity, and de-noising capabilities at low energy consumption.
Subpixel corner detection algorithm based on the plane chessboard
In this paper, the pixel-level corner positions extracted by the Harris algorithm are used as initial values for sub-pixel corner detection. Sub-pixel corner detection based on the inner product of the gray gradient has weak anti-noise and anti-distortion ability, while sub-pixel corner detection based on Gaussian quadric surface fitting depends heavily on the initial value. To address this, a sub-pixel corner detection algorithm based on quadric fitting with an iterative search is proposed. Experiments show that the improved algorithm is more robust to noise and distortion and less dependent on the initial value. When the algorithm is used to calibrate a camera, the re-projection error is smaller and the calibration accuracy higher.
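The quadric-fitting step can be sketched generically: fit a 2D quadratic to the Harris response around the pixel-level corner and solve for its stationary point. The window radius and least-squares formulation below are our assumptions, and the paper's iterative-search refinement is not reproduced:

```python
# Quadric-surface sub-pixel refinement of a pixel-level Harris corner.
import numpy as np

def refine_corner_quadric(response: np.ndarray, cx: int, cy: int, r: int = 2):
    """Refine (cx, cy) on a Harris response map to sub-pixel precision."""
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    xs, ys = xs.ravel().astype(float), ys.ravel().astype(float)
    z = response[cy - r:cy + r + 1, cx - r:cx + r + 1].ravel().astype(float)
    # Least-squares fit of z = a x^2 + b xy + c y^2 + d x + e y + f.
    A = np.column_stack([xs**2, xs * ys, ys**2, xs, ys, np.ones_like(xs)])
    a, b, c, d, e, _ = np.linalg.lstsq(A, z, rcond=None)[0]
    # Stationary point: solve [2a b; b 2c] [dx dy]^T = -[d e]^T.
    H = np.array([[2 * a, b], [b, 2 * c]])
    dx, dy = np.linalg.solve(H, [-d, -e])
    return cx + dx, cy + dy
```

For comparison, the gray-gradient approach the paper criticizes is what OpenCV's cv2.cornerSubPix implements; the fitting sketch above is the alternative family of methods the paper builds on.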
Pixel-Wise Crowd Understanding via Synthetic Data
Crowd analysis via computer vision is an important topic in video surveillance, with widespread applications including crowd monitoring, public safety, and space design. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because it yields finer results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding requires a large amount of labeled training data, and annotating it is expensive, which is why current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, taking crowd counting and segmentation as representative pixel-wise crowd understanding tasks, we attempt to remedy these problems from two aspects: data and methodology. First, we develop a free data collector and labeler that generates synthetic, labeled crowd scenes in a computer game, Grand Theft Auto V, and use it to construct a large-scale, diverse synthetic crowd dataset named the GCC dataset. Second, we propose two simple methods to improve crowd understanding by exploiting the synthetic data: (1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it on real data and labels, which improves performance in the real world; (2) crowd understanding via domain adaptation: translate the synthetic data into photo-realistic images, then train the model on the translated data and labels. The resulting models work well in real crowd scenes. Extensive experiments verify that the supervised algorithm outperforms the state of the art on four real datasets: UCF_CC_50, UCF-QNRF, and ShanghaiTech Part A/B. These results show the effectiveness and value of the synthetic GCC dataset for pixel-wise crowd understanding. The data collection/labeling tools, the proposed synthetic dataset, and the source code for the counting models are available at https://gjy3035.github.io/GCC-CL/.
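Strategy (1) amounts to a standard two-stage training schedule. The schematic below shows the idea with a generic density-map regression loop; the model, data loaders, loss, and learning rates are placeholders rather than the authors' actual setup:

```python
# Synthetic pre-training followed by real-data fine-tuning (schematic).
import torch
import torch.nn as nn

def train(model, loader, epochs: int, lr: float, device: str = "cuda"):
    """Generic density-map regression loop shared by both stages."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # pixel-wise loss on predicted density maps
    model.to(device).train()
    for _ in range(epochs):
        for images, density in loader:
            images, density = images.to(device), density.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), density)
            loss.backward()
            opt.step()

# Stage 1: pre-train on synthetic (GTA V) scenes with free labels.
# train(model, synthetic_loader, epochs=50, lr=1e-4)
# Stage 2: fine-tune on the smaller real dataset with a lower learning rate.
# train(model, real_loader, epochs=20, lr=1e-5)
```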
Development of a circular-shaped cathode element with a pixel size of 0.8 × 0.8 mm² for two-dimensional neutron detectors
We present the development and evaluation of a detector element for high-performance two-dimensional neutron detectors used at neutron scattering experimental facilities. A new detector element with a smaller pixel size, consisting of anode wires for gas amplification and circular-shaped cathode bumps for charge collection, was developed. Experimental results demonstrated the excellent performance of a detector system using the developed elements, with a counting error of 9.92% across all pixels in the two-dimensional image and a positional error of 1.79 mm at half-maximum. The developed element is expected to enable accurate neutron detection and reliable long-term operation with minimal maintenance.
High Dynamic Range Pixel Array Detector for Scanning Transmission Electron Microscopy
We describe a hybrid pixel array detector (electron microscope pixel array detector, or EMPAD) adapted for electron microscope applications, especially as a universal detector for scanning transmission electron microscopy. The 128×128 pixel detector consists of a 500 µm thick silicon diode array bump-bonded pixel-by-pixel to an application-specific integrated circuit. The in-pixel circuitry provides a 1,000,000:1 dynamic range within a single frame, allowing the direct electron beam to be imaged while maintaining single-electron sensitivity. A 1.1 kHz framing rate enables rapid data collection and minimizes sample-drift distortions while scanning. By capturing the entire unsaturated diffraction pattern in scanning mode, one can simultaneously capture bright-field, dark-field, and phase-contrast information, and analyze the full scattering distribution, enabling true center-of-mass imaging. The scattering is recorded on an absolute scale, so information such as local sample thickness can be determined directly. This paper describes the detector architecture, the data acquisition system, and preliminary results from experiments with 80–200 keV electron beams.
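Center-of-mass imaging from the unsaturated diffraction patterns is a simple first-moment computation. The sketch below assumes a 4D array of diffraction intensities indexed by scan position; the array name, file, and shapes are illustrative:

```python
# Center-of-mass (CoM) images from a 4D-STEM dataset: one diffraction
# pattern per scan position, as recorded by a pixel array detector.
import numpy as np

def center_of_mass_images(data4d: np.ndarray):
    """data4d: (scan_y, scan_x, det_y, det_x) diffraction intensities."""
    det_y, det_x = data4d.shape[2:]
    ky = np.arange(det_y)[:, None]   # detector row coordinates
    kx = np.arange(det_x)[None, :]   # detector column coordinates
    total = data4d.sum(axis=(2, 3))  # total intensity per scan position
    # First moments of each diffraction pattern give the CoM shift.
    com_y = (data4d * ky).sum(axis=(2, 3)) / total
    com_x = (data4d * kx).sum(axis=(2, 3)) / total
    return com_x, com_y              # two (scan_y, scan_x) CoM images

# Usage: com_x, com_y = center_of_mass_images(np.load("empad_scan.npy"))
```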
Design and Verification of Single-Pixel Scanning Imaging System
This study presents the design and experimental validation of a single-pixel mechanical scanning imaging system that integrates optical, mechanical, and computational imaging components. By coordinating the scanning module with a single-pixel detector, the system achieves high-fidelity image reconstruction. Experimental results demonstrate strong stability and excellent imaging performance across both visible and infrared spectra. Future work will focus on multispectral optimization and practical implementation scenarios, aiming to advance this technology's real-world applicability.
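The raster-scan principle behind such a system can be illustrated with a toy loop: the scanner addresses one scene position at a time while the single-pixel detector records intensity, and the image is assembled point by point. The move_scanner and read_detector callables below are hypothetical placeholders for the system's actual hardware interfaces:

```python
# Toy sketch of single-pixel mechanical scanning imaging.
import numpy as np

def raster_scan_image(move_scanner, read_detector, height: int, width: int):
    """Reconstruct an image by visiting every (row, col) scan position."""
    image = np.zeros((height, width))
    for row in range(height):
        for col in range(width):
            move_scanner(row, col)             # point the optics at (row, col)
            image[row, col] = read_detector()  # one single-pixel reading
    return image
```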