Search Results Heading

17 result(s) for "hard example mining"
A CNN-Based Method of Vehicle Detection from Aerial Images Using Hard Example Mining
Recently, deep learning techniques have had a practical role in vehicle detection. While much effort has been spent on applying deep learning to vehicle detection, the effective use of training data has not been thoroughly studied, although it has great potential for improving training results, especially in cases where the training data are sparse. In this paper, we proposed using hard example mining (HEM) in the training process of a convolutional neural network (CNN) for vehicle detection in aerial images. We applied HEM to stochastic gradient descent (SGD) to choose the most informative training data by calculating the loss values in each batch and employing the examples with the largest losses. We picked 100 out of both 500 and 1000 examples for training in one iteration, and we tested different ratios of positive to negative examples in the training data to evaluate how the balance of positive and negative examples would affect the performance. In any case, our method always outperformed the plain SGD. The experimental results for images from New York showed improved performance over a CNN trained in plain SGD where the F1 score of our method was 0.02 higher.
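The selection step this abstract describes (compute per-example losses in a batch, keep the examples with the largest losses) can be sketched as follows. This is a minimal illustration and not the authors' implementation; the function name `select_hard_examples` and the loss values are hypothetical.

```python
from typing import List, Sequence


def select_hard_examples(losses: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k examples with the largest loss.

    If k exceeds the number of examples, all indices are returned,
    ordered from highest to lowest loss.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank example indices by loss, hardest (largest loss) first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]


# Hypothetical per-example losses for a candidate batch of six examples.
candidate_losses = [0.10, 2.30, 0.05, 1.70, 0.90, 0.02]

# Keep only the two hardest examples for the gradient step.
hard_indices = select_hard_examples(candidate_losses, k=2)
print(hard_indices)
```

In the setting of the abstract, the candidate batch would hold 500 or 1000 examples and `k` would be 100; only the selected examples would contribute to the SGD update.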
Hardest and semi-hard negative pairs mining for text-based person search with visual–textual attention
Searching for persons in large-scale image databases using natural language descriptions as queries is a practical and important application in video surveillance. Intuitively, for person search, the core issue should be the visual–textual association, which is still an extremely challenging task, due to the contradiction between the high abstraction of textual description and the intuitive expression of visual images. In this paper, aiming for more consistent visual–textual features and better inter-class discriminative ability, we propose a text-based person search approach with visual–textual attention on the hardest and semi-hard negative pairs mining. First, for the visual and textual attentions, we designed a Smoothed Global Maximum Pooling (SGMP) to extract more concentrated visual features, and also a memory attention based on the LSTM’s cell unit for stricter correspondence matching. Second, while we only have labeled positive pairs, more valuable negative pairs are mined by defining the cross-modality-based hardest and semi-hard negative pairs. After that, we combine the triplet loss on the single modality with the hardest negative pairs, and the cross-entropy loss on cross-modalities with both the hardest and semi-hard negative pairs, to train the whole network. Finally, to evaluate the effectiveness and feasibility of the proposed approach, we conduct extensive experiments on the typical person search dataset CUHK-PEDES, on which our approach achieves satisfactory performance, e.g., a top-1 accuracy of 55.32%. In addition, we evaluate the semi-hard pair mining method on the COCO caption dataset and validate its effectiveness and complementarity.
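The abstract does not spell out how hardest and semi-hard negatives are defined, so the sketch below assumes the widely used margin-based convention: the hardest negative is the one closest to the anchor, and a semi-hard negative is farther than the positive but still inside the margin. The function names, distances, and margin are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def mine_negatives(d_pos: float, d_negs: List[float], margin: float) -> Tuple[int, List[int]]:
    """Return (index of hardest negative, indices of semi-hard negatives).

    Assumed definitions (not from the paper):
      hardest   = negative with the smallest distance to the anchor
      semi-hard = negatives with d_pos < d_neg < d_pos + margin
    """
    if not d_negs:
        raise ValueError("at least one negative distance is required")
    hardest = min(range(len(d_negs)), key=lambda i: d_negs[i])
    semi_hard = [i for i, d in enumerate(d_negs) if d_pos < d < d_pos + margin]
    return hardest, semi_hard


def triplet_loss(d_pos: float, d_neg: float, margin: float) -> float:
    """Standard hinge-style triplet loss on anchor-positive and anchor-negative distances."""
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical distances from one anchor to its positive and to four negatives.
d_pos = 0.5
d_negs = [0.9, 0.3, 0.6, 1.4]

hardest, semi_hard = mine_negatives(d_pos, d_negs, margin=0.5)
loss_on_hardest = triplet_loss(d_pos, d_negs[hardest], margin=0.5)
print(hardest, semi_hard, loss_on_hardest)
```

Under these assumptions the hardest negative would feed the triplet loss, while both sets would feed a classification-style loss, mirroring the two-loss combination the abstract mentions.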
Pointer Defect Detection Based on Transfer Learning and Improved Cascade-RCNN
To meet the practical need of detecting various defects on the pointer surface, some of which are difficult to detect, this paper proposes a transfer learning and improved Cascade-RCNN deep neural network (TICNET) algorithm for detecting pointer defects. Firstly, the convolutional layers of ResNet-50 are reconstructed by deformable convolution, which enhances the learning of pointer surface defects by the feature extraction network. Furthermore, the problems of missing detection caused by internal differences and weak features are effectively solved. Secondly, the idea of online hard example mining (OHEM) is used to improve the Cascade-RCNN detection network, which achieves accurate classification of defects. Finally, because the common pointer defect dataset and the pointer defect dataset established in this paper have the same low-level visual characteristics, the network is pre-trained on the common defect dataset, and the weights are transferred to the defect dataset established in this paper, which reduces the training difficulty caused by too few data. The experimental results show that the proposed method achieves a 0.933 detection rate and a 0.873 mean average precision when the threshold of intersection over union is 0.5, and it realizes high-precision detection of pointer surface defects.
Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining
Detecting vehicles in aerial imagery plays an important role in a wide range of applications. The current vehicle detection methods are mostly based on sliding-window search and handcrafted or shallow-learning-based features, having limited description capability and heavy computational costs. Recently, due to their powerful feature representations, region convolutional neural network (CNN) based detection methods have achieved state-of-the-art performance in computer vision, especially Faster R-CNN. However, directly using it for vehicle detection in aerial images has many limitations: (1) the region proposal network (RPN) in Faster R-CNN has poor performance for accurately locating small-sized vehicles, due to the relatively coarse feature maps; and (2) the classifier after the RPN cannot distinguish vehicles and complex backgrounds well. In this study, an improved detection method based on Faster R-CNN is proposed in order to address the two challenges mentioned above. Firstly, to improve the recall, we employ a hyper region proposal network (HRPN) to extract vehicle-like targets with a combination of hierarchical feature maps. Then, we replace the classifier after the RPN with a cascade of boosted classifiers to verify the candidate regions, aiming at reducing false detection by negative example mining. We evaluate our method on the Munich vehicle dataset and the collected vehicle dataset, with improvements in accuracy and robustness compared to existing methods.
End-to-End Airport Detection in Remote Sensing Images Combining Cascade Region Proposal Networks and Multi-Threshold Detection Networks
Fast and accurate airport detection in remote sensing images is important for many military and civilian applications. However, traditional airport detection methods have low detection rates, high false alarm rates and slow speeds. Due to the power of convolutional neural networks in object-detection systems, an end-to-end airport detection method based on convolutional neural networks is proposed in this study. First, based on the common low-level visual features of natural images and airport remote sensing images, region-based convolutional neural networks are chosen to conduct transfer learning for airport images using a limited amount of data. Second, to further improve the detection rate and reduce the false alarm rate, the concepts of “divide and conquer” and “integral loss” are introduced to establish cascade region proposal networks and multi-threshold detection networks, respectively. Third, hard example mining is used to improve the object discrimination ability and the training efficiency of the network during sample training. Additionally, a cross-optimization strategy is employed to achieve convolution layer sharing between the cascade region proposal networks and the subsequent multi-threshold detection networks, and this approach significantly decreases the detection time. The results show that the method established in this study can accurately detect various types of airports in complex backgrounds with a higher detection rate, lower false alarm rate, and shorter detection time than existing airport detection methods.
Online hard example mining vs. fixed oversampling strategy for segmentation of new multiple sclerosis lesions from longitudinal FLAIR MRI
Detecting new lesions is a key aspect of the radiological follow-up of patients with Multiple Sclerosis (MS), leading to eventual changes in their therapeutics. This paper presents our contribution to the MSSEG-2 MICCAI 2021 challenge. The challenge is focused on the segmentation of new MS lesions using two consecutive Fluid Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI) scans. In other words, considering longitudinal data composed of two time-points as input, the aim is to segment the lesional areas which are present only on the follow-up scan and not on the baseline. The backbone of our segmentation method is a 3D UNet applied patch-wise to the images; to take both time-points into account, we simply concatenate the baseline and follow-up images along the channel axis before passing them to the 3D UNet. Our key methodological contribution is the use of online hard example mining to address the challenge of class imbalance. Indeed, there are very few voxels belonging to new lesions, which makes training deep learning models difficult. Instead of using handcrafted priors like brain masks or multi-stage methods, we experiment with a novel modification to online hard example mining (OHEM), where we use an exponential moving average (i.e., its weights are updated with momentum) of the 3D UNet to mine hard examples. Using a moving average instead of the raw model should smooth its predictions and allow it to give more consistent feedback for OHEM.
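The momentum-based weight averaging mentioned in this abstract can be sketched as below. This is only an illustration of a standard exponential moving average update; representing weights as a flat dictionary of floats, the function name `ema_update`, and the momentum value are simplifying assumptions, not details from the paper.

```python
from typing import Dict


def ema_update(ema_weights: Dict[str, float],
               model_weights: Dict[str, float],
               momentum: float) -> None:
    """Update ema_weights in place: ema = momentum * ema + (1 - momentum) * model."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, value in model_weights.items():
        ema_weights[name] = momentum * ema_weights[name] + (1.0 - momentum) * value


# Hypothetical single-parameter "model": the averaged copy starts at 1.0
# while the raw model weight sits at 0.0 for two consecutive training steps.
ema = {"w": 1.0}
raw = {"w": 0.0}

ema_update(ema, raw, momentum=0.9)
after_one_step = ema["w"]
ema_update(ema, raw, momentum=0.9)
after_two_steps = ema["w"]
print(after_one_step, after_two_steps)
```

The averaged copy changes more slowly than the raw model, which is the property the abstract relies on when it uses the averaged network, rather than the raw one, to score examples for OHEM.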
Person Re-identification with pose variation aware data augmentation
Person re-identification (Re-ID) aims to match a person of interest across multiple non-overlapping camera views. This is a challenging task, partly because a person captured in surveillance video often undergoes intense pose variations. Consequently, differences in their appearance are typically obvious. In this paper, we propose a pose variation aware data augmentation (PA4) method, which is composed of a pose transfer generative adversarial network (PTGAN) and person re-identification with improved hard example mining (Pre-HEM). Specifically, PTGAN introduces a similarity measurement module to synthesize realistic person images that are conditional on the pose and that, together with the original images, form an augmented training dataset. Pre-HEM presents a novel method of using the pose-transferred images with the learned pose transfer model for person Re-ID. It replaces the invalid samples that are caused by pose variations and constrains the proportion of the pose-transferred samples in each mini-batch. We conduct extensive comparative evaluations to demonstrate the advantages and superiority of our proposed method over state-of-the-art approaches on the Market-1501, DukeMTMC-reID, and CUHK03 datasets.
Airport Detection Using End-to-End Convolutional Neural Network with Hard Example Mining
Deep convolutional neural networks (CNNs) achieve outstanding performance in the field of target detection. As one of the most typical targets in remote sensing images (RSIs), the airport has attracted increasing attention in recent years. However, the essential challenge in using a deep CNN to detect airports is the great imbalance between the number of airport and background examples in large-scale RSIs, which may lead to over-fitting. In this paper, we develop a hard example mining and weight-balanced strategy to construct a novel end-to-end convolutional neural network for airport detection. The initial motivation of the proposed method is that backgrounds contain an overwhelming number of easy examples and a few hard examples. Therefore, we design a hard example mining layer to automatically select hard examples by their losses, and implement a new weight-balanced loss function to optimize the CNN. Meanwhile, the cascade design of proposal extraction and object detection in our network releases the constraint on input image size and reduces spurious false positives. Compared with geometric characteristics and low-level manually designed features, the hard example mining based network can extract high-level features, which are more robust for airport detection in complex environments. The proposed method is validated on a multi-scale dataset with complex backgrounds collected from Google Earth. The experimental results demonstrate that our proposed method is robust and superior to the state-of-the-art airport detection models.
Improved Multi-Person 2D Human Pose Estimation Using Attention Mechanisms and Hard Example Mining
In recent years, human pose estimation, as a subfield of computer vision and artificial intelligence, has achieved significant performance improvements due to its wide applications in human-computer interaction, virtual reality, and smart security. However, most existing methods are designed for single-person scenes and suffer from low accuracy and long inference time in multi-person scenes. To address this issue, increasing attention has been paid to developing methods for multi-person pose estimation, such as utilizing Part Affinity Field (PAF)-based bottom-up methods to estimate 2D poses of multiple people. In this study, we propose a method that addresses the problems of low network accuracy and poor estimation of flexible joints. This method introduces the attention mechanism into the network and utilizes the joint point extraction method based on hard example mining. Integrating the attention mechanism into the network improves its overall performance, while the joint point extraction method improves the localization accuracy of the flexible joints of the network without increasing the complexity. Experimental results demonstrate that our proposed method significantly improves the accuracy of 2D human pose estimation. Our network achieved an Average Precision (AP) score of 60.0 and outperformed competing methods on the standard benchmark COCO test dataset.
Bone Metastasis Detection in the Chest and Pelvis from a Whole-Body Bone Scan Using Deep Learning and a Small Dataset
The aim of this study was to establish an early diagnostic system for the identification of the bone metastasis of prostate cancer in whole-body bone scan images by using a deep convolutional neural network (D-CNN). The developed system exhibited satisfactory performance for a small dataset containing 205 cases, 100 of which were of bone metastasis. The sensitivity and precision for bone metastasis detection and classification in the chest were 0.82 ± 0.08 and 0.70 ± 0.11, respectively. The sensitivity and specificity for bone metastasis classification in the pelvis were 0.87 ± 0.12 and 0.81 ± 0.11, respectively. We propose the use of hard example mining for increasing the sensitivity and precision of the chest D-CNN. The developed system has the potential to provide a prediagnostic report for physicians’ final decisions.