Search Results

307 results for "Feng, Jiashi"
Fine-Grained Multi-human Parsing
Despite noticeable progress on perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily at visually understanding humans in crowded scenes, which underpins applications such as group behavior analysis, person re-identification, e-commerce, media editing, video surveillance, autonomous driving and virtual reality. To perform well, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, a problem recently defined as the multi-human parsing task. In this paper, we first present a new large-scale database, “Multi-human Parsing (MHP v2.0)”, for algorithm development and evaluation to advance research on understanding humans in crowded scenes. MHP v2.0 contains 25,403 elaborately annotated images with 58 fine-grained semantic category labels and 16 dense pose key point labels, involving 2–26 persons per image captured in real-world scenes with various viewpoints, poses, occlusions, interactions and backgrounds. We further propose a novel deep Nested Adversarial Network (NAN) model for multi-human parsing. NAN consists of three Generative Adversarial Network-like sub-nets that respectively perform semantic saliency prediction, instance-agnostic parsing and instance-aware clustering. These sub-nets form a nested structure and are carefully designed to learn jointly in an end-to-end way. NAN consistently outperforms existing state-of-the-art solutions on MHP v2.0 and several other datasets, including MHP v1.0, PASCAL-Person-Part and Buffy. NAN serves as a strong baseline to shed light on generic instance-level semantic part prediction and to drive future research on multi-human parsing. Building on these contributions, we organized the CVPR 2018 Workshop on Visual Understanding of Humans in Crowd Scene (VUHCS 2018) and the Fine-Grained Multi-human Parsing and Pose Estimation Challenge, which together significantly benefit the community. Code and pre-trained models are available at https://github.com/ZhaoJ9014/Multi-Human-Parsing_MHP.
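To make the nested design concrete, here is a minimal, hypothetical PyTorch sketch of three chained sub-nets, where each stage conditions on the image plus all earlier outputs. The layer choices and channel counts are illustrative stand-ins, the 58-class figure comes from the MHP v2.0 label set, and the GAN-style adversarial training of the real NAN is omitted.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """Placeholder for one GAN-like sub-net (generator side only)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1))

    def forward(self, x):
        return self.net(x)

class NestedParser(nn.Module):
    """Each stage sees the image plus all earlier stages' outputs."""
    def __init__(self, n_classes=58):
        super().__init__()
        self.saliency = Stage(3, 1)                    # semantic saliency map
        self.parsing = Stage(3 + 1, n_classes)         # instance-agnostic parsing
        self.clustering = Stage(3 + 1 + n_classes, 1)  # instance-aware grouping

    def forward(self, img):
        s = self.saliency(img)
        p = self.parsing(torch.cat([img, s], dim=1))
        c = self.clustering(torch.cat([img, s, p], dim=1))
        return s, p, c

s, p, c = NestedParser()(torch.randn(1, 3, 64, 64))
print(s.shape, p.shape, c.shape)
```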
A survey on deep learning-based fine-grained object classification and semantic segmentation
Deep learning has shown impressive performance on various vision tasks such as image classification, object detection and semantic segmentation. In particular, recent advances in deep learning techniques bring encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning-based fine-grained image classification approaches: general convolutional neural networks (CNNs), part detection-based, ensemble-of-networks-based and visual attention-based approaches. Deep learning-based semantic segmentation approaches are also covered, with region proposal-based and fully convolutional network-based methods introduced in turn.
Predicting Alzheimer's disease progression using deep recurrent neural networks
Early identification of individuals at risk of developing Alzheimer's disease (AD) dementia is important for developing disease-modifying therapies. In this study, given multimodal AD markers and the clinical diagnosis of an individual from one or more timepoints, we seek to predict the clinical diagnosis, cognition and ventricular volume of the individual for every month (indefinitely) into the future. We proposed and applied a minimal recurrent neural network (minimalRNN) model to data from The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge, comprising longitudinal data of 1677 participants (Marinescu et al., 2018) from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We compared the performance of the minimalRNN model and four baseline algorithms up to 6 years into the future. Most previous work on predicting AD progression ignores the issue of missing data, which is prevalent in longitudinal data. Here, we explored three different strategies to handle missing data. Two of the strategies treated missing data as a “preprocessing” issue, imputing the missing data using the previous timepoint (“forward filling”) or linear interpolation (“linear filling”). The third strategy utilized the minimalRNN model itself to fill in the missing data during both training and testing (“model filling”). Our analyses suggest that the minimalRNN with “model filling” compared favorably with baseline algorithms, including support vector machine/regression, a linear state space (LSS) model, and a long short-term memory (LSTM) model. Importantly, although the training procedure utilized longitudinal data, we found that the trained minimalRNN model exhibited similar performance when using only 1 input timepoint or 4 input timepoints, suggesting that our approach might work well with just cross-sectional data. An earlier version of our approach was ranked 5th (out of 53 entries) in the TADPOLE challenge in 2019. The current approach is ranked 2nd out of 63 entries as of June 3rd, 2020.
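The two “preprocessing” imputation strategies are straightforward to illustrate. The sketch below uses invented visit data and pandas; “model filling” would instead let the trained minimalRNN predict the missing entries and is not shown.

```python
import numpy as np
import pandas as pd

# One participant's longitudinal marker; NaN marks missed visits.
visits = pd.Series([1.20, np.nan, np.nan, 1.32, np.nan, 1.41])

forward_filled = visits.ffill()        # "forward filling": carry last value forward
linear_filled = visits.interpolate()   # "linear filling": interpolate between visits

print(forward_filled.tolist())  # [1.2, 1.2, 1.2, 1.32, 1.32, 1.41]
print(linear_filled.tolist())   # approx. [1.2, 1.24, 1.28, 1.32, 1.365, 1.41]
```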
Revisiting Horizontal Stratification in Higher Education: College Prestige Hierarchy and Educational Assortative Mating in China
Existing research on assortative mating has examined marriage between people with different levels of education, yet heterogeneity in educational assortative mating outcomes of college graduates has been mostly ignored. Using data from the 2010 Chinese Family Panel Study and log-multiplicative models, this study examines the changing structure and association of husbands' and wives' educational attainment between 1980 and 2010, a period in which Chinese higher education experienced rapid expansion and stratification. Results show that the graduates of first-tier institutions are less likely than graduates of lower-ranked colleges to marry someone without a college degree. Moreover, from 1980 to 2010, female first-tier-college graduates were increasingly more likely to marry people who graduated from similarly prestigious colleges, although there is insufficient evidence to draw the same conclusion about their male counterparts. This study thus demonstrates the extent of heterogeneity in educational assortative mating patterns among college graduates and the tendency for elite college graduates to marry within the educational elite.
Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics
There is significant interest in the development and application of deep neural networks (DNNs) to neuroimaging data. A growing literature suggests that DNNs outperform their classical counterparts in a variety of neuroimaging applications, yet there are few direct comparisons of relative utility. Here, we compared the performance of three DNN architectures and a classical machine learning algorithm (kernel regression) in predicting individual phenotypes from whole-brain resting-state functional connectivity (RSFC) patterns. One of the DNNs was a generic fully-connected feedforward neural network, while the other two DNNs were recently published approaches specifically designed to exploit the structure of connectome data. By using a combined sample of almost 10,000 participants from the Human Connectome Project (HCP) and UK Biobank, we showed that the three DNNs and kernel regression achieved similar performance across a wide range of behavioral and demographic measures. Furthermore, the generic feedforward neural network exhibited similar performance to the two state-of-the-art connectome-specific DNNs. When predicting fluid intelligence in the UK Biobank, performance of all algorithms dramatically improved when sample size increased from 100 to 1000 subjects. Improvement was smaller, but still significant, when sample size increased from 1000 to 5000 subjects. Importantly, kernel regression was competitive across all sample sizes. Overall, our study suggests that kernel regression is as effective as DNNs for RSFC-based behavioral prediction, while incurring significantly lower computational costs. Therefore, kernel regression might serve as a useful baseline algorithm for future studies.
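Since kernel regression is proposed as a useful baseline, a minimal sketch of such a baseline with scikit-learn's KernelRidge follows; the synthetic features, target, and hyperparameters are invented stand-ins for RSFC data, not the study's setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_edges = 200, 400            # stand-ins for subjects x FC edges
fc_features = rng.standard_normal((n_subjects, n_edges))
behavior = fc_features[:, :5].sum(axis=1) + rng.standard_normal(n_subjects)

# Kernel ridge regression on connectivity patterns; a linear kernel is
# used here for simplicity (similarity kernels are another common choice).
model = KernelRidge(kernel="linear", alpha=1.0)
scores = cross_val_score(model, fc_features, behavior, cv=5, scoring="r2")
print(scores.mean())
```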
Recognizing Profile Faces by Imagining Frontal View
Extreme pose variation is one of the key obstacles to accurate face recognition in practice. Current techniques for pose-invariant face recognition either expect pose invariance from hand-crafted features or data-driven deep learning solutions, or first normalize profile face images to a frontal pose before feature extraction; we argue that it is more desirable to perform both tasks jointly so that they can benefit from each other. To this end, we propose a Pose-Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture, containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned from end to end. Second, FFN is a well-designed dual-path Generative Adversarial Network which simultaneously perceives global structures and local details, incorporating an unsupervised cross-domain adversarial training and a meta-learning (“learning to learn”) strategy using a siamese discriminator with dynamic convolution for high-fidelity and identity-preserving frontal view synthesis. Third, DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representations with large intra-class affinity and inter-class separability. Qualitative and quantitative experiments on both controlled and in-the-wild benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art methods.
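A toy sketch of the joint "frontalize then recognize" coupling the abstract argues for is below (PyTorch, with one-layer placeholder sub-nets; this is not the published PIM code): a single recognition loss backpropagates through both the frontalization net and the recognition net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder sub-nets: the real FFN is a dual-path GAN generator and
# the real DLN is a deep recognition CNN; these are one-layer stand-ins.
ffn = nn.Conv2d(3, 3, 3, padding=1)                             # profile -> "frontal"
dln = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # face embedding
classifier = nn.Linear(128, 10)                                 # 10 hypothetical identities

profile = torch.randn(4, 3, 32, 32)      # a batch of profile faces
labels = torch.randint(0, 10, (4,))

frontal = ffn(profile)                   # frontal-view synthesis
logits = classifier(dln(frontal))        # recognition on the synthesized view
loss = F.cross_entropy(logits, labels)
loss.backward()                          # one loss updates both sub-nets end to end
```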
Stray Light Suppression Design and Test for the Jilin-1 GF04A Satellite Remote Sensing Camera
The stray light suppression design aims to reduce the impact of stray light on optical systems. For high-resolution optical remote sensing systems, practical tests of stray light suppression performance are essential to ensure optimal functionality. However, due to system complexity and spatial constraints, physical test methods for evaluating the stray light suppression performance of large-aperture, long-focal-length remote sensing cameras remain scarce. To address this issue, a comprehensive test is conducted on the stray light suppression performance of the Jilin-1 GF04A satellite remote sensing camera by integrating multiple test methods, including the environmental light effect test, neighborhood point source response test, key surface response test, and sneak path of stray light test. The experimental results indicate that the stray light response ratios obtained from different test methods are all below 1%. The on-orbit performance of GF04A further validates the effectiveness of its stray light suppression design.
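For reference, a stray light response ratio of the kind reported above is the detector signal attributable to stray paths divided by the direct-imaging signal; the readings below are invented for illustration.

```python
# Hypothetical point-source test readings (arbitrary detector units).
direct_signal = 5.2e4   # response with the source imaged directly
stray_signal = 3.1e2    # residual response via stray paths only

ratio = stray_signal / direct_signal
print(f"stray light response ratio: {ratio:.2%}")  # ~0.60%, meeting the <1% criterion
```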
Finding any Waldo with zero-shot invariant and efficient visual search
Searching for a target object in a cluttered scene constitutes a fundamental challenge in daily vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the target's appearance, efficient enough to avoid exhaustive exploration of the image, and able to generalize to locate novel target objects with zero-shot training. Previous work on visual search has focused on searching for perfect matches of a target after extensive category-specific training. Here, we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and can generalize to novel objects. The model approximates the mechanisms integrating bottom-up and top-down signals during search in natural scenes and captures human eye movements during search.
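One common way to formalize the integration of bottom-up and top-down signals is a priority map over image locations. The numpy sketch below is an illustrative assumption, not the published model; the feature dimensions and the 0.7/0.3 weighting are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
scene = rng.random((32, 32, 8))          # per-location feature vectors
target = rng.random(8)                   # feature vector of the search target

# Top-down signal: cosine similarity between each location and the target.
top_down = scene @ target
top_down /= np.linalg.norm(scene, axis=-1) * np.linalg.norm(target) + 1e-8

# Bottom-up signal: deviation of each location from the scene's mean feature.
bottom_up = np.linalg.norm(scene - scene.mean(axis=(0, 1)), axis=-1)
bottom_up /= bottom_up.max()

priority = 0.7 * top_down + 0.3 * bottom_up
y, x = np.unravel_index(priority.argmax(), priority.shape)
print(f"first fixation at ({y}, {x})")   # fixate the peak, inhibit it, repeat
```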
Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms
Mammography screening currently relies on subjective human interpretation. Advances in artificial intelligence (AI) could be used to increase mammography screening accuracy by reducing missed cancers and false positives. This study evaluated whether AI can overcome human mammography interpretation limitations through a rigorous, unbiased evaluation of machine learning algorithms. In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries took part. Analysis began November 18, 2016. Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2), and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using the area under the curve, and algorithm specificity was compared with radiologists' specificity at the radiologists' sensitivity of 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessments was developed and evaluated. Overall, 144,231 screening mammograms from 85,580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166,578 examinations from 68,008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden), with 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining the top-performing algorithms with US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods to enhance mammography screening interpretation.
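As a schematic of how algorithm scores and a radiologist's recall decision can be aggregated, consider the sketch below; the scores, weights, and threshold are invented for illustration and do not reproduce the study's ensemble.

```python
import numpy as np

ai_scores = np.array([[0.12, 0.08, 0.91],    # one row per exam,
                      [0.75, 0.80, 0.66]])   # one column per algorithm
radiologist_recall = np.array([0, 1])        # 0 = no recall, 1 = recall

# Average the algorithm scores, then blend with the human read.
ensemble = 0.5 * ai_scores.mean(axis=1) + 0.5 * radiologist_recall
flag = ensemble >= 0.5                       # operating threshold is a free choice
print(ensemble, flag)
```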
Collaborative Linear Coding for Robust Image Classification
How to generate robust image representations when there is contamination from noisy pixels within the images is critical for boosting the performance of image classification methods. However, this important problem is not yet fully explored. In this paper, we propose a novel image representation learning method, i.e., collaborative linear coding (CLC), to alleviate the negative influence of noisy features on classifying images. Specifically, CLC exploits the correlation among local features in the coding procedure in order to suppress the interference of noisy features by weakening their responses on the coding basis. CLC implicitly divides the extracted local features into different feature subsets, with the allocation indicated by introduced latent variables. Within each subset, the features are ensured to be highly correlated, and the codes produced for them are encouraged to activate on the identical basis. By incorporating such regularization in the coding model, the responses of noisy local features are dominated by the responses of informative features, owing to the rarity of noisy features compared with informative ones. The final image representation is thus more robust and distinctive for subsequent classification than those produced by coding methods that ignore such high-order correlation. Though CLC involves a set of complicated optimization problems, we investigate their special structure and propose an efficient alternating optimization algorithm. We verified the effectiveness and robustness of CLC on multiple image classification benchmark datasets, including Scene-15, Indoor-67, Flower-102, Pet-37, and PASCAL VOC 2011. Compared with the well-established LLC baseline, CLC consistently enhances classification accuracy, especially for images containing more noise.
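For contrast with CLC's joint coding, the LLC baseline mentioned above codes each local feature independently. A compact sketch of LLC-style coding with a simplified locality penalty follows; the codebook size, feature dimension, and regularization weight are illustrative.

```python
import numpy as np

def llc_code(x, B, lam=1e-4):
    """Code feature x (d,) over codebook B (k, d) with a locality penalty."""
    d = np.linalg.norm(B - x, axis=1)       # distance from x to each basis vector
    z = B - x                               # basis shifted to the feature
    C = z @ z.T + lam * np.diag(d ** 2)     # regularized data covariance
    c = np.linalg.solve(C, np.ones(len(B)))
    return c / c.sum()                      # codes constrained to sum to one

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 128))
feature = rng.standard_normal(128)
code = llc_code(feature, codebook)
print(code.shape, code.sum())               # (64,) ~1.0
```

CLC differs by coupling the codes of correlated features through latent subset assignments, which the per-feature solver above deliberately omits.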