Catalogue Search | MBRL

Self-Adaptive Aspect Ratio Anchor for Oriented Object Detection in Remote Sensing Images

by Yin, Xu-Cheng , Hou, Jie-Bo , Zhu, Xiaobin in Accuracy , anchor , Aspect ratio

2021

Object detection is a significant and challenging problem in the study of remote sensing. Since remote sensing images are typically captured with a bird’s-eye view, the aspect ratios of objects in the same category may obey a Gaussian distribution. Existing object detection methods generally do not exploit the distribution characteristics of aspect ratios to improve performance in remote sensing tasks. In this paper, we propose a novel Self-Adaptive Aspect Ratio Anchor (SARA) to explicitly explore aspect ratio variations of objects in remote sensing images. Concretely, SARA self-adaptively learns an appropriate aspect ratio for each category. In this way, we can use only a simple square anchor (related to the strides of feature maps in Feature Pyramid Networks) to regress objects with various aspect ratios. Finally, we adopt an Oriented Box Decoder (OBD) to align the feature maps and encode the orientation information of oriented objects. Our method achieves a promising mAP value of 79.91% on the DOTA dataset.

Journal Article

Weakly Correlated Knowledge Integration for Few-shot Image Classification

by Yin, Xu-Cheng , Yang, Chun , Liu, Chang in Accuracy , Classification , Correlation

2022

Various few-shot image classification methods indicate that transferring knowledge from other sources can improve the accuracy of the classification. However, most of these methods work with one single source or use only closely correlated knowledge sources. In this paper, we propose a novel weakly correlated knowledge integration (WCKI) framework to address these issues. More specifically, we propose a unified knowledge graph (UKG) to integrate knowledge transferred from different sources (i.e., visual domain and textual domain). Moreover, a graph attention module is proposed to sample the subgraph from the UKG with low complexity. To avoid explicitly aligning the visual features to the potentially biased and weakly correlated knowledge space, we sample a task-specific subgraph from UKG and append it as latent variables. Our framework demonstrates significant improvements on multiple few-shot image classification datasets.

Journal Article

Detecting Multi-Resolution Pedestrians Using Group Cost-Sensitive Boosting with Channel Features

by Zhu, Chao , Yin, Xu-Cheng in group cost-sensitive boosting , multi-resolution , pedestrian detection

2019

Significant progress has been achieved in the past few years on the challenging task of pedestrian detection. Nevertheless, a major bottleneck of existing state-of-the-art approaches is that performance drops sharply as the resolution of the detected targets decreases. For the boosting-based detectors that are popular in the pedestrian detection literature, a possible cause of this drop lies in the boosting training process: low-resolution samples, which are usually more difficult to detect because of missing details, are still treated as equally important as high-resolution samples. This results in false negatives, since low-resolution samples are more easily rejected in the early stages and can hardly be recovered in the late stages. To address this problem, we propose a robust multi-resolution detection approach with a novel group cost-sensitive boosting algorithm. The algorithm is derived from the standard AdaBoost algorithm; it explores different costs for different resolution groups of the samples in the boosting process and places greater emphasis on low-resolution groups in order to better handle the detection of multi-resolution targets. The effectiveness of the proposed approach is evaluated on the Caltech pedestrian benchmark and the KAIST (Korea Advanced Institute of Science and Technology) multispectral pedestrian benchmark, and validated by its promising performance on different resolution-specific test sets of both benchmarks.

Journal Article

Biomedical literature classification with a CNNs-based hybrid learning network

by Li, Sujian , Yang, Chun , Yan, Yan in Analysis , Annotations , Artificial intelligence

2018

Deep learning techniques, e.g., Convolutional Neural Networks (CNNs), have been widely applied to research in the fields of information retrieval and natural language processing. However, few research efforts have addressed semantic indexing with deep learning. The use of semantic indexing in the biomedical literature has been limited for several reasons. For instance, MEDLINE citations contain a large number of semantic labels from automatically annotated MeSH terms, and for a great deal of the literature, only the title and the abstract are readily available. In this paper, we propose a Boltzmann Convolutional neural network framework (B-CNN) for biomedical semantic indexing. In our hybrid learning framework, the CNN can adaptively deal with features of documents that have sequence relationships, and can capture context information accordingly; the Deep Boltzmann Machine (DBM) merges global (the entity in each document) and local information through its training with undirected connections. Additionally, we have designed a hierarchical coarse-to-fine indexing structure for learning and classifying documents, and a novel feature extension approach with word sequence embedding and Wikipedia categorization. Comparative experiments were conducted for semantic indexing of biomedical abstract documents; these experiments verified the encouraging performance of our B-CNN model.

Journal Article

ISART: A Generic Framework for Searching Books with Social Information

by Geng, Bin , Qu, Jiao , Hao, Hong-Wei in Algorithms , Analysis , Automation

2016

Effective book search has been discussed for decades and is still future-proof in areas as diverse as computer science, informatics, e-commerce and even culture and arts. A variety of social information contents (e.g., ratings, tags and reviews) emerge with the huge number of books on the Web, but how they are utilized for searching and finding books is seldom investigated. Here we develop an Integrated Search And Recommendation Technology (IsArt), which breaks new ground by providing a generic framework for searching books with rich social information. IsArt comprises a search engine to rank books with book contents and professional metadata, a Generalized Content-based Filtering model to thereafter rerank books with user-generated social contents, and a learning-to-rank technique to finally combine a wide range of diverse reranking results. Experiments show that this technology permits embedding social information to promote book search effectiveness, and IsArt, by making use of it, has the best performance on CLEF/INEX Social Book Search Evaluation datasets of all 4 years (from 2011 to 2014), compared with some other state-of-the-art methods.

Journal Article

OCRBench: on the hidden mystery of OCR in large multimodal models

by Huang, Mingxin , Liu, Cheng-Lin , Yu, Wenwen in Automation , Benchmarks , Computer Science

2024

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression recognition. Most importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. The evaluation pipeline and benchmark are available at https://github.com/Yuliang-Liu/MultimodalOCR .

Journal Article

DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

by Yang, Chun , Learned-Miller, Erik , Pei, Wei-Yi in Algorithms , Amino Acid Sequence , Annotations

2015

Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.

Journal Article

A Denoising Based Autoassociative Model for Robust Sensor Monitoring in Nuclear Power Plants

by Ali, Hazrat , Shaheryar, Ahmad , Yin, Xu-Cheng in Heuristic , Learning , Monitoring

2016

Sensor health monitoring is essential for the reliable functioning of safety-critical chemical and nuclear power plants. Autoassociative neural network (AANN) based empirical sensor models have been widely reported for sensor calibration monitoring. However, such ill-posed data-driven models may result in poor generalization and robustness. To address these issues, several regularization heuristics such as training with jitter, weight decay, and cross-validation have been suggested in the literature. Apart from these regularization heuristics, traditional error-gradient-based supervised learning algorithms for multilayered AANN models are highly susceptible to being trapped in local optima. To address the poor regularization and robust learning issues, we propose a denoised autoassociative sensor model (DAASM) based on a deep learning framework. The proposed DAASM comprises multiple hidden layers, which are pretrained greedily in an unsupervised fashion under a denoising autoencoder architecture. To improve robustness, the dropout heuristic and domain-specific data corruption processes are applied during the unsupervised pretraining phase. The proposed sensor model is trained and tested on sensor data from a PWR-type nuclear power plant. Accuracy, autosensitivity, spillover, and sequential probability ratio test (SPRT) based fault detectability metrics are used for performance assessment and comparison with the extensively reported five-layer AANN model by Kramer.

Journal Article

Selection of optimal denoising-based regularization hyper-parameters for performance improvement in a sensor validation model

by Shaheryar, Ahmad , Hong-Wei, Hao , Mahmood, Zahid in Active learning , Bias , Computer simulation

2018

Multilayered auto-associative neural architectures have widely been used in empirical sensor modeling. Typically, such empirical sensor models are used in sensor calibration and fault monitoring systems. However, simultaneous optimization of related performance metrics, i.e., auto-sensitivity, cross-sensitivity, and fault-detectability, is not a trivial task. Learning procedures for parametric and other relevant non-parametric empirical models are sensitive to optimization and regularization methods. Therefore, there is a need for active learning strategies that can better exploit the underlying statistical structure among input sensors and are simple to regularize and fine-tune. To this end, we investigated the greedy layer-wise learning strategy and denoising-based regularization procedure for sensor model optimization. We further explored the effects of denoising-based regularization hyper-parameters such as noise-type and noise-level on sensor model performance and suggested optimal settings through rigorous experimentation. A visualization procedure was introduced to obtain insight into the internal semantics of the learned model. These visualizations allowed us to suggest an implicit noise-generating process for efficient regularization in higher-order layers. We found that the greedy-learning procedure improved the overall robustness of the sensor model. To keep experimentation unbiased and immune to noise-related artifacts in real sensors, the sensor data were sampled from simulators of a nuclear steam supply system of a pressurized water reactor and a Tennessee Eastman chemical process. Finally, we compared the performance of an optimally regularized sensor model with auto-associative neural network, auto-associative kernel regression, and fuzzy similarity-based sensor models.

Journal Article

Learning Document Semantic Representation with Hybrid Deep Belief Network

by Li, Sujian , Yan, Yan , Yang, Mingyuan in Algorithms , Belief networks , Classification

2015

High-level abstraction, for example, semantic representation, is vital for document classification and retrieval. However, how to learn document semantic representation is still an open question in information retrieval and natural language processing. In this paper, we propose a new Hybrid Deep Belief Network (HDBN) which uses a Deep Boltzmann Machine (DBM) on the lower layers together with a Deep Belief Network (DBN) on the upper layers. The advantage of the DBM is that it employs undirected connections when training weight parameters, which can be used to sample the states of nodes on each layer more successfully, and it is also an effective way to remove noise from the different document representation types; the DBN can enhance the extraction of abstract features of the document in depth, enabling the model to learn sufficient semantic representation. At the same time, we explore different input strategies for semantic distributed representation. Experimental results show that our model performs better when using word embeddings instead of single words.

Journal Article
