Catalogue Search | MBRL

Text and non-text separation in offline document images: a survey

by Bhowmik, Showmik , Doermann, David , Sarkar, Ram in Data processing , Documents , Engineering drawings

2018

Separation of text and non-text is an essential processing step for any document analysis system. Therefore, it is important to have a clear understanding of the state-of-the-art of text/non-text separation in order to facilitate the development of efficient document processing systems. This paper first summarizes the technical challenges of performing text/non-text separation. It then categorizes offline document images into different classes according to the nature of the challenges one faces, in an attempt to provide insight into various techniques presented in the literature. The pros and cons of various techniques are explained wherever possible. Along with evaluation protocols and benchmark databases, this paper also presents a performance comparison of different methods. Finally, this article highlights the future research challenges and directions in this domain.

Journal Article

Machine-assisted authentication of paper currency: an experiment on Indian banknotes

by Doermann, David S , Halder, Biswajit , Roy, Ankush in Algorithms , Automation , Banknotes

2015

Automatic authentication of paper money is becoming an increasingly urgent problem because of new and improved uses of counterfeits. In this paper, we describe a system developed for discriminating fake notes from genuine ones and apply it to Indian banknotes. Image processing and pattern recognition techniques are used to design the overall approach. The ability of the embedded security aspects is thoroughly analysed for detecting fake currencies. Real samples are used in the experiments that show a high-precision machine can be developed for authentication of paper money. The system performance is reported for both accuracy and processing speed. The analysis of security features to prevent counterfeiting highlights some of the issues that should be considered in the design of currency notes in the future.

Journal Article

Future of software development with generative AI

by Riekki, Jukka , Doermann, David , Sauvola, Jaakko in Artificial Intelligence , Computer Science , Generative artificial intelligence

2024

Generative AI is regarded as a major disruption to software development. Platforms, repositories, clouds, and the automation of tools and processes have been proven to improve productivity, cost, and quality. Generative AI, with its rapidly expanding capabilities, is a major step forward in this field. As a new key enabling technology, it can be used for many purposes, from creative dimensions to replacing repetitive and manual tasks. The number of opportunities increases with the capabilities of large language models (LLMs). This has raised concerns about ethics, education, regulation, intellectual property, and even criminal activities. We analyzed the potential of generative AI and LLM technologies for future software development paths. We propose four primary scenarios, model trajectories for transitions between them, and reflect against relevant software development operations. The motivation for this research is clear: the software development industry needs new tools to understand the potential, limitations, and risks of generative AI, as well as guidelines for using it.

Journal Article

Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning

by Guo, Guodong , Wang, Runqi , Doermann, David in Artificial neural networks , Feature extraction , Learning

2023

We present a flexible, general framework for few-shot learning where both inter-class differences and intra-class relationships are fully considered to improve recognition performance significantly. We introduce complex-valued convolutional neural networks (CNNs) to describe the subtle differences among inter-class samples and Dependable Learning to capture the intra-class relationship. Conventional approaches use only real-valued CNNs and fail to extract more detailed information. Complex-valued CNNs, on the other hand, can provide amplitude and phase information to enhance the feature representation ability based on the proposed complex metric module (CMM). Building upon the recent episodic training mechanism, CMMs can improve the representation capacity by extracting robust complex-valued features to facilitate the modeling of subtle relationships among few-shot samples. Furthermore, we use Dependable Learning as a new learning paradigm to promote a model that is robust against perturbation, based on a new bilinear optimization that enhances the feature extraction capacity for very few available intra-class samples. Experiments on two benchmark datasets show that the proposed methods significantly improve the performance over other approaches and achieve state-of-the-art results.

Journal Article

Anti-Bandit for Neural Architecture Search

by Wang, Runqi , Doermann, David , Chen, Hanlin in Artificial intelligence , Computer vision , Gabor filters

2023

Neural Architecture Search (NAS) is a highly challenging task that requires consideration of search space, search efficiency, and adversarial robustness of the network. In this paper, to accelerate the training speed, we reformulate NAS as a multi-armed bandit problem and present the Anti-Bandit NAS (ABanditNAS) method, which exploits Upper Confidence Bounds (UCB) to abandon arms for search efficiency and Lower Confidence Bounds (LCB) for fair competition between arms. Based on the presented ABanditNAS, the adversarially robust optimization and architecture search can be solved in a unified framework. Specifically, our proposed framework defends against adversarial attacks based on a comprehensive search of denoising blocks, weight-free operations, Gabor filters, and convolutions. A theoretical analysis of the rationality of the two confidence bounds in ABanditNAS is provided, and extensive experiments on three benchmarks are conducted. The results demonstrate that the presented ABanditNAS achieves competitive accuracy at a reduced search cost compared to prior methods.

Journal Article

Camera-based analysis of text and documents: a survey

by Liang, Jian , Li, Huiping , Doermann, David in Availability , Devices , Digital cameras

2005

The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.

Journal Article

Towards Compact 1-bit CNNs via Bayesian Learning

by Doermann, David , Gu Jiaxin , Guo Guodong in Algorithms , Artificial neural networks , Back propagation

2022

Deep convolutional neural networks (DCNNs) have dominated as the best performers on almost all computer vision tasks over the past several years. However, it remains a major challenge to deploy these powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. To this end, 1-bit CNNs have emerged as a feasible solution as they are much more resource-efficient. Unfortunately, they often suffer from a significant performance drop compared to their full-precision counterparts. In this paper, we propose a novel Bayesian Optimized compact 1-bit CNNs (BONNs) model, which has the advantage of Bayesian learning, to improve the performance of 1-bit CNNs significantly. BONNs incorporate the prior distributions of full-precision kernels, features, and filters into a Bayesian framework to construct 1-bit CNNs in a comprehensive end-to-end manner. The proposed Bayesian learning algorithms are well-founded and used to optimize the network simultaneously in different kernels, features, and filters, which largely improves the compactness and capacity of 1-bit CNNs. We further introduce a new Bayesian learning-based pruning method for 1-bit CNNs, which significantly increases the model efficiency with very competitive performance. This enables our method to be used in a variety of practical scenarios. Extensive experiments on the ImageNet, CIFAR, and LFW datasets show that BONNs achieve the best classification performance compared to a variety of state-of-the-art 1-bit CNN models. In particular, BONN achieves a strong generalization performance on the object detection task.

Journal Article

Scene text recognition: an Indic perspective

by Doermann, David , Chanda, Sukalpa , Vijayan, Vasanthan P. in Accuracy , Classification , Computer Science

2025

Exploring Scene Text Recognition (STR) in Indian languages is an important research domain due to its wide applications. This paper proposes a spatial attention-based model (LaSA-Net) that combines visual features and language knowledge for word recognition from scene image word segments. We augment the classical cross-entropy loss with a novel language-attunement loss that enables the model to learn valid and prevalent character sequences in the word. This enhances the model’s ability to perform zero-shot word recognition. Further, to compensate for the lack of rotational invariance in the CNN-based feature extraction backbone, we propose a training data augmentation strategy involving the creation of glyphs: images of individual characters in different orientations. This improves LaSA-Net’s ability to recognize words in images with curved/vertically aligned text, alleviating the need for computationally expensive preprocessing modules. Our experiments with Tamil, Malayalam, and Telugu scripts on the IIIT-ILST datasets have achieved new benchmark results and outperformed other state-of-the-art STR models.

Journal Article

Rectified Binary Convolutional Networks with Generative Adversarial Learning

by Liu Jianzhuang , Doermann, David , Zhang Baochang in Artificial neural networks , Electronic devices , Face recognition

2021

Binarized convolutional neural networks (BNNs) are widely used to improve the memory and computational efficiency of deep convolutional neural networks so that they can be employed on embedded devices. However, existing BNNs fail to explore their corresponding full-precision models’ potential, resulting in a significant performance gap. This paper introduces a Rectified Binary Convolutional Network (RBCN) by combining full-precision kernels and feature maps to rectify the binarization process in a generative adversarial network (GAN) framework. We further prune our RBCNs using the GAN framework to increase the model efficiency and promote flexibility in practical applications. Extensive experiments validate the superior performance of the proposed RBCN over state-of-the-art BNNs on tasks such as object classification, object tracking, face recognition, and person re-identification.

Journal Article

Long term 5G network traffic forecasting via modeling non-stationarity with deep learning

by Yang, Yuguang , Zhang, Juan , Doermann, David in 5G mobile communication , 706/648 , 706/703

2023

5G cellular networks have recently fostered a wide range of emerging applications, but their popularity has led to traffic growth that far outpaces network expansion. This mismatch may decrease network quality and cause severe performance problems. To reduce the risk, operators need long-term traffic prediction to plan network expansion schemes months ahead. However, a long-term prediction horizon exposes the non-stationarity of series data, which deteriorates the performance of existing approaches. We deal with this problem by developing a deep learning model, Diviner, that incorporates stationary processes into a well-designed hierarchical structure and models non-stationary time series with multi-scale stable features. We demonstrate substantial performance improvement of Diviner over the current state of the art in 5G network traffic forecasting with detailed months-level forecasting for massive ports with complex flow patterns. Extensive experiments further present its applicability to various predictive scenarios without any modification, showing potential to address broader engineering problems. 5G network operators need data traffic predictions to plan network expansion schemes. Yuguang Yang and colleagues demonstrate performance improvement over state-of-the-art forecasting tools of a deep learning model, Diviner. They demonstrate detailed months-level forecasting for massive ports with complex flow patterns.

Journal Article
