Catalogue Search | MBRL
93 result(s) for "Doermann, David"
Text and non-text separation in offline document images: a survey
by Bhowmik, Showmik; Doermann, David; Sarkar, Ram
in Data processing, Documents, Engineering drawings
2018
Separation of text and non-text is an essential processing step for any document analysis system. Therefore, it is important to have a clear understanding of the state of the art in text/non-text separation in order to facilitate the development of efficient document processing systems. This paper first summarizes the technical challenges of performing text/non-text separation. It then categorizes offline document images into different classes according to the nature of the challenges one faces, in an attempt to provide insight into various techniques presented in the literature. The pros and cons of various techniques are explained wherever possible. Along with evaluation protocols and benchmark databases, this paper also presents a performance comparison of different methods. Finally, this article highlights the future research challenges and directions in this domain.
Journal Article
Machine-assisted authentication of paper currency: an experiment on Indian banknotes
2015
Automatic authentication of paper money is becoming an increasingly urgent problem because of new and improved uses of counterfeits. In this paper, we describe a system developed for discriminating fake notes from genuine ones and apply it to Indian banknotes. Image processing and pattern recognition techniques are used to design the overall approach. The ability of the embedded security aspects to detect fake currencies is thoroughly analysed. Real samples are used in the experiments, which show that a high-precision machine can be developed for authentication of paper money. The system performance is reported for both accuracy and processing speed. The analysis of security features to prevent counterfeiting highlights some of the issues that should be considered in the design of currency notes in the future.
Journal Article
Future of software development with generative AI
by Riekki, Jukka; Doermann, David; Sauvola, Jaakko
in Artificial Intelligence, Computer Science, Generative artificial intelligence
2024
Generative AI is regarded as a major disruption to software development. Platforms, repositories, clouds, and the automation of tools and processes have been proven to improve productivity, cost, and quality. Generative AI, with its rapidly expanding capabilities, is a major step forward in this field. As a new key enabling technology, it can be used for many purposes, from creative dimensions to replacing repetitive and manual tasks. The number of opportunities increases with the capabilities of large language models (LLMs). This has raised concerns about ethics, education, regulation, intellectual property, and even criminal activities. We analyzed the potential of generative AI and LLM technologies for future software development paths. We propose four primary scenarios, model trajectories for transitions between them, and reflect on them against relevant software development operations. The motivation for this research is clear: the software development industry needs new tools to understand the potential, limitations, and risks of generative AI, as well as guidelines for using it.
Journal Article
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning
by Guo, Guodong; Wang, Runqi; Doermann, David
in Artificial neural networks, Feature extraction, Learning
2023
We present a flexible, general framework for few-shot learning where both inter-class differences and intra-class relationships are fully considered to improve recognition performance significantly. We introduce complex-valued convolutional neural networks (CNNs) to describe the subtle differences among inter-class samples and Dependable Learning to capture the intra-class relationship. Conventional approaches use only real-valued CNNs and fail to extract more detailed information. Complex-valued CNNs, on the other hand, can provide amplitude and phase information to enhance the feature representation ability based on the proposed complex metric module (CMM). Building upon the recent episodic training mechanism, CMMs can improve the representation capacity by extracting robust complex-valued features to facilitate the modeling of subtle relationships among few-shot samples. Furthermore, we use Dependable Learning as a new learning paradigm to promote a model that is robust against perturbation, based on a new bilinear optimization that enhances the feature extraction capacity for very few available intra-class samples. Experiments on two benchmark datasets show that the proposed methods significantly improve the performance over other approaches and achieve state-of-the-art results.
Journal Article
Anti-Bandit for Neural Architecture Search
by Wang, Runqi; Doermann, David; Chen, Hanlin
in Artificial intelligence, Computer vision, Gabor filters
2023
Neural Architecture Search (NAS) is a highly challenging task that requires consideration of search space, search efficiency, and adversarial robustness of the network. In this paper, to accelerate training, we reformulate NAS as a multi-armed bandit problem and present the Anti-Bandit NAS (ABanditNAS) method, which exploits Upper Confidence Bounds (UCB) to abandon arms for search efficiency and Lower Confidence Bounds (LCB) for fair competition between arms. Based on the presented ABanditNAS, the adversarially robust optimization and architecture search can be solved in a unified framework. Specifically, our proposed framework defends against adversarial attacks based on a comprehensive search of denoising blocks, weight-free operations, Gabor filters, and convolutions. A theoretical analysis of the rationality of the two confidence bounds in ABanditNAS is provided, and extensive experiments on three benchmarks are conducted. The results demonstrate that the presented ABanditNAS achieves competitive accuracy at a reduced search cost compared to prior methods.
Journal Article
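The abstract above says only that UCB is used to abandon arms and LCB to keep competition between arms fair. The following is a minimal sketch of that idea on a toy bandit, not the ABanditNAS implementation. The function names, the UCB1-style bonus term, and the rule of always pulling the arm with the lowest LCB are all assumptions made for illustration.

```python
import math
import random


def confidence_bounds(mean_reward, pulls, total_pulls, c=1.0):
    """Return (lcb, ucb) for one arm, using a UCB1-style bonus (an assumption)."""
    bonus = c * math.sqrt(2.0 * math.log(total_pulls) / pulls)
    return mean_reward - bonus, mean_reward + bonus


def anti_bandit_search(reward_fn, n_arms, rounds_per_stage=20, c=1.0, seed=0):
    """Shrink a set of arms down to a single survivor.

    In each stage, the arm with the lowest LCB is pulled repeatedly, so arms
    that look weak or are rarely tried still get evaluated. At the end of the
    stage, the arm with the lowest UCB is abandoned.
    """
    rng = random.Random(seed)
    arms = list(range(n_arms))
    pulls = {a: 1 for a in arms}
    means = {a: reward_fn(a, rng) for a in arms}  # one initial pull per arm
    total = n_arms
    while len(arms) > 1:
        for _ in range(rounds_per_stage):
            lcb = {a: confidence_bounds(means[a], pulls[a], total, c)[0] for a in arms}
            arm = min(arms, key=lcb.get)  # hypothetical selection rule
            reward = reward_fn(arm, rng)
            pulls[arm] += 1
            total += 1
            means[arm] += (reward - means[arm]) / pulls[arm]  # running mean
        ucb = {a: confidence_bounds(means[a], pulls[a], total, c)[1] for a in arms}
        arms.remove(min(arms, key=ucb.get))  # abandon the least promising arm
    return arms[0]


if __name__ == "__main__":
    qualities = [0.1, 0.5, 0.9]  # made-up reward levels for three toy arms
    print(anti_bandit_search(lambda arm, rng: qualities[arm], n_arms=3))
```

In an architecture search, each arm would presumably stand for a candidate operation and the reward for a validation score, but that mapping is not spelled out in the abstract.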
Camera-based analysis of text and documents: a survey
2005
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
Journal Article
Towards Compact 1-bit CNNs via Bayesian Learning
by Doermann, David; Gu Jiaxin; Guo Guodong
in Algorithms, Artificial neural networks, Back propagation
2022
Deep convolutional neural networks (DCNNs) have dominated as the best performers on almost all computer vision tasks over the past several years. However, it remains a major challenge to deploy these powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. To this end, 1-bit CNNs have emerged as a feasible solution as they are much more resource-efficient. Unfortunately, they often suffer from a significant performance drop compared to their full-precision counterparts. In this paper, we propose a novel Bayesian Optimized compact 1-bit CNNs (BONNs) model, which has the advantage of Bayesian learning, to improve the performance of 1-bit CNNs significantly. BONNs incorporate the prior distributions of full-precision kernels, features, and filters into a Bayesian framework to construct 1-bit CNNs in a comprehensive end-to-end manner. The proposed Bayesian learning algorithms are well-founded and used to optimize the network simultaneously in different kernels, features, and filters, which largely improves the compactness and capacity of 1-bit CNNs. We further introduce a new Bayesian learning-based pruning method for 1-bit CNNs, which significantly increases the model efficiency with very competitive performance. This enables our method to be used in a variety of practical scenarios. Extensive experiments on the ImageNet, CIFAR, and LFW datasets show that BONNs achieve the best classification performance compared to a variety of state-of-the-art 1-bit CNN models. In particular, BONN achieves a strong generalization performance on the object detection task.
Journal Article
Scene text recognition: an Indic perspective
by Doermann, David; Chanda, Sukalpa; Vijayan, Vasanthan P.
in Accuracy, Classification, Computer Science
2025
Exploring Scene Text Recognition (STR) in Indian languages is an important research domain due to its wide applications. This paper proposes a spatial attention-based model (LaSA-Net) that combines visual features and language knowledge for word recognition from scene image word segments. We augment the classical cross-entropy loss with a novel language-attunement loss that enables the model to learn valid and prevalent character sequences in the word. This enhances the model’s ability to perform zero-shot word recognition. Further, to compensate for the lack of rotational invariance in the CNN-based feature extraction backbone, we propose a training data augmentation strategy involving the creation of glyphs: images of individual characters in different orientations. This improves LaSA-Net’s ability to recognize words in images with curved or vertically aligned text, alleviating the need for computationally expensive preprocessing modules. Our experiments with Tamil, Malayalam, and Telugu scripts on the IIIT-ILST datasets have achieved new benchmark results and outperformed other state-of-the-art STR models.
Journal Article
Rectified Binary Convolutional Networks with Generative Adversarial Learning
by Liu Jianzhuang; Doermann, David; Zhang Baochang
in Artificial neural networks, Electronic devices, Face recognition
2021
Binarized convolutional neural networks (BNNs) are widely used to improve the memory and computational efficiency of deep convolutional neural networks so that they can be employed on embedded devices. However, existing BNNs fail to explore their corresponding full-precision models’ potential, resulting in a significant performance gap. This paper introduces a Rectified Binary Convolutional Network (RBCN) by combining full-precision kernels and feature maps to rectify the binarization process in a generative adversarial network (GAN) framework. We further prune our RBCNs using the GAN framework to increase the model efficiency and promote flexibility in practical applications. Extensive experiments validate the superior performance of the proposed RBCN over state-of-the-art BNNs on tasks such as object classification, object tracking, face recognition, and person re-identification.
Journal Article
Long term 5G network traffic forecasting via modeling non-stationarity with deep learning
2023
5G cellular networks have recently fostered a wide range of emerging applications, but their popularity has led to traffic growth that far outpaces network expansion. This mismatch may decrease network quality and cause severe performance problems. To reduce the risk, operators need long-term traffic prediction to perform network expansion schemes months ahead. However, a long-term prediction horizon exposes the non-stationarity of series data, which deteriorates the performance of existing approaches. We deal with this problem by developing a deep learning model, Diviner, that incorporates stationary processes into a well-designed hierarchical structure and models non-stationary time series with multi-scale stable features. We demonstrate substantial performance improvement of Diviner over the current state of the art in 5G network traffic forecasting with detailed months-level forecasting for massive ports with complex flow patterns. Extensive experiments further demonstrate its applicability to various predictive scenarios without any modification, showing potential to address broader engineering problems.
5G network operators need data traffic predictions to plan network expansion schemes. Yuguang Yang and colleagues demonstrate performance improvement over state-of-the-art forecasting tools of a deep learning model, Diviner. They demonstrate detailed months-level forecasting for massive ports with complex flow patterns.
Journal Article