301 results for "document image dataset"
U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts
Document layout analysis, the task of identifying the different semantic regions of a document page, is of great interest to both computer scientists and humanities scholars: for the former it is a fundamental step towards further analysis tasks, and for the latter a powerful tool that facilitates the study of the documents themselves. However, many works in the literature, and in particular the available datasets, fail to meet the needs of both communities and tend to lean towards the practices of the computer science side, producing resources that are not representative of the humanities' real needs. This paper therefore introduces U-DIADS-Bib, a novel, pixel-precise, non-overlapping, and noiseless document layout analysis dataset developed in close collaboration between specialists in computer vision and the humanities. Furthermore, we propose a novel computer-aided segmentation pipeline that alleviates the burden of the time-consuming manual annotation needed to generate the ground-truth segmentation maps. Finally, we present a standardized few-shot version of the dataset (U-DIADS-BibFS) to encourage the development of models that address this task with as few samples as possible, enabling more effective use in real-world scenarios where collecting a large number of segmentations is not always feasible.
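As a rough illustration of working with pixel-precise, non-overlapping ground truth like the maps described above, the sketch below converts a color-coded segmentation image into per-pixel class indices. The palette and class names are hypothetical placeholders, not the actual U-DIADS-Bib label colors.

```python
# Minimal sketch: turn a color-coded ground-truth segmentation map into
# per-pixel class ids. The palette below is an assumption for illustration;
# the real U-DIADS-Bib class colors are defined by the dataset authors.
import numpy as np
from PIL import Image

PALETTE = {
    (0, 0, 0): 0,      # background (hypothetical)
    (255, 0, 0): 1,    # main text (hypothetical)
    (0, 255, 0): 2,    # decoration (hypothetical)
    (0, 0, 255): 3,    # paratext (hypothetical)
}

def mask_to_labels(path: str) -> np.ndarray:
    """Convert an RGB ground-truth image into an (H, W) array of class ids."""
    rgb = np.array(Image.open(path).convert("RGB"))
    labels = np.zeros(rgb.shape[:2], dtype=np.int64)
    for color, cls in PALETTE.items():
        labels[np.all(rgb == color, axis=-1)] = cls
    return labels
```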
HTR for Greek Historical Handwritten Documents
Offline handwritten text recognition (HTR) for historical documents aims at effective transcription despite the low quality of the manuscripts under study and the particularities of their historical period of writing. This paper focuses on the transcription of Greek historical manuscripts, which present several such particularities. To this end, a convolutional recurrent neural network architecture is proposed that combines octave convolutions with recurrent units based on effective gated mechanisms. The proposed architecture is evaluated on three newly created collections of Greek historical handwritten documents, which will be made publicly available for research purposes, as well as on standard datasets such as IAM and RIMES. A concise evaluation study shows that, compared to state-of-the-art architectures, the proposed one deals effectively with the challenging Greek historical manuscripts.
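For readers unfamiliar with the convolutional-recurrent pattern the abstract describes, here is a minimal PyTorch sketch of that family: a convolutional feature extractor feeding a bidirectional recurrent layer trained with CTC. Plain convolutions stand in for the paper's octave convolutions, and all layer sizes and the alphabet size are illustrative assumptions.

```python
# Minimal CRNN sketch: conv features -> BiGRU -> per-timestep CTC scores.
# Plain Conv2d layers stand in for octave convolutions; sizes are illustrative.
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    def __init__(self, n_classes: int = 80):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.GRU(64 * 8, 128, bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, n_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                     # x: (B, 1, 32, W) text-line images
        f = self.conv(x)                      # (B, 64, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)  # (B, W/4, 64*8) time-major features
        out, _ = self.rnn(f)
        return self.head(out).log_softmax(-1)

# Training step (targets and length tensors are placeholders):
# loss = nn.CTCLoss()(logits.permute(1, 0, 2), targets, in_lens, tgt_lens)
```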
Integrated Analysis Platform: An Open-Source Information System for High-Throughput Plant Phenotyping
High-throughput phenotyping is emerging as an important technology for dissecting phenotypic components in plants. Efficient image processing and feature extraction are prerequisites for quantifying plant growth and performance from phenotypic traits; the main issues are data management, image analysis, and result visualization for large-scale phenotypic datasets. Here, we present the Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Because of the huge amount of data to manage, we use a common data structure for efficient storage and organization of both input and result data. We implemented a block-based method for automated image processing that extracts a representative list of plant phenotypic traits, and we provide tools for built-in data plotting and result export. To validate IAP, we performed an example experiment with 33 maize (Zea mays 'Fernandez') plants grown for 9 weeks in an automated greenhouse with nondestructive imaging; the image data were then processed automatically with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate with our manually measured data with high accuracy, up to 0.98 and 0.95, respectively. In summary, IAP provides a comprehensive set of functionalities for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable.
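To give a flavor of the kind of block-based trait extraction such a pipeline performs, the sketch below segments a plant from one view with a crude color mask, derives a projected-area trait, and checks it against manual measurements. The HSV thresholds and the correlation step are illustrative assumptions, not IAP's actual implementation.

```python
# Illustrative trait-extraction block: segment the plant, count foreground
# pixels, and correlate the derived trait with manual measurements.
# Thresholds and file handling are assumptions, not IAP's real pipeline.
import numpy as np
import cv2

def projected_area(image_path: str) -> int:
    """Count plant pixels in one view using a simple green-hue mask."""
    bgr = cv2.imread(image_path)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))  # rough green range
    return int(np.count_nonzero(mask))

# Pixel counts from side/top views combine into a "digital volume" trait;
# validating it against manual data amounts to a correlation check:
# r = np.corrcoef(digital_volumes, manual_volumes)[0, 1]
```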
PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification
Without publicly available datasets, particularly in handwritten document recognition (HDR), fair and reliable comparisons between methods are impossible. Within HDR, document recognition for Indic scripts is still at an early stage compared to scripts such as Roman and Arabic. In this paper, we present PHDIndic_11, a page-level handwritten document image dataset of 11 official Indic scripts: Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi, Gujarati, Tamil, Telugu, Malayalam, and Kannada. PHDIndic_11 comprises 1458 document text-pages written by 463 individuals from various parts of India. We further report benchmark results for handwritten script identification (HSI). Besides script identification, the dataset can be used in many other document image analysis applications, such as script sentence recognition/understanding, text-line segmentation, word segmentation/recognition, word spotting, separation of handwritten and machine-printed text, and writer identification.
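Page-level script identification reduces to an 11-way image classification problem. As a hypothetical baseline, not the paper's benchmark method, one could fine-tune a standard classifier on the page images:

```python
# Hypothetical script-identification baseline on the 11 PHDIndic_11 classes:
# fine-tune a pretrained classifier. Model choice is illustrative only.
import torch.nn as nn
from torchvision import models

NUM_SCRIPTS = 11  # Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi,
                  # Gujarati, Tamil, Telugu, Malayalam, Kannada

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_SCRIPTS)
# Train with cross-entropy on page images labeled by script.
```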
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping
Numerous business workflows involve printed forms, such as invoices or receipts, which are often manually digitized so the data can be searched or stored persistently. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization; here, processing algorithms must deal with prevailing environmental factors such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models from pairs of raw images and rectification meshes. Published results show promising predictive accuracy for dewarping, but the remaining errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models with additional structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals, and (2) a novel image dewarping algorithm that extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation, which includes an implementation of DewarpNet, shows that exploiting structured templates can improve the performance of image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including an improved local distortion of 26.1%. Our new dataset and all code are publicly available at https://felixhertlein.github.io/inv3d .
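The abstract's central data structure, a rectification mesh, drives a standard backward-mapping step common to mesh-based dewarping methods in the DewarpNet/GeoTr family. The sketch below shows that step under assumed tensor shapes; it is not the Inv3D authors' code.

```python
# Backward mapping with a predicted rectification mesh: each output pixel
# is sampled from the warped input at the mesh's coordinates. Shapes are
# illustrative, not the Inv3D implementation.
import torch
import torch.nn.functional as F

def unwarp(image: torch.Tensor, mesh: torch.Tensor) -> torch.Tensor:
    """
    image: (B, 3, H, W) warped document photo
    mesh:  (B, H, W, 2) sampling coordinates in [-1, 1], i.e. where each
           output pixel should be fetched from in the input image
    """
    return F.grid_sample(image, mesh, mode="bilinear", align_corners=True)
```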
Dataset Generation for Gujarati Language Using Handwritten Character Images
In pattern recognition, handwritten character recognition (HCR) is considered a classic challenge, and benchmark datasets for HCR in the Gujarati language are particularly limited. Since a proper dataset is required for experimentation, this work introduces dataset generation for the Gujarati language using pre-processing and classification techniques. The handwritten data is first collected from native Gujarati writers from various regions. Three processes are then carried out to generate the dataset. First, pre-processing stages are performed: image selection, noise removal, normalization, conversion of integer values to double, conversion of the grayscale image into a binary image, dimensionality reduction, and vector conversion. Next, the pre-processed images are segmented into lines, words, and characters. Finally, the data is classified using a convolutional neural network (CNN). The CNN achieves kappa and FPR (false positive rate) values of 0.981 and 0.189, respectively.
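A minimal sketch of the pre-processing stages listed above, using OpenCV; the kernel size, Otsu binarization, and output dimensions are illustrative assumptions rather than the paper's exact parameters.

```python
# Sketch of the listed pre-processing stages: grayscale input, noise
# removal, binarization, size normalization, and integer-to-double
# conversion. Parameter values are assumptions for illustration.
import cv2
import numpy as np

def preprocess(path: str, size: int = 32) -> np.ndarray:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    denoised = cv2.medianBlur(gray, 3)           # remove salt-and-pepper noise
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    resized = cv2.resize(binary, (size, size))   # normalize dimensions
    return resized.astype(np.float64) / 255.0    # integer -> double in [0, 1]
```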
Recommendation system based on deep learning methods: a systematic review and new directions
Many recommender systems (RS) are now used to tackle the information overload problem in areas such as e-commerce, entertainment, and social media. Although classical RS methods have achieved remarkable success in providing item recommendations, they still suffer from issues such as cold start and data sparsity. With the recent achievements of deep learning in applications such as natural language processing (NLP) and image processing, researchers have made increasing efforts to exploit deep learning methods to improve RS performance. However, despite the many research works on deep learning based RS, very few secondary studies have been conducted in the field. This study therefore provides a systematic literature review (SLR) of deep learning based RSs to guide researchers and practitioners in understanding the new trends and challenges in the field. It is the first SLR focused specifically on deep learning based RS to summarize and analyze the existing studies drawn from the best-quality research publications. The review follows the standard SLR guidelines of Kitchenham, applying a defined selection method and a detailed analysis of the research publications; after applying inclusion/exclusion criteria and a quality assessment to the gathered publications, the selected papers were used for the review. The results indicate that autoencoder (AE) models are the most widely exploited deep learning architectures for RS, followed by convolutional neural network (CNN) and recurrent neural network (RNN) models. The results also show that MovieLens is the most popular dataset for evaluating deep learning based RS, followed by the Amazon review datasets. Based on the results, movies and e-commerce are the most common RS domains, and precision and root mean squared error (RMSE) are the most commonly used metrics for evaluating the performance of deep learning based RSs.
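Since the review singles out autoencoders as the dominant architecture, here is a minimal sketch of that family: reconstructing a user's sparse rating vector and scoring with RMSE over observed entries. The catalogue size and layer widths are illustrative assumptions.

```python
# Minimal autoencoder-based recommender sketch: reconstruct a user's
# (sparse) rating vector; evaluate with RMSE on observed ratings only.
# Sizes are illustrative, not drawn from any reviewed system.
import torch
import torch.nn as nn

N_ITEMS = 1000  # hypothetical catalogue size

class RatingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_ITEMS, 128), nn.ReLU())
        self.decoder = nn.Linear(128, N_ITEMS)

    def forward(self, ratings):  # ratings: (B, N_ITEMS), zeros = unrated
        return self.decoder(self.encoder(ratings))

def rmse(pred, target, mask):
    """RMSE over observed ratings only (mask marks rated entries)."""
    return torch.sqrt(((pred - target)[mask] ** 2).mean())
```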
Deformity removal from handwritten text documents using variable cycle GAN
Text recognition systems typically work well on printed documents but struggle with handwritten ones because of varying writing styles, complex backgrounds, noise added by image acquisition methods, and deformed text such as strike-offs and underlines. These deformities change the structural information, making it difficult to restore the deformed images while maintaining that structure and preserving the semantic dependencies of local pixels. Current adversarial networks cannot preserve these structural and semantic dependencies because they focus on individual pixel-to-pixel variation and encourage non-meaningful aspects of the images. To address this, we propose a Variable Cycle Generative Adversarial Network (VCGAN) that considers the perceptual quality of the images. By using a variable content loss, the Top-k Variable Loss (TVk), VCGAN preserves the inter-dependence of spatially close pixels while removing strike-off strokes. Image similarity is computed with TVk, accounting for intensity variations that do not interfere with the semantic structure of the image. Our results show that VCGAN removes most deformities, reaching an F1 score of 97.40%, and outperforms current state-of-the-art algorithms with a character error rate of 7.64% and a word accuracy of 81.53% when tested on the handwritten text recognition system.
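To make the top-k idea concrete, the sketch below implements a generic top-k content loss: only the k largest per-pixel deviations contribute, so the generator concentrates on the strongest differences. This is an illustrative stand-in, not the paper's exact TVk formulation.

```python
# Generic top-k content loss in the spirit of TVk: penalize only the k
# largest per-pixel differences between generated and target images.
# An illustrative stand-in, not the VCGAN authors' formulation.
import torch

def top_k_content_loss(generated: torch.Tensor,
                       target: torch.Tensor, k: int = 1000) -> torch.Tensor:
    diff = (generated - target).abs().flatten(1)  # (B, C*H*W) deviations
    topk = torch.topk(diff, k, dim=1).values      # k largest per sample
    return topk.mean()
```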
Analyzing the potential of active learning for document image classification
Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data to achieve competitive performance, and since data annotation is a costly, labor-intensive process, it remains one of the major hurdles to practical deployment. This study investigates whether active learning (AL) can reduce data annotation costs in document image classification, one of the core components of modern document processing pipelines. The results demonstrate that with AL, deep document classification models can match the performance of models trained on fully annotated datasets, and in some cases even surpass them, by annotating only 15–40% of the total training dataset. Furthermore, modern AL strategies significantly outperform random querying, and in many cases achieve comparable performance to fully supervised models even in the presence of practical deployment issues such as data imbalance and annotation noise, offering substantial benefits for real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at https://github.com/saifullah3396/doc_al .
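The core AL loop contrasted with random querying follows a standard pattern: train on the current labels, score the unlabeled pool by uncertainty, and send the most uncertain samples to annotators. A schematic sketch with entropy-based sampling, one of the standard query strategies (the train/predict helpers are placeholders for a real pipeline):

```python
# Schematic active-learning round with entropy-based uncertainty sampling.
# train_fn and predict_fn are placeholders for an actual training pipeline.
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def active_learning_round(train_fn, predict_fn,
                          labeled_idx, unlabeled_idx, budget=100):
    model = train_fn(labeled_idx)             # fit on currently labeled data
    probs = predict_fn(model, unlabeled_idx)  # (N, n_classes) softmax scores
    ranked = np.argsort(-entropy(probs))      # most uncertain samples first
    queried = [unlabeled_idx[i] for i in ranked[:budget]]
    return queried  # send these to annotators, then repeat the round
```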
DCT-CompSegNet: fast layout segmentation in DCT compressed JPEG document images using deep feature learning
Layout segmentation remains very challenging in document images such as newspapers, magazines, and research articles, where text and non-text components are arranged artistically to attract various types of readers. Traditionally, layout segmentation has been carried out in the pixel domain, under the assumption that images are always available in uncompressed pixel form. In reality, however, images are acquired and rendered in compressed form, so traditional techniques require an additional decompression stage to recover the pixel-domain images for further processing. This paper therefore proposes direct layout segmentation in compressed document images, bypassing the decompression stage while providing good performance with reduced computation time. We propose a novel deep learning architecture, DCT-CompSegNet, that learns features directly from the DCT compressed streams of JPEG documents and accomplishes layout segmentation entirely in the JPEG compressed domain. Unlike existing layout segmentation methods that work in the pixel domain, the novelty here is that a compressed stream of DCT coefficients extracted from the JPEG documents is used to train the deep learning network. The feature learning is efficient enough to accomplish layout segmentation in both printed and handwritten document images with state-of-the-art performance. Experiments were carried out on two benchmark datasets, Publay and Prima, consisting of complex machine-printed document images, and the robustness of the model is also demonstrated on a self-created handwritten dataset.
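The compressed-domain idea rests on JPEG's 8x8 block DCT. As a rough illustration, the sketch below builds the per-block coefficient grid such a network would consume; for simplicity it recomputes the DCT from pixels with SciPy, whereas the paper reads the coefficients directly from the JPEG stream and so skips full decompression.

```python
# Sketch of building a training input from 8x8 DCT coefficient blocks, the
# core idea behind learning in the JPEG compressed domain. SciPy recomputes
# the DCT here for illustration; DCT-CompSegNet reads coefficients straight
# from the compressed stream instead.
import numpy as np
from scipy.fft import dctn

def dct_blocks(gray: np.ndarray) -> np.ndarray:
    """Split an (H, W) grayscale image into 8x8 blocks of DCT coefficients."""
    h, w = (gray.shape[0] // 8) * 8, (gray.shape[1] // 8) * 8
    blocks = gray[:h, :w].reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
    coeffs = dctn(blocks.astype(np.float32), axes=(2, 3), norm="ortho")
    return coeffs  # (H/8, W/8, 8, 8): per-block coefficient grid for a CNN
```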