Catalogue Search | MBRL
2,650 result(s) for "OCR"
Arabic Optical Character Recognition: A Review
This study aims to review the latest contributions in Arabic Optical Character Recognition (OCR) during the last decade, which helps interested researchers know the existing techniques and extend or adapt them accordingly. The study describes the characteristics of the Arabic language, different types of OCR systems, the different stages of an Arabic OCR system, researchers' contributions at each stage, and the evaluation metrics for OCR. The study reviews the existing datasets for Arabic OCR and their characteristics. Additionally, this study implemented some preprocessing and segmentation stages of Arabic OCR. The study compares the performance of the existing methods in terms of recognition accuracy. In addition to researchers' OCR methods, commercial and open-source systems are used in the comparison. The Arabic language is morphologically rich and written cursively, with dots and diacritics above and below the characters. Most of the existing approaches in the literature were evaluated on isolated characters or isolated words under a controlled environment, and few approaches were tested on page-level scripts. Some comparative studies show that the accuracy of the existing commercial Arabic OCR systems is low, under 75% for printed text, and further improvement is needed. Moreover, most of the current approaches are offline OCR systems, and there are no remarkable contributions to online OCR systems.
Journal Article
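The preprocessing and segmentation stages the review implements can be illustrated with a classic projection-profile line segmenter. This is a generic sketch of the technique, not the study's own implementation:

```python
def segment_lines(page, min_ink=1):
    """Split a binarized page (1 = ink, 0 = background) into text-line
    bands using a horizontal projection profile."""
    profile = [sum(row) for row in page]      # ink pixels per row
    bands, in_line, start = [], False, 0
    for row, ink in enumerate(profile):
        if ink >= min_ink and not in_line:    # a text line starts here
            in_line, start = True, row
        elif ink < min_ink and in_line:       # the line just ended
            in_line = False
            bands.append((start, row))
    if in_line:                               # line runs to the page edge
        bands.append((start, len(profile)))
    return bands

# Toy 8-row "page": two ink bands separated by blank rows.
page = [[0] * 10, [1] * 10, [1] * 10, [0] * 10,
        [0] * 10, [1] * 10, [1] * 10, [0] * 10]
print(segment_lines(page))  # [(1, 3), (5, 7)]
```

For cursive Arabic, horizontal profiles work for line segmentation, but character segmentation within a connected word needs the more elaborate methods the review surveys.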
DR. VLADIMIR GLOBOČNIK PLEMENITI SORODOLSKI
by Triglav, Joc
2017
OCR - optical character recognition) of the original article, which was published in 1917 in the Austrian journal for geodesy. Already as a student at the university. For a more complete picture, a short overview of the history of the operation of the imperial-royal General Directorate of the Land-Tax Cadastre, summarized from the description of the archival fonds SI AS 1102 (http://arsq.gov.si/query/detail.aspx?ID=25397): By imperial decree of 21. Vermessungs-department), which was divided into two sections: I. for trigonometric regulation and the lithographic institute (Germ. From 1865 the land-tax cadastre belonged to the Section for Administrative Service (Germ. Imperial-royal General Directorate of the Land-Tax Cadastre, 1819-1914 (fonds) http://arsq.gov.si/query/detail.aspx?ID=25397 Dr. Joc Triglav, univ. dipl. inž. geod.
Journal Article
A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR
by Hossain, B. M. Mainul; Rohan, Shadman; Hossain, Md Sazzad
in Computer Science; Datasets; Deep learning
2024
Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters formed by the combination of two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of OCR in Bangla. We have utilized the popular Convolutional Recurrent Neural Network architecture and implemented our grapheme representation strategies to design the final labels of the model. Due to the absence of a large-scale Bangla word-level printed dataset, we created a synthetically generated Bangla corpus containing 2 million samples that are representative and sufficiently varied in terms of fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we have also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we have performed training and testing on external handwriting datasets. Experimental results proved the effectiveness of our novel approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.
Journal Article
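The core idea above is to design the recognizer's output labels around grapheme clusters rather than raw codepoints, so a conjunct is one class instead of several. The sketch below is a simplified stand-in (one rule: consonant + virama + consonant stays together), not the authors' exact representation:

```python
VIRAMA = "\u09cd"  # Bangla hasanta; joins consonants into conjuncts

def to_graphemes(word):
    """Group codepoints so a conjunct (C + virama + C ...) is one unit."""
    units, i = [], 0
    while i < len(word):
        j = i + 1
        while j + 1 < len(word) and word[j] == VIRAMA:
            j += 2                  # absorb virama + following consonant
        units.append(word[i:j])
        i = j
    return units

def encode(word, vocab):
    """Map grapheme units to integer class ids for the model's labels."""
    return [vocab.setdefault(g, len(vocab)) for g in to_graphemes(word)]

print(to_graphemes("\u0995\u09cd\u09a4"))  # conjunct kta -> one unit
print(to_graphemes("\u0995\u09b2\u09ae"))  # kalam -> three units
```

With grapheme-level labels the output vocabulary grows, but each conjunct becomes a single, directly supervised class for the CRNN.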
Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN)
by Nayyar, Anand; Singh, Saurabh; Yoon, Byungun
in Accuracy; Classification; convolutional neural networks
2020
Traditional systems of handwriting recognition have relied on handcrafted features and a large amount of prior knowledge. Training an optical character recognition (OCR) system based on these prerequisites is a challenging task. Research in the handwriting recognition field is focused on deep learning techniques and has achieved breakthrough performance in the last few years. Still, the rapid growth in the amount of handwritten data and the availability of massive processing power demand improvement in recognition accuracy and deserve further investigation. Convolutional neural networks (CNNs) are very effective in perceiving the structure of handwritten characters/words in ways that help in the automatic extraction of distinct features, making the CNN the most suitable approach for solving handwriting recognition problems. Our aim in the proposed work is to explore the various design options, like the number of layers, stride size, receptive field, kernel size, padding, and dilation, for CNN-based handwritten digit recognition. In addition, we aim to evaluate various SGD optimization algorithms for improving the performance of handwritten digit recognition. A network's recognition accuracy increases by incorporating an ensemble architecture. Here, our objective is to achieve comparable accuracy by using a pure CNN architecture without an ensemble, as ensemble architectures introduce increased computational cost and high testing complexity. Thus, a CNN architecture is proposed in order to achieve accuracy even better than that of ensemble architectures, along with reduced operational complexity and cost. Moreover, we also present an appropriate combination of learning parameters in designing a CNN that leads us to a new absolute record in classifying MNIST handwritten digits. We carried out extensive experiments and achieved a recognition accuracy of 99.87% on the MNIST dataset.
Journal Article
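The design options listed above (kernel size, stride, padding, dilation) interact through one arithmetic rule for the spatial output size of a convolutional layer. A small helper makes the trade-offs concrete; the layer sizes below are illustrative, not the paper's architecture:

```python
def conv_out(n, k, s=1, p=0, d=1):
    """Spatial output size of a conv/pool layer: input size n, kernel k,
    stride s, padding p, dilation d (the design knobs being explored)."""
    eff_k = d * (k - 1) + 1            # dilated (effective) kernel size
    return (n + 2 * p - eff_k) // s + 1

# A 28x28 MNIST digit through two 3x3 convs (stride 1, no padding),
# then a 2x2 max-pool with stride 2:
n = conv_out(28, 3)        # 26
n = conv_out(n, 3)         # 24
n = conv_out(n, 2, s=2)    # 12 after pooling
print(n)                   # 12
```

Padding `p=1` with a 3x3 kernel preserves the input size (`conv_out(28, 3, p=1) == 28`), which is why "same" padding is the usual choice when stacking many layers.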
Modular Pipeline for Text Recognition in Early Printed Books Using Kraken and ByT5
2025
Early printed books, particularly incunabula, are invaluable archives of the beginnings of modern educational systems. However, their complex layouts, antique typefaces, and page degradation caused by bleed-through and ink fading pose significant challenges for automatic transcription. In this work, we present a modular pipeline that addresses these problems by combining modern layout analysis and language modeling techniques. The pipeline begins with historical layout-aware text segmentation using Kraken, a neural network-based tool tailored for early typographic structures. Initial optical character recognition (OCR) is then performed with Kraken’s recognition engine, followed by post-correction using a fine-tuned ByT5 transformer model trained on manually aligned line-level data. By learning to map noisy OCR outputs to verified transcriptions, the model substantially improves recognition quality. The pipeline also integrates a preprocessing stage based on our previous work on bleed-through removal using robust statistical filters, including non-local means, Gaussian mixtures, biweight estimation, and Gaussian blur. This step enhances the legibility of degraded pages prior to OCR. The entire solution is open, modular, and scalable, supporting long-term preservation and improved accessibility of cultural heritage materials. Experimental results on 15th-century incunabula show a reduction in the Character Error Rate (CER) from around 38% to around 15% and an increase in the Bilingual Evaluation Understudy (BLEU) score from 22 to 44, confirming the effectiveness of our approach. This work demonstrates the potential of integrating transformer-based correction with layout-aware segmentation to enhance OCR accuracy in digital humanities applications.
Journal Article
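The headline result above is the drop in Character Error Rate from about 38% to about 15%. CER is the Levenshtein edit distance between hypothesis and reference, normalized by the reference length; a minimal implementation of the metric (not the pipeline) looks like this:

```python
def cer(reference, hypothesis):
    """Character Error Rate: edit distance / reference length,
    computed with the standard one-row dynamic program."""
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, 1):
        cur = [i]
        for j, hc in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (rc != hc)))  # substitution
        prev = cur
    return prev[-1] / max(len(reference), 1)

print(round(cer("incunabula", "incvnabula"), 2))  # one substitution -> 0.1
```

Because CER is length-normalized, it can exceed 1.0 on very noisy lines, which is common in raw OCR of degraded incunabula before post-correction.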
Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
by Najam, Rayyan; Faizullah, Safiullah
in Analysis; Arabic handwritten text recognition; Arabic OCR
2023
Arabic handwritten-text recognition applies an OCR technique and then a text-correction technique to extract the text within an image correctly. Deep learning is the current paradigm used in OCR techniques. However, no study has investigated or critically analyzed the recent deep-learning techniques used for Arabic handwritten OCR and text correction during the period 2020–2023. This analysis fills that noticeable gap in the literature, uncovering recent developments and their limitations for researchers, practitioners, and interested readers. The results reveal that, among CNN-LSTM-CTC, Transformer, and GAN architectures, CNN-LSTM-CTC is the most suitable for OCR because it is less complex and can capture long textual dependencies. For OCR text correction, applying deep-learning models to generated errors in datasets improved accuracy in many works. In conclusion, Arabic OCR has the potential to further apply several text-embedding models to correct the text resulting from OCR, and there is a significant gap in studies investigating this problem. In addition, there is a need for more high-quality and domain-specific Arabic handwritten OCR datasets. Moreover, we map out future trends in Arabic OCR applications, derived from current limitations in Arabic OCR works and from applications in other languages; these involve a plethora of possibilities that had not been effectively researched at the time of writing.
Journal Article
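The CNN-LSTM-CTC architecture singled out above emits one label per time frame; turning those frames into text requires the CTC decoding step of collapsing repeats and dropping blanks. A minimal greedy decoder, shown as a generic sketch rather than any surveyed system:

```python
BLANK = 0  # conventional CTC blank index

def ctc_greedy_decode(frame_ids):
    """Collapse repeated symbols, then drop CTC blanks -- the decoding
    step after a CNN-LSTM-CTC recognizer labels each time frame."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != BLANK:
            out.append(t)
        prev = t
    return out

# Frames: blank blank 5 5 blank 5 5  ->  the symbol 5, twice
print(ctc_greedy_decode([0, 0, 5, 5, 0, 5, 5]))  # [5, 5]
```

The blank is what lets CTC represent genuinely doubled characters (frequent in Arabic with shadda-marked gemination): two runs of the same label separated by a blank decode to two symbols, while one uninterrupted run decodes to one.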
Enhancing automated vehicle identification by integrating YOLO v8 and OCR techniques for high-precision license plate detection and recognition
by Akkad, Nabil El; Baihan, Abdullah; Rathore, Rajkumar Singh
in 639/705/117; 639/705/258; 639/705/794
2024
Vehicle identification systems are vital components that enable many aspects of contemporary life, such as safety, trade, transit, and law enforcement. They improve community and individual well-being by increasing vehicle management, security, and transparency. These tasks entail locating and extracting license plates from images or video frames using computer vision and machine learning techniques, followed by recognizing the letters or digits on the plates. This paper proposes a new license plate detection and recognition method based on the deep learning YOLO v8 method, image processing techniques, and OCR for text recognition. The first step was dataset creation: 270 images were gathered from the internet. Afterward, CVAT (Computer Vision Annotation Tool), an open-source software platform made to ease the annotation and labeling of images and videos for computer vision tasks, was used to annotate the dataset. Subsequently, the newly released YOLO version, YOLO v8, was employed to detect the number-plate area in the input image. After extracting the plate, the k-means clustering algorithm, thresholding techniques, and the opening morphological operation were used to enhance the image and make the characters on the license plate clearer before applying OCR. The next step is using OCR to extract the characters. Finally, a text file containing only the characters that identify the vehicle's country is generated. To evaluate the efficiency of the proposed approach, several metrics were employed, namely precision, recall, F1-score, and CLA. In addition, a comparison of the proposed method with existing techniques in the literature is given. The suggested method obtained convincing results in both detection and recognition, achieving an accuracy of 99% in detection and 98% in character recognition.
Journal Article
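The k-means step in the pipeline above clusters the cropped plate's pixel intensities so characters separate from the background before thresholding. A two-cluster, pure-Python sketch of that idea (a simplified stand-in for the paper's image-processing stage, which operates on real images):

```python
def kmeans_binarize(pixels, iters=10):
    """Two-cluster k-means on grayscale intensities, then a threshold
    halfway between the cluster centers: 1 = character, 0 = background."""
    c0, c1 = float(min(pixels)), float(max(pixels))   # init at extremes
    for _ in range(iters):
        a = [p for p in pixels if abs(p - c0) <= abs(p - c1)]
        b = [p for p in pixels if abs(p - c0) > abs(p - c1)]
        c0 = sum(a) / len(a) if a else c0             # re-center dark cluster
        c1 = sum(b) / len(b) if b else c1             # re-center bright cluster
    thresh = (c0 + c1) / 2
    return [1 if p > thresh else 0 for p in pixels]

# Dark plate background (~30) vs bright characters (~200):
print(kmeans_binarize([28, 30, 33, 198, 201, 205, 31, 199]))
# [0, 0, 0, 1, 1, 1, 0, 1]
```

On the binarized plate, morphological opening (erosion then dilation) removes the small speckles that would otherwise confuse the OCR stage.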
OCRBench: on the hidden mystery of OCR in large multimodal models
2024
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression recognition. Most importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. The evaluation pipeline and benchmark are available at https://github.com/Yuliang-Liu/MultimodalOCR.
Journal Article
RETRACTED: Design and Implementation of Acquire Carriage for Disabled people in a Visual Surveillance Using Character Recognition
2022
In certain cases, persons with disabilities may be forced to rely on others for the performance of their duties. Blindness is one of the impairments that might be encountered. Up to this point, a number of solutions have been presented that make life easier for visually impaired individuals. One of the problems they face on a daily basis is making an independent purchase of a product they need. To solve this issue, the approach is to use a camera to capture a picture, process it with the Tesseract method to extract text from the image, and then convert that text into an audio file that can be heard through headphones. Following the implementation of this strategy, the shopping-trolley technology will use machine learning to detect the items placed in the trolley and precise location finding to locate a person.
Journal Article
Retrokonversionsprojekt von 35.000 Zettelkarten der Musikbibliothek der Leipziger Städtischen Bibliotheken
by Wallwitz, Sebastian; Aring, Annalena; Müller, Jane
in external data transfer; Fremddatenimport; Historic library stock
2022
As part of the inventory of the art and cultural assets of the City of Leipzig, a total of 35,000 catalogue cards of the Leipzig Municipal Libraries were digitized in 2021 and imported into the library management system in the MARC-XML metadata format. The retroconversion project started with an extensive test phase followed by efficient project work in order to reach the goal of recording the library's music-specific historical holdings online. In the project, the departments Music Library/Special Collections, Cataloguing, and Digital & IT worked together with service providers from Leipzig and Vietnam.
Journal Article
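The MARC-XML records imported in the project above can be read with nothing but the Python standard library. The field layout (245$a for the title) follows the MARC 21 convention, but the sample record here is invented for illustration:

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"  # the MARC-XML namespace

# A minimal, hypothetical record of the kind a retroconversion produces.
record_xml = f"""
<record xmlns="{NS}">
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Beispieltitel einer Zettelkarte</subfield>
  </datafield>
</record>"""

def title_of(xml_text):
    """Extract the title proper: MARC field 245, subfield a."""
    root = ET.fromstring(xml_text)
    sub = root.find(
        f".//{{{NS}}}datafield[@tag='245']/{{{NS}}}subfield[@code='a']")
    return sub.text if sub is not None else None

print(title_of(record_xml))  # Beispieltitel einer Zettelkarte
```

In a real import, a batch of such records would be validated against the MARC 21 field definitions before being loaded into the library management system.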