102 results for "scene text detection"
DRA-Net: Dynamic Feature Fusion Upsampling and Text-Region Focus for Ancient Chinese Scene Text Detection
Ancient Chinese scene text detection, as an emerging interdisciplinary topic between computer vision and cultural heritage preservation, presents unique technical challenges. Compared with modern scene text, ancient Chinese text is characterized by complex backgrounds, diverse fonts, extreme aspect ratios, and a scarcity of annotated data. Existing detection methods often perform poorly under these conditions. To address these challenges, this paper proposes a novel detection network based on dynamic feature fusion upsampling and text-region focus, named DRA-Net. The core innovations of the proposed method include (1) a dynamic fusion upsampling module, which adaptively assigns weights to effectively fuse multi-scale features while preserving critical information during feature propagation; (2) an adaptive text-region focus module that incorporates axial attention mechanisms to enhance the model’s ability to locate text regions and suppress background interference; and (3) the integration of deformable convolution, which improves the network’s capacity to model irregular text shapes and extreme aspect ratios. To tackle the issue of data scarcity, we construct a dataset named ACST, specifically for ancient Chinese text detection. This dataset includes a wide range of scene types, such as stone inscriptions, calligraphy works, couplets, and other historical media, covering various font styles from different historical periods, thus offering strong data support for related research. Experimental results demonstrate that DRA-Net achieves significantly higher detection accuracy on the ACST dataset compared to existing methods and performs robustly in scenarios with complex backgrounds and extreme text aspect ratios. It achieves an F1-score of 72.9%, a precision of 82.8%, and a recall of 77.5%. This study provides an effective technical solution for the digitization of ancient documents and the intelligent preservation of cultural heritage, with strong theoretical significance and practical potential.
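The first of DRA-Net's listed innovations, dynamic fusion upsampling, lends itself to a compact sketch: predict per-pixel weights from two adjacent feature scales and blend them, rather than adding the upsampled map blindly. The PyTorch module below is a minimal illustration under assumed channel counts; `DynamicFusionUpsample` and its internals are our naming for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFusionUpsample(nn.Module):
    """Hypothetical sketch: upsample a coarse feature map and fuse it
    with a fine one using per-pixel weights predicted from both inputs."""

    def __init__(self, channels: int):
        super().__init__()
        # Predicts a per-pixel fusion weight in [0, 1] from the
        # concatenated fine and (upsampled) coarse features.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # Bring the coarse map to the fine map's spatial size.
        up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                           align_corners=False)
        w = self.weight_net(torch.cat([fine, up], dim=1))
        # Convex combination: the weight decides, per pixel, how much
        # of each scale survives the fusion.
        return w * fine + (1.0 - w) * up

# Usage: fuse a stride-16 map into a stride-8 map (assumed 256 channels).
fuse = DynamicFusionUpsample(256)
out = fuse(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 64, 64])
```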
Fast Arbitrary Shaped Scene Text Detection via Text Discriminator
Robust scene text detection is one of the most difficult and significant challenges in the computer vision community. Most previous methods detect arbitrary-shaped text using complicated post-processing steps. In this paper, we propose a trainable, fast arbitrary-shaped text detection network that uses a text discriminator and shares visual information between the two complementary tasks. Specifically, we extend PSENet [1] by adding a text discriminator that fuses the multiple predictions for each text instance, rather than relying on complicated, time-consuming post-processing steps. The text discriminator shares visual information with the text detection network and thus achieves much faster detection than PSENet while maintaining accuracy similar to that reported for PSENet. Furthermore, the text discriminator effectively reduces false alarms. Experiments on the ICDAR 2017 MLT, ICDAR 2015, and ICDAR 2019 ArT datasets demonstrate that the proposed approach achieves near real-time detection speed while maintaining state-of-the-art detection accuracy.
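The core idea here, replacing PSENet's progressive scale-expansion post-processing with a learned fusion of the multiple kernel predictions, can be sketched as a small convolutional head over the stacked kernel maps. Everything below (module name, channel sizes, kernel count) is an illustrative assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TextDiscriminator(nn.Module):
    """Hypothetical sketch: fuse PSENet-style multi-kernel predictions
    into one text map with a learned head, instead of iterative
    scale-expansion post-processing."""

    def __init__(self, num_kernels: int = 6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_kernels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # One output channel: per-pixel text/non-text score.
            nn.Conv2d(16, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, kernel_maps: torch.Tensor) -> torch.Tensor:
        # kernel_maps: (B, num_kernels, H, W) sigmoid outputs of the
        # shrunk-kernel branches; the head learns how to combine them.
        return self.fuse(kernel_maps)

disc = TextDiscriminator(num_kernels=6)
text_map = disc(torch.rand(1, 6, 160, 160))  # fused text probability map
```

Because the fusion is a single forward pass rather than an iterative expansion over pixels, it is easy to see where the reported speedup over post-processing-heavy pipelines would come from.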
R-YOLO: A Real-Time Text Detector for Natural Scenes with Arbitrary Rotation
Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when dealing with arbitrarily oriented text. Most contemporary text detection methods are designed to identify horizontal or approximately horizontal text, which cannot satisfy practical detection requirements for various real-world images such as image streams or videos. To address this gap, we propose a novel method called Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model for detecting arbitrarily oriented text in natural scenes. First, a rotated anchor box with angle information is used as the text bounding box over various orientations. Second, features at various scales are extracted from the input image to determine the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression is used to eliminate redundancy and obtain the most accurate detection results. Benchmark experiments are conducted on four popular datasets: ICDAR2015, ICDAR2013, MSRA-TD500, and ICDAR2017-MLT. The results indicate that R-YOLO significantly outperforms state-of-the-art methods in detection efficiency while maintaining high accuracy; for example, it achieves an F-measure of 82.3% at 62.5 fps with 720p resolution on the ICDAR2015 dataset.
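The final step the abstract describes, suppressing redundant rotated boxes, can be approximated with greedy NMS over rotated-box IoU. The sketch below uses plain polygon IoU via `shapely` for brevity; the paper's Rotational Distance-IoU variant adds a center-distance penalty term that is omitted here.

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes given as (cx, cy, w, h, angle_deg)."""
    def to_poly(b):
        cx, cy, w, h, a = b
        rad = np.deg2rad(a)
        c, s = np.cos(rad), np.sin(rad)
        # Corner offsets rotated by the box angle.
        pts = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
        return Polygon([(cx + x * c - y * s, cy + x * s + y * c) for x, y in pts])
    pa, pb = to_poly(box_a), to_poly(box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over rotated boxes; returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = int(order[0])
        keep.append(i)
        order = np.array([j for j in order[1:]
                          if rotated_iou(boxes[i], boxes[j]) < iou_thresh])
    return keep

boxes = [(50, 50, 100, 20, 15.0), (52, 51, 98, 22, 14.0), (200, 80, 60, 20, -30.0)]
scores = np.array([0.9, 0.8, 0.7])
print(rotated_nms(boxes, scores))  # [0, 2]: the near-duplicate box is suppressed
```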
DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection
Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlap of text areas. To adequately distinguish densely packed text instances in scenes, we propose an efficient approach called DenseTextPVT. We first generate high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we design a Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects text of varying sizes, shapes, and fonts, including small-scale text. Then, inspired by the Pixel Aggregation (PA) similarity-vector algorithm, DenseTextPVT clusters text pixels into their correct text kernels in the post-processing step. In this way, our method improves the precision of text detection and effectively reduces overlap between adjacent text regions in dense natural images. Comprehensive experiments demonstrate the effectiveness of our method on the Total-Text, CTW1500, and ICDAR-2015 benchmark datasets in comparison with existing methods.
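The Pixel Aggregation step the abstract refers to can be sketched as follows: label the shrunk kernels, compute each kernel's mean embedding, and let each text pixel join the nearest kernel in embedding space if it is close enough. The function name, array layout, and distance threshold below are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy import ndimage

def pixel_aggregation(text_mask, kernel_mask, embeddings, dist_thresh=3.0):
    """Hypothetical PA-style clustering: grow each kernel by claiming
    text pixels whose embeddings lie near the kernel's mean embedding.
    text_mask, kernel_mask: (H, W) bool; embeddings: (H, W, D)."""
    labels, n = ndimage.label(kernel_mask)           # one label per kernel
    result = labels.copy()
    if n == 0:
        return result
    means = [embeddings[labels == k].mean(axis=0)    # kernel embedding centers
             for k in range(1, n + 1)]
    ys, xs = np.where(text_mask & (labels == 0))     # unassigned text pixels
    for y, x in zip(ys, xs):
        d = [np.linalg.norm(embeddings[y, x] - m) for m in means]
        k = int(np.argmin(d))
        if d[k] < dist_thresh:                       # only claim nearby pixels
            result[y, x] = k + 1
    return result                                    # (H, W) instance labels
```

The distance threshold is what keeps two adjacent, dense instances separate: a pixel between them is claimed by whichever kernel its embedding is trained to sit closest to.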
LICS: Locating Inter-Character Spaces for Multilingual Scene Text Detection
Scene text detection in multilingual environments poses significant challenges. Traditional detection methods often struggle with language-specific features and require extensive annotated training data for each language, making them impractical in multilingual contexts. The diversity of character shapes, sizes, and orientations in natural scenes, along with text deformation and partial occlusion, further complicates detection. This paper introduces LICS (Locating Inter-Character Spaces), a method that detects inter-character gaps as language-agnostic structural cues, making multilingual text detection more practical. A two-stage approach is employed: we first train on synthetic data with precise character-gap annotations and then apply weakly supervised learning to real-world datasets with word-level labels. The weakly supervised framework eliminates the need for character-level annotations in target languages, substantially reducing the annotation burden while maintaining robust performance. Experimental results on the ICDAR and Total-Text benchmarks demonstrate the strong performance of LICS, particularly on Asian scripts. We also introduce CSVT (Character-Labeled Street View Text), a new scene-text dataset comprising approximately 20,000 carefully annotated streetscape images, together with a set of standardized labeling principles that ensure consistent annotation of text locations, content, and language types. CSVT is expected to facilitate more advanced research and development in multilingual scene-text analysis.
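As a rough illustration of how inter-character or inter-word gap predictions can drive language-agnostic detection, the sketch below splits a horizontal text-line box into word boxes at peaks of a 1-D gap-probability profile. This is a simplified, hypothetical post-processing step of our own construction, not the LICS pipeline itself.

```python
import numpy as np

def split_line_by_gaps(line_box, gap_profile, gap_thresh=0.5, min_width=4):
    """Hypothetical sketch: split a horizontal text-line box into word
    boxes wherever the predicted gap score is high.
    gap_profile: 1-D array of gap probabilities along the line's width."""
    x1, y1, x2, y2 = line_box
    is_gap = gap_profile > gap_thresh
    boxes, start = [], None
    for i, g in enumerate(np.append(is_gap, True)):  # sentinel gap at the end
        if not g and start is None:
            start = i                                # a text run begins
        elif g and start is not None:
            if i - start >= min_width:               # drop tiny fragments
                w = (x2 - x1) / len(gap_profile)     # pixels per profile bin
                boxes.append((x1 + start * w, y1, x1 + i * w, y2))
            start = None
    return boxes
```

Because the split depends only on where gaps fall, not on which script fills the runs between them, the same rule applies unchanged across languages.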
A Robot Object Recognition Method Based on Scene Text Reading in Home Environments
To address the challenges of robot object recognition in complex scenes, this paper proposes an object recognition method based on scene text reading. The method simulates human-like behavior, accurately identifying objects that carry text by reading it carefully. First, deep learning models with high accuracy are adopted to detect and recognize text from multiple views. Second, datasets comprising 102,000 Chinese and English scene text images and their inverses are generated; training the model on these two datasets improves the F-measure of text detection by 0.4% and the recognition accuracy by 1.26%. Finally, a robot object recognition method based on scene text reading is proposed. The robot detects and recognizes text in the image and stores the recognition results in a text file. When the user gives the robot a fetching instruction, the robot searches the text files for the corresponding keywords and obtains confidence scores for multiple objects in the scene image; the object with the maximum confidence is selected as the target. The results show that the robot can accurately distinguish objects of arbitrary shape and category and can effectively solve the object recognition problem in home environments.
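The fetching logic described at the end of the abstract, matching a user keyword against stored recognition results and picking the highest-confidence object, reduces to a few lines. The sketch below assumes a simple in-memory record per detected object; the field names and matching rule are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    object_id: int
    text: str          # scene text recognized on the object
    confidence: float  # recognition confidence for that text

def select_target(detections, keyword):
    """Pick the object whose recognized text matches the fetch keyword
    with the highest confidence; None if nothing matches."""
    matches = [d for d in detections if keyword.lower() in d.text.lower()]
    return max(matches, key=lambda d: d.confidence) if matches else None

# Hypothetical recognition results loaded from the robot's text file.
dets = [Detection(0, "Coca-Cola", 0.91), Detection(1, "Cola Zero", 0.88)]
target = select_target(dets, "cola")  # -> object 0 (highest confidence)
```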
Text Detection Using Multi-Stage Region Proposal Network Sensitive to Text Scale
Recently, interest in intelligent sensors that use text detection has surged. However, detecting small text remains challenging. To solve this problem, we propose a novel text detection CNN (convolutional neural network) architecture that is sensitive to text scale. We extract multi-resolution feature maps from multi-stage convolution layers, which prevents information loss and maintains feature size. In addition, we design the proposal-generating stages of the CNN with the receptive field size in mind. The experimental results show the importance of the receptive field size.
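Designing proposal stages around the receptive field, as the abstract describes, starts from the standard recurrence for a stack of convolution or pooling layers: r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where r is the receptive field size, j the cumulative stride, k the kernel size, and s the layer stride. A small calculator makes this concrete; the example layer stack is ours, not the paper's architecture.

```python
def receptive_field(layers):
    """Track receptive field size r and cumulative stride (jump) j
    through a stack of (kernel, stride) conv/pool layers:
        r_out = r_in + (k - 1) * j_in,   j_out = j_in * s
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# e.g. three 3x3 stride-1 convs after one 3x3 stride-2 conv:
print(receptive_field([(3, 2), (3, 1), (3, 1), (3, 1)]))  # 15
```

A 15-pixel receptive field, for instance, is well matched to proposals for small text but far too narrow for large text, which is exactly why proposals must be generated at multiple stages.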
AncientGlyphNet: an advanced deep learning framework for detecting ancient Chinese characters in complex scene
Detecting ancient Chinese characters in various media, including stone inscriptions, calligraphy, and couplets, is challenging due to the complex backgrounds and diverse styles involved. This study proposes an advanced deep-learning framework for detecting ancient Chinese characters in complex scenes with improved accuracy. First, the framework introduces an Ancient Character Haar Wavelet Transform downsampling block (ACHaar), which effectively reduces the spatial resolution of feature maps while preserving key ancient-character features. Second, a Glyph Focus Module (GFM) uses attention mechanisms to enhance the processing of deep semantic information, generating ancient-character feature maps that emphasize horizontal and vertical features through a four-path parallel strategy. Third, a Character Contour Refinement Layer (CCRL) sharpens character edges. Additionally, to train and validate the model, a dedicated dataset was constructed, named the Huzhou University-Ancient Chinese Character Dataset for Complex Scenes (HUSAM-SinoCDCS), comprising images of stone inscriptions, calligraphy, and couplets. Experimental results demonstrate that the proposed method outperforms previous text detection methods on the HUSAM-SinoCDCS dataset, improving accuracy by 1.36–92.84%, recall by 2.24–85.61%, and F1 score by 1.84–89.08%. This research contributes to digitizing ancient Chinese character artifacts and literature, promoting the inheritance and dissemination of traditional Chinese character culture. The source code and the HUSAM-SinoCDCS dataset can be accessed at https://github.com/youngbbi/AncientGlyphNet and https://github.com/youngbbi/HUSAM-SinoCDCS.
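The ACHaar block's stated goal, halving resolution while preserving key features, matches the classic 2D Haar transform, which is lossless because its four sub-bands together retain every input value. The module below is a minimal sketch in that spirit; the paper's actual block presumably adds learned layers on top, so treat the name and structure as assumptions.

```python
import torch
import torch.nn as nn

class HaarDownsample(nn.Module):
    """Hypothetical sketch of Haar-wavelet downsampling: halve resolution
    by splitting each 2x2 block into an average (LL) and three detail
    sub-bands (LH, HL, HH), kept as extra channels so nothing is lost."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 2x2 neighbours via strided slicing; x is (B, C, H, W), H, W even.
        a = x[..., 0::2, 0::2]  # top-left
        b = x[..., 0::2, 1::2]  # top-right
        c = x[..., 1::2, 0::2]  # bottom-left
        d = x[..., 1::2, 1::2]  # bottom-right
        ll = (a + b + c + d) / 2  # low-frequency average
        lh = (a + b - c - d) / 2  # horizontal-edge detail
        hl = (a - b + c - d) / 2  # vertical-edge detail
        hh = (a - b - c + d) / 2  # diagonal detail
        # (B, 4C, H/2, W/2): downsampled but fully invertible.
        return torch.cat([ll, lh, hl, hh], dim=1)

x = torch.randn(1, 64, 32, 32)
print(HaarDownsample()(x).shape)  # torch.Size([1, 256, 16, 16])
```

The edge-oriented LH and HL sub-bands are a natural fit for character strokes, which is plausibly why a Haar-based downsampler preserves glyph detail better than plain strided convolution.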
A Text-Specific Domain Adaptive Network for Scene Text Detection in the Wild
Scene text detection has drawn increasing attention due to its potential scalability to large-scale applications. Currently, a scene text detection model that is well trained on a source domain usually performs unsatisfactorily when migrated to a new target domain, owing to the large domain shift between them. To bridge this gap, this paper proposes a novel network that integrates both a text-specific Faster R-CNN (ts-FRCNN) and text-specific domain adaptation (ts-DA) into one framework. Compared to a conventional FRCNN, ts-FRCNN designs a text-specific RPN that generates more accurate region proposals by considering the inherent characteristics of scene text, as well as text-specific RoI pooling that extracts purer and more sufficient fine-grained text features by adopting an adaptive asymmetric gridding strategy. Compared to conventional domain adaptation, ts-DA adopts a triple-level alignment strategy to reduce the domain shift at the image, word, and character levels, and builds a triple-consistency regularization among them, which significantly promotes domain-invariant text feature learning. We conduct extensive experiments on three representative transfer-learning tasks: common-to-extreme scenes, real-to-real scenes, and synthetic-to-real scenes. The experimental results demonstrate that our model consistently outperforms previous methods.
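The "asymmetric gridding" idea, pooling wide, short text RoIs onto a grid with more bins along the width than the height, can be illustrated with off-the-shelf RoI align. The grid size and tensors below are illustrative assumptions for the concept, not the paper's configuration.

```python
import torch
from torchvision.ops import roi_align

# Text regions are typically wide and short, so pool each RoI onto an
# asymmetric 4 x 16 grid instead of the square 7 x 7 grid a generic
# Faster R-CNN would use; this keeps more resolution along the text line.
features = torch.randn(1, 256, 64, 64)                 # backbone map, stride 8
rois = torch.tensor([[0, 40.0, 100.0, 360.0, 140.0]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_align(features, rois, output_size=(4, 16),
                   spatial_scale=1.0 / 8, sampling_ratio=2)
print(pooled.shape)  # torch.Size([1, 256, 4, 16])
```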
Scene Text Detection in Natural Images: A Review
Scene text detection is attracting increasing attention and has become an important topic in machine-vision research. With the development of the mobile Internet of Things (IoT) and deep learning technology, text detection research has made significant progress. This survey summarizes and analyzes the main challenges and the significant progress in scene text detection research. We first introduce the history and progress of scene text detection and classify traditional and deep learning-based methods in detail, pointing out the corresponding key issues and techniques. Then, we introduce commonly used benchmark datasets and evaluation protocols and identify state-of-the-art algorithms by comparison. Finally, we summarize and predict potential future research directions.