Catalogue Search | MBRL
Explore the vast range of titles available.
1,158 result(s) for "Tamil language Texts"
Study of automatic text summarization approaches in different languages
2021
Nowadays a huge amount of information is available from both online and offline sources. For a single topic, hundreds of articles may be available, containing a vast amount of information about it. Manually extracting the useful information from them is a difficult task. To solve this problem, automatic text summarization systems have been developed. Text summarization is the process of extracting useful information from large documents and compressing it into a short summary while preserving all the important content. This survey paper gives a broad overview of the work done in the field of automatic text summarization in different languages using various text summarization approaches. The focal centre of this survey is the research done on text summarization for Indian languages such as Hindi, Punjabi, Bengali, Malayalam, Kannada, Tamil, Marathi, Assamese, Konkani, Nepali, Odia, Sanskrit, Sindhi, Telugu and Gujarati, and for foreign languages such as Arabic, Chinese, Greek, Persian, Turkish, Spanish, Czech, Romanian, Urdu, Bahasa Indonesia and many more. This paper provides knowledge and useful support to beginning researchers in this area by giving a concise view of the various feature extraction methods and classification techniques required for the different types of text summarization approaches applied to both Indian and non-Indian languages.
Journal Article
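The abstract above surveys extractive and abstractive summarization methods. As a hedged illustration of the extractive idea only, the sketch below scores sentences by normalized term frequency and keeps the top-ranked ones; the summarize function, its scoring scheme, and the sample text are illustrative assumptions, not a method taken from the survey.

```python
# Minimal extractive summarization sketch: score sentences by the frequency
# of the words they contain and keep the top-k sentences.
# Generic illustration only, not a method from the survey itself.
import re
from collections import Counter

def summarize(text: str, k: int = 3) -> str:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    # Score a sentence as the sum of its word frequencies,
    # normalized by length so long sentences are not always favoured.
    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # preserve original sentence order in the summary
    return " ".join(sentences[i] for i in keep)

if __name__ == "__main__":
    print(summarize("Text summarization compresses long documents. "
                    "Extractive methods select important sentences. "
                    "Abstractive methods generate new sentences. "
                    "Both aim to preserve the key content.", k=2))
```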
Hybrid model for classifying Indo Aryan and Tamil texts from historic manuscripts
2025
Ancient manuscripts, especially Indo-Aryan and Tamil texts, have complex linguistic structures and historical differences. This paper presents a BERT-LiScribe hybrid model for classifying historic Indo-Aryan and Tamil manuscripts. The model combines the contextual embeddings of BERT, which capture the semantic relations in the text, with LiScribe, a specialized sequence model that learns character-level features and linguistic patterns of Indo-Aryan and Tamil scripts. The dataset of 1,055 manuscripts from the Library of Congress and the University of Hamburg combines Indo-Aryan and Tamil texts. The proposed model classifies two large groups, Indo-Aryan (Hindi, Bengali, Marathi, Gujarati) and Tamil, supporting classification at both the family level and the level of specific scripts. Training uses categorical cross-entropy loss and the Adam optimizer with learning-rate scheduling, with dropout layers to avoid overfitting on noisy historical data. The model, implemented in Python with libraries such as TensorFlow and PyTorch, achieved a high overall classification accuracy of 97.61% while distinguishing between Indo-Aryan and Tamil texts. An attention mechanism further increases the model's focus on important features even in degraded manuscripts. This hybrid methodology demonstrates the usefulness of combining deep learning with linguistic feature extraction for the correct classification of historical manuscripts.
Journal Article
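As a hedged sketch of the kind of hybrid architecture this abstract describes (contextual sentence embeddings combined with a character-level sequence encoder), the snippet below concatenates a precomputed BERT-style pooled embedding with a bidirectional GRU over character indices and trains with cross-entropy. The HybridScriptClassifier class, its dimensions, and the GRU stand-in for LiScribe are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of a hybrid classifier in the spirit of the abstract:
# a contextual embedding (BERT-style, assumed precomputed here) is
# concatenated with a character-level GRU encoding, then classified.
# Dimensions, the GRU, and the head are illustrative assumptions.
import torch
import torch.nn as nn

class HybridScriptClassifier(nn.Module):
    def __init__(self, char_vocab: int, ctx_dim: int = 768,
                 char_dim: int = 64, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        self.char_rnn = nn.GRU(char_dim, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.3)  # regularization against noisy historical data
        self.head = nn.Linear(ctx_dim + 2 * hidden, n_classes)

    def forward(self, ctx_embedding, char_ids):
        # ctx_embedding: (B, ctx_dim) pooled contextual embedding of the text
        # char_ids:      (B, T) character indices of the raw text
        _, h = self.char_rnn(self.char_emb(char_ids))   # h: (2, B, hidden)
        char_feat = torch.cat([h[0], h[1]], dim=-1)     # (B, 2*hidden)
        feat = self.dropout(torch.cat([ctx_embedding, char_feat], dim=-1))
        return self.head(feat)                          # logits over script classes

# Toy usage with random tensors standing in for real embeddings and texts.
model = HybridScriptClassifier(char_vocab=200)
logits = model(torch.randn(4, 768), torch.randint(1, 200, (4, 50)))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()
```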
On whose tongue will the goddess write, in whose tongue will the state speak? Mathematics education, Tamil language, and the caste question in India
by Subramanian, Jayasree; Visawanathan, Venkateswaran T
in Ancient languages, Black Power movement, Boards of Education
2023
Mathematics education in India is offered in one of the 22 officially recognized state languages or in English, even though there are at least 270 languages with more than 10,000 speakers each. Caste, a deep-rooted structure that stratifies Indian society, is integrally linked to the shaping of state languages. There is minimal research from India that looks at language and mathematics education, and practically none that factors in caste. Focusing on Tamil Nadu, a state with a history of anti-caste movements on the one hand and a pure Tamil movement (a movement that sought to create a Tamil language with no words from other languages) on the other, this conceptual paper seeks to explore this dimension. More specifically, by using caste as an analytical framework, and by drawing on examples from the mathematics textbooks published by the Tamil Nadu State Board of Education and the experience of a few teachers and learners, the paper makes a theoretical argument that the use of pure Tamil in mathematics textbooks has negative implications for socio-culturally and economically marginalized students who depend on textbooks as their only source for learning mathematics. There is a strong need for empirical work that would highlight the nuances and complexities involved in realizing ‘mother tongue’ education in mathematics, particularly for those who belong to marginalized caste-class backgrounds, and we hope that such work will emerge in the future.
Journal Article
SafeSpeech: a three-module pipeline for hate intensity mitigation of social media texts in Indic languages
2025
Warning: This paper contains some abusive text that might be found offensive. Identifying and mitigating hateful, abusive, and offensive comments on social media is a crucial task. It is challenging to entirely prevent such hateful content or to impose rigorous censorship on social platforms while safeguarding free speech. Recent studies have focused on detecting hate speech, whereas mitigating the intensity of hate remains unexplored or somewhat complex. This paper introduces a cost-effective, straightforward, and novel three-module pipeline, SafeSpeech, for Hate Speech Classification, Hate Intensity Identification, and Hate Intensity Mitigation on social media texts. The initial module classifies text as either containing or not containing hate speech. Following this, the second module quantifies the intensity of hate associated with individual words within the classified hate speech. Lastly, the third module seeks to diminish the overall hatefulness conveyed in the text. A comprehensive experiment has been conducted using publicly available datasets in five Indic languages (Hindi, Marathi, Tamil, Telugu, and Bengali). The system undergoes thorough evaluation to assess its performance and analyze it in depth using various automated metrics. We evaluate Hate Speech Classification using Precision, Recall, and F1, and Hate Intensity Identification using human evaluation. Recognizing the limitations of automated metrics for hate speech mitigation, we augment our experiments with human evaluation for Hate Intensity Mitigation, in which three domain experts independently participated. The BERTScore between the final hate-mitigated texts and the initially classified hate texts consistently ranges between 0.96 and 0.99 across all languages.
Journal Article
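As a hedged sketch of the three-module shape described above (classification, per-word intensity, mitigation), the snippet below wires three stand-in functions into one pipeline. The INTENSITY lexicon, the 0.5 threshold, and the "[softened]" replacement are placeholder assumptions; SafeSpeech's actual modules are learned models, not lexicon lookups.

```python
# Hedged sketch of a three-module pipeline with the same overall shape as
# SafeSpeech: (1) classify hate vs. non-hate, (2) score per-word intensity,
# (3) rewrite high-intensity words. All three stand-ins are placeholders.
from dataclasses import dataclass

# Placeholder lexicon standing in for a learned intensity model.
INTENSITY = {"idiots": 0.9, "terrible": 0.4}

@dataclass
class PipelineResult:
    is_hate: bool
    word_scores: dict
    mitigated: str

def classify(text: str) -> bool:
    # Module 1 stand-in: flag text containing any lexicon word.
    return any(w.strip(".,!?").lower() in INTENSITY for w in text.split())

def score_words(text: str) -> dict:
    # Module 2 stand-in: per-word intensity from the lexicon (0 = neutral).
    return {w: INTENSITY.get(w.strip(".,!?").lower(), 0.0) for w in text.split()}

def mitigate(text: str, scores: dict, threshold: float = 0.5) -> str:
    # Module 3 stand-in: soften words whose intensity exceeds the threshold.
    return " ".join("[softened]" if scores[w] >= threshold else w for w in text.split())

def safe_speech(text: str) -> PipelineResult:
    if not classify(text):
        return PipelineResult(False, {}, text)
    scores = score_words(text)
    return PipelineResult(True, scores, mitigate(text, scores))

print(safe_speech("Those idiots wrote a terrible post."))
```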
Scene text recognition: an Indic perspective
by Vijayan, Vasanthan P; Doermann, David; Krishnan, Narayanan C
in Accuracy, Classification, Data augmentation
2025
Exploring Scene Text Recognition (STR) in Indian languages is an important research domain due to its wide applications. This paper proposes a spatial attention-based model (LaSA-Net) that combines visual features and language knowledge for word recognition from scene-image word segments. We augment the classical cross-entropy loss with a novel language-attunement loss that enables the model to learn valid and prevalent character sequences in a word. This enhances the model’s ability to perform zero-shot word recognition. Further, to compensate for the lack of rotational invariance in the CNN-based feature extraction backbone, we propose a training data augmentation strategy involving the creation of glyphs: images of individual characters at different orientations. This improves LaSA-Net’s ability to recognize words in images with curved or vertically aligned text, alleviating the need for computationally expensive preprocessing modules. Our experiments with Tamil, Malayalam, and Telugu scripts on the IIIT-ILST datasets achieve new benchmark results and outperform other state-of-the-art STR models.
Journal Article
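As a hedged sketch of the glyph-based augmentation the abstract mentions, the snippet below renders individual characters with Pillow and rotates each one through several angles so a recognizer can see characters at varied orientations. The font path, canvas size, and angle set are illustrative assumptions, not LaSA-Net's exact pipeline.

```python
# Hedged sketch of glyph-style data augmentation: render single characters
# and rotate them so training data covers multiple orientations.
# Font path, sizes, and angles are illustrative assumptions.
from PIL import Image, ImageDraw, ImageFont

def make_glyphs(chars, font_path, angles=(-30, -15, 0, 15, 30), size=64):
    font = ImageFont.truetype(font_path, size)  # e.g. a font covering the target script
    glyphs = []
    for ch in chars:
        canvas = Image.new("L", (size * 2, size * 2), color=255)  # white background
        ImageDraw.Draw(canvas).text((size // 2, size // 2), ch, font=font, fill=0)
        for angle in angles:
            # expand=True keeps the rotated glyph fully inside the image
            glyphs.append((ch, angle, canvas.rotate(angle, expand=True, fillcolor=255)))
    return glyphs

# Usage (assumes a Unicode font covering the target script is installed):
# samples = make_glyphs("தமிழ்", "/path/to/NotoSansTamil-Regular.ttf")
# samples[0][2].save("glyph_0.png")
```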