Catalogue Search | MBRL
20 result(s) for "Kannada language Texts"
Study of automatic text summarization approaches in different languages
2021
A huge amount of information is now available from both online and offline sources. A single topic may be covered by hundreds of articles containing a vast amount of information, and extracting the useful parts manually is difficult. Automatic text summarization systems have been developed to solve this problem. Text summarization is the process of extracting useful information from large documents and compressing it into a short summary that preserves all important content. This survey paper gives a broad overview of the work done in automatic text summarization in different languages using various summarization approaches. Its focus is research on text summarization for Indian languages such as Hindi, Punjabi, Bengali, Malayalam, Kannada, Tamil, Marathi, Assamese, Konkani, Nepali, Odia, Sanskrit, Sindhi, Telugu and Gujarati, and for foreign languages such as Arabic, Chinese, Greek, Persian, Turkish, Spanish, Czech, Rome, Urdu, Bahasa Indonesia and many more. The paper offers knowledge and support to researchers new to the area by giving a concise view of the feature extraction methods and classification techniques required for the different types of text summarization approaches applied to both Indian and non-Indian languages.
Journal Article
Use of prompt-based learning for code-mixed and code-switched text classification
by Udawatta, Pasindu; Gamage, Chathulanka; Shekhar, Ravi
in Classification, Digital media, English language
2024
Code-mixing and code-switching (CMCS) are prevalent phenomena observed in social media conversations and various other modes of communication. When developing applications such as sentiment analysers and hate-speech detectors that operate on this social media data, CMCS text poses challenges. Recent studies have demonstrated that prompt-based learning of pre-trained language models outperforms full fine-tuning across various tasks. Despite the growing interest in classifying CMCS text, the effectiveness of prompt-based learning for the task remains unexplored. This paper presents an extensive exploration of prompt-based learning for CMCS text classification and the first comprehensive analysis of the impact of the script on classifying CMCS text. Our study reveals that the performance in classifying CMCS text is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing. In response, we introduce a novel method, Dynamic+AdapterPrompt, which employs distinct models for each script, integrated with adapters. While DynamicPrompt captures the script-specific representation of the text, AdapterPrompt emphasizes capturing the task-oriented functionality. Our experiments on Sinhala-English, Kannada-English, and Hindi-English datasets for sentiment classification, hate-speech detection, and humour detection tasks show that our method outperforms strong fine-tuning baselines and basic prompting strategies.
Journal Article
Sentiment analysis of code-mixed Dravidian languages leveraging pretrained model and word-level language tag
2025
The exponential growth of social media data in the era of Web 2.0 has necessitated advanced techniques for sentiment analysis. While sentiment analysis in monolingual datasets has received significant attention, sentiment analysis in code-mixed datasets still needs more study. Code-mixed data often contain a mixture of monolingual content (possibly in transliterated form), single-script but multilingual content, and multi-script multilingual content. This paper explores the issue from three important angles. What is the best strategy for handling the data for sentiment detection? Should the classifier be trained on the whole dataset or only on the pure code-mixed subset? How important is language identification (LID) for the task? If LID is to be done, how and when should it be used to yield the best performance? We explore these questions using three datasets of Tamil–English, Kannada–English, and Malayalam–English YouTube social media comments. Our solution incorporated mBERT and an optional LID module. We report our results using precision, recall, $F_1$ score, and accuracy. The solutions provide considerable performance gain and some interesting insights for sentiment analysis from code-mixed data.
Journal Article
Design and development of Dogri extractive summarization model for automated summary generation
by Arora, Bhavna; Gandotra, Sonam; Kumar, Yogesh
in Algorithms, Automatic summarization, Automation
2025
Text summarization is an important method that compresses massive amounts of information into clear, succinct summaries that make it easier to grasp and extract knowledge. Text summarization tasks can be broadly divided into two types: extractive and abstractive. This paper takes up the task of extractive summarization for the Dogri language. The goal of extractive summarization is to extract key phrases from text into a meaningful form. The paper presents the Dogri Extractive Summarization Model. Statistical features, comprising sentence-level and word-level features, are employed to extract important sentences from the given document. Word-level features include the presence of common nouns, proper nouns, numerical information, and term frequency-inverse sentence frequency (TF-ISF), whereas sentence-level features include sentence position, sentence length and similarity to the news title. A linear combination of these feature scores forms the final score of each sentence. Sentences are then ranked by this score, and the final summary is generated according to the compression ratio. Results for five compression ratios (70%, 50%, 30%, 20% and 10%) are shown for ROUGE-1, ROUGE-2 and ROUGE-L. The paper also presents a comparative analysis of the proposed Dogri Extractive Summarization Model against other Indian text summarization systems, such as those for Hindi, Bengali, Punjabi, and Kannada.
Journal Article
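The scoring pipeline described in the abstract above (feature scores, a linear combination, ranking, then selection by compression ratio) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the equal default weights, the Jaccard-overlap title similarity, the normalization, and the reading of "compression ratio" as the fraction of sentences kept are all assumptions, and the noun-based features are omitted because they need a part-of-speech tagger.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware for str patterns in Python 3.
    return re.findall(r"\w+", text.lower())


def tf_isf_scores(tokenized):
    # Per sentence: mean over distinct words of tf(w, s) * log(N / sf(w)),
    # where sf(w) is the number of sentences containing w.
    n = len(tokenized)
    sf = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        total = sum((tf[w] / len(toks)) * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(tf))
    return scores


def normalize(values):
    # Scale to [0, 1] by the maximum so features are comparable (assumption).
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, title, compression_ratio=0.3, weights=None):
    # Hypothetical equal weights; the abstract does not state the real ones.
    weights = weights or {"tf_isf": 1.0, "position": 1.0, "length": 1.0,
                          "title": 1.0, "numeric": 1.0}
    n = len(sentences)
    if n == 0:
        return []
    tokenized = [tokenize(s) for s in sentences]
    title_words = set(tokenize(title))
    features = {
        "tf_isf": normalize(tf_isf_scores(tokenized)),
        "position": [1.0 - i / n for i in range(n)],  # earlier scores higher
        "length": normalize([float(len(t)) for t in tokenized]),
        "title": [jaccard(set(t), title_words) for t in tokenized],
        "numeric": [1.0 if any(w.isdigit() for w in t) else 0.0
                    for t in tokenized],
    }
    # Linear combination of feature scores gives the final sentence score.
    scores = [sum(weights[name] * features[name][i] for name in weights)
              for i in range(n)]
    # Keep the top-scoring fraction of sentences, at least one.
    keep = max(1, int(compression_ratio * n + 0.5))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Emit selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, title, compression_ratio=0.5)` on a list of sentence strings returns roughly half of them, in document order. Passing a different `weights` dictionary changes how much each feature contributes.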
Cross-Lingual Short-Text Semantic Similarity for Kannada–English Language Pair
by S N, Muralikrishna; Holla, Raghurama; Ganiga, Raghavendra
in Algorithms, Analysis, Annotations
2024
Analyzing the semantic similarity of cross-lingual texts is a crucial part of natural language processing (NLP). Computing semantic similarity is essential for a variety of tasks, such as evaluating machine translation systems, quality checking human translation, information retrieval, and plagiarism checks. In this paper, we propose a method for measuring the semantic similarity of Kannada–English sentence pairs that uses embedding space alignment, lexical decomposition, word order, and a convolutional neural network. The proposed method achieves a maximum correlation of 83% with human annotations. Experiments on semantic matching and retrieval tasks gave promising results in terms of precision and recall.
Journal Article