Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
2 result(s) for "human-annotated dataset"
Sort by:
Mix-Lingual Relation Extraction: Dataset and a Training Approach
Relation extraction is a pivotal task within the field of natural language processing, boasting numerous real-world applications. Existing research predominantly centers on monolingual relation extraction or cross-lingual enhancement for relation extraction. However, there exists a notable gap in understanding relation extraction within mix-lingual (or code-switching) scenarios. In these scenarios, individuals blend content from different languages within sentences, generating mix-lingual content. The effectiveness of existing relation extraction models in such scenarios remains largely unexplored due to the absence of dedicated datasets. To address this gap, we introduce the Mix-Lingual Relation Extraction (MixRE) task and construct a human-annotated dataset MixRED to support this task. Additionally, we propose a hierarchical training approach for the mix-lingual scenario named Mix-Lingual Training (MixTrain), designed to enhance the performance of large language models (LLMs) when capturing relational dependencies from mix-lingual content spanning different semantic levels. Our experiments involve evaluating state-of-the-art supervised models and LLMs on the constructed dataset, with results indicating that MixTrain notably improves model performance. Moreover, we investigate the effectiveness of using mix-lingual content as a tool to transfer learned relational dependencies across different languages. Additionally, we delve into factors influencing model performance for both supervised models and LLMs in the novel MixRE task.
NewsSumm: The World’s Largest Human-Annotated Multi-Document News Summarization Dataset for Indian English
The rapid growth of digital journalism has heightened the need for reliable multi-document summarization (MDS) systems, particularly in underrepresented, low-resource, and culturally distinct contexts. However, current progress is hindered by a lack of large-scale, high-quality non-Western datasets. Existing benchmarks—such as CNN/DailyMail, XSum, and MultiNews—are limited by language, regional focus, or reliance on noisy, auto-generated summaries. We introduce NewsSumm, the largest human-annotated MDS dataset for Indian English, curated by over 14,000 expert annotators through the Suvidha Foundation. Spanning 36 Indian English newspapers from 2000 to 2025 and covering more than 20 topical categories, NewsSumm includes over 317,498 articles paired with factually accurate, professionally written abstractive summaries. We detail its robust collection, annotation, and quality control pipelines, and present extensive statistical, linguistic, and temporal analyses that underscore its scale and diversity. To establish benchmarks, we evaluate PEGASUS, BART, and T5 models on NewsSumm, reporting aggregate and category-specific ROUGE scores, as well as factual consistency metrics. All NewsSumm dataset materials are openly released via Zenodo. NewsSumm offers a foundational resource for advancing research in summarization, factuality, timeline synthesis, and domain adaptation for Indian English and other low-resource language settings.