Structural–Semantic Term Weighting for Interpretable Topic Modeling with Higher Coherence and Lower Token Overlap

by Konnikov, Evgenii; Yakob, Polina; Golikov, Gleb; Rodionov, Dmitriy

in Bibliometrics / Coherence / coherence value / Data mining / Embedding / Large language models / large sparse text corpora / Latent Dirichlet Allocation (LDA) / Linear algebra / Modelling / News / Qualitative analysis / Russian language / Semantics / structural–semantic term weighting / Subject specialists / topic modeling / Weighting

2026

Journal Article

Overview

Topic modeling of large news streams is widely used to reconstruct economic and political narratives, which requires coherent topics with low lexical overlap while remaining interpretable to domain experts. We propose TF-SYN-NER-Rel, a structural–semantic term weighting scheme that extends classical TF-IDF by integrating positional, syntactic, factual, and named-entity coefficients derived from morphosyntactic and dependency parses of Russian news texts. The method is embedded into a standard Latent Dirichlet Allocation (LDA) pipeline and evaluated on a large Russian-language news corpus from the online archive of Moskovsky Komsomolets (over 600,000 documents), with political, financial, and sports subsets obtained via dictionary-based expert labeling. For each subset, TF-SYN-NER-Rel is compared with standard TF-IDF under identical LDA settings, and topic quality is assessed using the C_v coherence metric. To assess robustness, we repeat model training across multiple random initializations and report aggregate coherence statistics. Quantitative results show that TF-SYN-NER-Rel improves coherence and yields smoother, more stable coherence curves across the number of topics. Qualitative analysis indicates reduced lexical overlap between topics and clearer separation of event-centered and institutional themes, especially in political and financial news. Overall, the proposed pipeline relies on CPU-based NLP tools and sparse linear algebra, providing a computationally lightweight and interpretable complement to embedding- and LLM-based topic modeling in large-scale news monitoring.
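
The overview describes extending TF-IDF with positional, syntactic, factual, and named-entity coefficients, but this record does not give the actual formula. The following is only a minimal sketch of the general idea, under loud assumptions: the coefficient values, the multiplicative combination, the smoothed IDF, and the token fields (`lemma`, `dep`, `is_entity`, `sentence_index`) are all hypothetical and are not taken from the article. The factual coefficient is omitted because the overview does not say how it is derived.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coefficient values; the catalogue record does not state them.
SYNTAX_WEIGHT = {"nsubj": 1.5, "obj": 1.3}  # assumed boosts by dependency role
ENTITY_WEIGHT = 2.0                         # assumed boost for named entities
LEAD_WEIGHT = 1.5                           # assumed boost for the first sentence


def token_coefficient(token):
    """Combine per-occurrence coefficients by multiplication (an assumption)."""
    coeff = SYNTAX_WEIGHT.get(token["dep"], 1.0)
    if token["is_entity"]:
        coeff *= ENTITY_WEIGHT
    if token["sentence_index"] == 0:
        coeff *= LEAD_WEIGHT
    return coeff


def weighted_tfidf(docs):
    """Return one {lemma: weight} dict per document.

    Each document is a list of token dicts with the keys
    'lemma', 'dep', 'is_entity', and 'sentence_index'.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each lemma occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update({tok["lemma"] for tok in doc})

    weights = []
    for doc in docs:
        weighted_tf = defaultdict(float)
        for tok in doc:
            # Plain TF would add 1.0 per occurrence; here each occurrence
            # contributes its structural coefficient instead.
            weighted_tf[tok["lemma"]] += token_coefficient(tok)
        weights.append({
            lemma: tf * (math.log((1 + n_docs) / (1 + doc_freq[lemma])) + 1.0)
            for lemma, tf in weighted_tf.items()
        })
    return weights
```

In a pipeline like the one the overview outlines, the resulting per-document weights would replace the plain TF-IDF values passed to LDA; that step is not shown here.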

Publisher

MDPI AG
