Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
83 result(s) for "Text Rank"
Impact analysis of keyword extraction using contextual word embedding
by Uddin, M. Irfan; Alharbi, Abdullah; Shahid, Abdul
in Algorithms; Analysis; Artificial Intelligence
2022
A document’s keywords provide high-level descriptions of its content, summarizing its central themes, concepts, ideas, or arguments. These descriptive phrases make it easier for algorithms to find relevant information quickly and efficiently, and keyword extraction therefore plays a vital role in document processing tasks such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely largely on the statistical distribution of key terms within a document. Recent advances suggest, however, that contextual information is critical in determining the semantics of the text at hand, and context-based features may likewise be beneficial for keyword extraction; the context of a phrase can be described as simply as by the words immediately preceding or following it. This research presents several experiments to validate that context-based keyword extraction is significant compared with traditional methods, and that the proposed KeyBERT-based methodology improves results. The proposed approach identifies a group of important words or phrases in the document’s content that reflect the authors’ main ideas, concepts, or arguments, and uses contextual word embeddings to extract keywords. The findings are compared with those obtained using older approaches such as Text Rank, Rake, Gensim, Yake, and TF-IDF. The Journal of Universal Computer Science (JUCS) dataset was employed; only abstract text was used to produce keywords for each research article, and the KeyBERT model outperformed traditional approaches in producing keywords similar to those supplied by the authors. The average similarity of the approach with author-assigned keywords is 51%.
Journal Article
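As an editorial illustration of the contextual approach compared in the entry above, the following is a minimal sketch using the open-source keybert and scikit-learn packages; the sample text, the default embedding model, and the parameter choices are placeholders, not the authors' experimental setup.

# Minimal sketch: contextual keyword extraction with KeyBERT vs. a TF-IDF baseline.
# Assumes `pip install keybert scikit-learn`; the abstract text below is a placeholder.
from keybert import KeyBERT
from sklearn.feature_extraction.text import TfidfVectorizer

abstract = (
    "Keyword extraction identifies phrases that summarize a document's "
    "central themes, supporting indexing, clustering, and summarization."
)

# Contextual embedding approach; KeyBERT loads a sentence-transformers model by default.
kw_model = KeyBERT()
contextual_keywords = kw_model.extract_keywords(
    abstract, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5
)

# Simple statistical baseline: rank terms of the same document by TF-IDF weight
# (with a single document this reduces to term-frequency weighting).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform([abstract])
baseline_keywords = sorted(
    zip(vectorizer.get_feature_names_out(), tfidf.toarray()[0]),
    key=lambda pair: pair[1], reverse=True,
)[:5]

print("KeyBERT:", contextual_keywords)
print("TF-IDF :", baseline_keywords)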
Exploring the Path of Judicial Big Data to Enhance Data Governance Capability
2024
In the era of judicial big data, predicting legal outcomes and identifying similar cases hold significant value. This paper presents an advanced legal prediction algorithm that integrates the specific features of legal texts. Using the Text Rank model, it extracts essential text features from legal provisions and case facts, enabling precise application of legal requirements based on detailed case analyses and legal knowledge. To overcome the hurdles of scant training data and the difficulty of distinguishing similar legal documents, we developed a similar-case matching model employing twin BERT encoders. Our empirical study reveals theft, intentional injury, and fraud as the predominant crimes, with sample counts of 335,745, 174,526, and 47,677, respectively. These top offenses, correlating with the most frequently cited laws, account for 85.79% of our dataset. The analysis further indicates “RMB” as the most recurring word in theft and fraud cases, and “minor injury” in intentional injury cases. Notably, our findings show that categories such as “misappropriation” are prone to misclassification as “embezzlement,” and “robbery” is often confused with “theft,” highlighting the complexities of legal classification.
Journal Article
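The paper's twin-BERT similar-case matcher is not reproduced here; the sketch below only illustrates the shared-encoder idea using the sentence-transformers package, with a placeholder model name and invented case descriptions.

# Sketch of similar-case matching with a shared (twin) sentence encoder.
# Generic sentence-transformers illustration, not the paper's exact twin-BERT model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # placeholder choice

query_case = "Defendant took a mobile phone worth 3,000 RMB from a parked vehicle."
candidate_cases = [
    "Defendant secretly stole cash and a laptop from the victim's apartment.",
    "Defendant deceived the victim into transferring 50,000 RMB for a fake investment.",
    "Defendant caused minor injury to the victim during an altercation.",
]

# The same encoder is applied to both sides, which is what makes the architecture "twin".
query_emb = model.encode(query_case, convert_to_tensor=True)
cand_embs = model.encode(candidate_cases, convert_to_tensor=True)

scores = util.cos_sim(query_emb, cand_embs)[0]
best = int(scores.argmax())
print(f"Most similar case: {candidate_cases[best]} (cosine similarity {scores[best]:.3f})")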
A Hybrid Approach for Automatic Text Summarization by Handling Out-of-Vocabulary Words Using TextR-BLG Pointer Algorithm
2024
Long documents such as scientific papers and government reports often discuss substantial issues at length and are time-consuming to read and understand. Generating abstractive summaries can help readers quickly grasp the main topics, yet prior work has mostly focused on short texts and suffers from drawbacks such as out-of-vocabulary (OOV) words, mismatched sentences, and meaningless summaries. To overcome these issues, this work proposes a hybrid approach for automatic text summarization using the TextR-BLG pointer algorithm. In the designed model, a long document is given as input and its sentences are evaluated by word-frequency length; based on a threshold value, sentences are routed to either the extractive or the abstractive branch. The TextRank component of the model computes sentence similarity scores, which are validated against the plotted graph, and sentences above the threshold are handled by the abstractive branch. The optimized BERT-LSTM-BiGRU (BLG) pointer algorithm learns sentence meaning through word embedding and by encoding and decoding hidden states. Finally, the reframed sentences are scored for similarity and plotted, and the sentences from the abstractive and extractive branches are ranked on the plotted graph to generate the summary. In the evaluation, the model attains ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, BLEU, and METEOR scores of 59.2, 58.4, 62.3, 0.92, 0.78, and 0.67, respectively, which are compared with existing models. The comparison of the proposed and existing summarization techniques shows that the proposed model attains better text summarization, and the hybrid TextR-BLG pointer approach handles out-of-vocabulary words better than existing text summarization techniques.
Journal Article
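The TextR-BLG pipeline itself is not available from this abstract; the following sketch covers only a generic graph-based (TextRank-style) sentence-ranking step with TF-IDF cosine similarity and PageRank, as an assumption about what such a component might look like, not the authors' implementation.

# Sketch of the graph-based (TextRank-style) sentence-ranking step only;
# the BERT-LSTM-BiGRU pointer network from the paper is not reproduced here.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Long government reports discuss substantial issues at length.",
    "Reading such long documents is time-consuming for most readers.",
    "Abstractive summaries help readers grasp the main topics quickly.",
    "Out-of-vocabulary words remain a challenge for neural summarizers.",
]

# Build a sentence-similarity graph and rank sentences with PageRank.
tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = cosine_similarity(tfidf)
graph = nx.from_numpy_array(similarity)
scores = nx.pagerank(graph)

# Keep the top-ranked sentences, in document order, as the extractive part of the summary.
top = sorted(scores, key=scores.get, reverse=True)[:2]
extractive_summary = " ".join(sentences[i] for i in sorted(top))
print(extractive_summary)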
Generating Summaries Through Unigram and Bigram: Text Summarization
by Alsharman, Nesreen Mohammad; Pivkina, Inna V.
in Computational linguistics; Language processing; Natural language interfaces
2020
This article describes a new method for generating extractive summaries directly via unigram and bigram extraction techniques. The methodology uses selective part-of-speech tagging to extract significant unigrams and bigrams from a set of sentences; the extracted unigrams and bigrams, along with other features, are used to build the final summary. A new selective rule-based part-of-speech tagging system is developed that concentrates on the parts of speech most important for summarization: nouns, verbs, and adjectives. Other parts of speech, such as prepositions, articles, and adverbs, play a lesser role in determining the meaning of sentences and are therefore not considered when choosing significant unigrams and bigrams. The proposed method is tested on two problem domains, the citations and Opinosis data sets, and results show that it performs better than the Text-Rank, LexRank, and Edmundson summarization methods. The proposed method is general enough to summarize texts from any domain.
Journal Article
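In the spirit of the selective part-of-speech tagging described above, here is a minimal NLTK sketch that keeps only nouns, verbs, and adjectives when collecting unigrams and bigrams; the sample sentence and the counting scheme are illustrative, not the article's system.

# Sketch of selective POS-based unigram/bigram extraction (nouns, verbs, adjectives only).
# Requires: pip install nltk; plus nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") on first use.
import nltk
from collections import Counter

text = (
    "The proposed method extracts significant unigrams and bigrams from sentences "
    "and builds a final summary from the extracted phrases."
)

tokens = nltk.word_tokenize(text.lower())
tagged = nltk.pos_tag(tokens)

# Keep only the POS classes treated as meaning-bearing: NN* (nouns), VB* (verbs), JJ* (adjectives).
def keep(tag):
    return tag.startswith(("NN", "VB", "JJ"))

unigrams = Counter(word for word, tag in tagged if keep(tag))
bigrams = Counter(
    (w1, w2)
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
    if keep(t1) and keep(t2)
)

print(unigrams.most_common(5))
print(bigrams.most_common(5))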
A Novel Method to Detect Public Health in Online Social Network Using Graph-based Algorithm
by Devika, R.; Subramaniyaswamy, V.; Sinduja, S.
in Algorithms; Epidemics; Information dissemination
2019
INTRODUCTION: Twitter plays an important role in people's social lives. Health-related tweets can be extracted to trace the spread of epidemic disease across the network, and they serve as a starting source of individual data for learning about users' physical condition. OBJECTIVES: The key objective is to develop a graph-based algorithm to detect public health in an online social network. METHODS: The proposed method collects tweets relating to general health on Twitter using the min-cut algorithm, which finds the minimum cut of an undirected edge-weighted graph. Its runtime is faster than that of other graph algorithms, and min-cut is reliable, performs well in network optimization, and prevents redundancy. RESULTS: To evaluate performance, we used a health dataset on the detection of epidemic disease. The proposed graph-based method is the best in terms of accuracy, precision, and recall; with respect to the confusion matrix, min-cut provides the highest true-positive rate compared with the Text Rank and K-Means algorithms. CONCLUSION: The proposed graph-based health detection method is better than Text Rank and K-Means in all aspects.
Journal Article
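As a hedged illustration of a minimum cut on an undirected edge-weighted graph, the sketch below uses the Stoer-Wagner implementation in networkx on an invented tweet-similarity graph; the study's own graph construction and edge weighting are not reproduced.

# Sketch: global minimum cut of an undirected edge-weighted graph with networkx.
# The nodes and weights stand in for a tweet-similarity graph; they are illustrative only.
import networkx as nx

G = nx.Graph()
# Edges weighted by (hypothetical) similarity between health-related tweets.
G.add_edge("tweet_flu_1", "tweet_flu_2", weight=0.9)
G.add_edge("tweet_flu_2", "tweet_flu_3", weight=0.8)
G.add_edge("tweet_flu_1", "tweet_flu_3", weight=0.7)
G.add_edge("tweet_other_1", "tweet_other_2", weight=0.85)
G.add_edge("tweet_flu_3", "tweet_other_1", weight=0.1)  # weak cross-topic link

# Stoer-Wagner computes the minimum cut of an undirected weighted graph.
cut_value, (part_a, part_b) = nx.stoer_wagner(G)
print("Cut weight:", cut_value)
print("Partition A:", sorted(part_a))
print("Partition B:", sorted(part_b))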
A Three-stage multimodal emotion recognition network based on text low-rank fusion
by Yang, Youlong; Ning, Tong; Zhao, Linlin
in Ablation; Audio data; Computer Communication Networks
2024
Multimodal emotion recognition has achieved good results on emotion recognition tasks by fusing multimodal information such as audio, text, and visual data. How to use multimodal interaction and fusion to transform sparse unimodal representations into compact multimodal ones has become a vital research topic in multimodal emotion recognition. However, the extracted unimodal information must be representative, and multimodal fusion can cause loss of feature information, which poses a particular challenge for multimodal emotion recognition. To address these problems, this paper proposes a three-stage multimodal emotion recognition network based on text low-rank fusion that extracts unimodal features, combines bimodal features, and fuses multimodal features. Specifically, we introduce a Residual-based Attention Mechanism in the first feature extraction stage, which filters out redundant information and extracts valuable unimodal information. We then use a Cross-modal Transformer to complete the inter-modal interaction. Finally, we introduce a Text-based Low-rank Fusion Module that enhances multimodal fusion by leveraging the complementarity between different modalities, ensuring comprehensive fused features. The accuracy of the proposed model on the CMU-MOSEI, CMU-MOSI, and IEMOCAP datasets is 82.1%, 80.8%, and 83.0%, respectively. Numerous ablation experiments are also conducted to verify the effectiveness and generalization of the model.
Journal Article
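The paper's Text-based Low-rank Fusion Module is not described in enough detail here to reproduce; the following is a generic low-rank multimodal fusion sketch in PyTorch, projecting each modality with rank-r factors and combining the projections elementwise, with dimensions and rank chosen arbitrarily.

# Generic low-rank multimodal fusion sketch (PyTorch), not the paper's exact module.
# Each modality is projected by rank-r factors and the factor outputs are combined
# elementwise, approximating a full outer-product (tensor) fusion at low cost.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # One low-rank factor per modality: (rank, in_dim + 1, out_dim); the +1 adds a bias term.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, out_dim) * 0.1) for d in dims]
        )
        self.weights = nn.Parameter(torch.randn(1, rank) * 0.1)
        self.bias = nn.Parameter(torch.zeros(1, out_dim))

    def forward(self, modalities):
        fused = None
        for x, factor in zip(modalities, self.factors):
            ones = torch.ones(x.size(0), 1, device=x.device)
            x = torch.cat([ones, x], dim=1)                 # (batch, in_dim + 1)
            proj = torch.einsum("bi,rio->bro", x, factor)   # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj # elementwise combination
        weights = self.weights.expand(fused.size(0), -1)
        return torch.einsum("br,bro->bo", weights, fused) + self.bias

# Toy usage: text, audio, and visual feature vectors for a batch of 2 samples.
text, audio, visual = torch.randn(2, 32), torch.randn(2, 16), torch.randn(2, 24)
fusion = LowRankFusion(dims=[32, 16, 24], out_dim=8, rank=4)
print(fusion([text, audio, visual]).shape)  # torch.Size([2, 8])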
Unveiling What Is Written in the Stars
2017
Deciphering consumers’ sentiment expressions from big data (e.g., online reviews) has become a managerial priority to monitor product and service evaluations. However, sentiment analysis, the process of automatically distilling sentiment from text, provides little insight regarding the language granularities beyond the use of positive and negative words. Drawing on speech act theory, this study provides a fine-grained analysis of the implicit and explicit language used by consumers to express sentiment in text. An empirical text-mining study using more than 45,000 consumer reviews demonstrates the differential impacts of activation levels (e.g., tentative language), implicit sentiment expressions (e.g., commissive language), and discourse patterns (e.g., incoherence) on overall consumer sentiment (i.e., star ratings). In two follow-up studies, we demonstrate that these speech act features also influence the readers’ behavior and are generalizable to other social media contexts, such as Twitter and Facebook. We contribute to research on consumer sentiment analysis by offering a more nuanced understanding of consumer sentiments and their implications.
Journal Article
Analysis of the distraction impact on driving performance across driving styles: A driving simulator study in various speed conditions
by Nassiri, Habibollah; Faqani, Mobina; Ramezani, Mohsen
in Acceleration; Accidents, Traffic; Adult
2025
Distracted driving is a mounting global issue, prompting numerous naturalistic and simulator-based investigations. This study investigates the impact of hands-free (HF) conversation and texting distractions on driving performance during car-following experiments. Three experiments were designed: a baseline (control) condition, HF conversation, and text messaging. Driving data were collected from 40 participants in driving simulator experiments conducted under six different speed conditions: (i) free flow, (ii) coherent moving flow, (iii) synchronized flow, (iv) jam density, (v) recovery from jam density, and (vi) collision avoidance. To analyze driving performance across the mobile phone distracted driving (MPDD) experiments, participants were partitioned into three distinct groups, aggressive, moderate, and conservative, based on their driving styles using k-means clustering. Statistical analyses, including t-tests, the Friedman test, and the Wilcoxon signed-rank test, were conducted to evaluate driving performance metrics such as Standard Deviation of Lateral Position (SDLP) across conditions (i)-(iv), Acceleration Reaction Time (ART) in condition (v), and Time to Initial Braking Location (TIBL) in condition (vi). The findings indicated that HF conversation had no effect on SDLP in the free-flow condition. However, it led to a reduction in SDLP for the conservative group in the coherent moving flow condition, for both the moderate and conservative groups in the synchronized flow condition, and for the moderate group in the jam density condition. Additionally, HF conversation was associated with a decrease in ART among conservative participants, while it significantly increased TIBL for both the moderate and conservative groups. Conversely, texting led to an increase in SDLP for moderate and conservative participants in the free-flow condition and for the moderate group in the coherent moving flow condition, but it resulted in a reduction in SDLP for the conservative group in the coherent moving flow condition. Texting had no significant effect on SDLP in the jam density condition or on ART, but it significantly increased TIBL among moderate and conservative participants. These findings can inform legislation, policy development, countermeasures, and future research.
Journal Article
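As a sketch of how the nonparametric tests named above might be run on per-participant SDLP values, the following uses scipy.stats on randomly generated placeholder data, not the study's measurements.

# Sketch of the nonparametric comparisons mentioned above, on placeholder data.
# Rows are participants; values stand in for SDLP under each condition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(0.30, 0.05, size=40)            # control condition
handsfree = baseline + rng.normal(0.00, 0.02, 40)     # hands-free conversation
texting = baseline + rng.normal(0.05, 0.02, 40)       # text messaging

# Friedman test: is there any difference among the three repeated-measures conditions?
chi2, p_friedman = stats.friedmanchisquare(baseline, handsfree, texting)

# Wilcoxon signed-rank test: pairwise follow-up (e.g., baseline vs. texting).
w_stat, p_wilcoxon = stats.wilcoxon(baseline, texting)

print(f"Friedman: chi2={chi2:.2f}, p={p_friedman:.4f}")
print(f"Wilcoxon (baseline vs. texting): W={w_stat:.1f}, p={p_wilcoxon:.4f}")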
Enhancing product concept image generation through semantic feature prompts and LoRA training
2025
This paper proposes an innovative strategy that integrates fine-grained semantic feature decoding with Low-Rank Adaptation (LoRA) fine-tuning to significantly improve the performance of text-to-image technology, addressing the limitations of current Generative Artificial Intelligence (GAI) in product conceptual image design. First, semantic information pertinent to product design is collected, and the E-Prime software is used to conduct a semantic priming task for extracting key semantic words. Next, the DeepSeek prompt engineering method is employed to decode the fine-grained features of the semantic words, moving from abstract to concrete along the three dimensions of mental image, functional image, and physical image; semantic feature prompts are then derived through expert evaluation and clustering methods. Finally, the LoRA technique is employed to train independently on the dataset based on the semantic feature prompts, achieving the optimal model configuration. Taking an intelligent pulse diagnostic instrument as an example, the application of this strategy to product conceptual design is demonstrated. Multi-dimensional assessments of the text-to-image outcomes are conducted through comparative experiments, verifying the potential and efficacy of the proposed strategy, which provides a solution for controlled generation with large models in product design applications.
Journal Article
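Independent of the paper's text-to-image pipeline, the core LoRA idea (a frozen weight plus a trainable rank-r update scaled by alpha/r) can be illustrated in a few lines of PyTorch; the layer below is a generic sketch with arbitrary dimensions, not the authors' training configuration.

# Minimal illustration of the LoRA idea: freeze W, train a rank-r update B @ A.
# Generic sketch, not the paper's text-to-image training pipeline.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # frozen "pretrained" weight
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Base projection plus the scaled low-rank update.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable LoRA parameters: {trainable}")  # only the rank-8 factors
x = torch.randn(4, 768)
print(layer(x).shape)                             # torch.Size([4, 768])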
Analysis of In-text Citation Patterns in Local Journals for Ranking Scientific Documents
2021
In-text citations have been put forward as a new way to overcome the bias inherent in bibliographic citation analysis. In-text citation patterns have been used as the basis for citation analysis previously, but all the evidence has come from international journals, even though many countries have more local journals than international ones. This paper uses in-text citation analysis to examine local journals in Indonesia. It aims to determine the location-based citation pattern within the text and its effect on article and author rankings. We collected articles from seven food science journals and parsed them to detect citations and their locations within the text. Pre-processing included normalizing section names, developing a database, and matching citation identities. The rankings were based on sections and were evaluated using the Spearman rank correlation in the final step. The results revealed that Indonesian journals did not exhibit the same patterns as international journals: there were differences in the section locations with the highest percentages of citations, the distributions of publication years, and the ranking methods. The correlations between sections indicated that citations in the results and discussion section should be given the highest weight, followed by those in the methods section, while the lowest weight should be assigned to citations in the introduction. These results need to be strengthened with further research using more extensive data and more fields. Other findings, such as nonstandard and inconsistent citations, made developing an automatic citation detection system for local journals challenging.
Journal Article
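The final ranking comparison in the entry above can be illustrated with a short scipy sketch of Spearman's rank correlation between two section-based citation rankings; the counts are invented placeholders, not data from the study.

# Sketch: comparing two section-based article rankings with Spearman's rho.
# The citation counts are invented placeholders, not data from the study.
from scipy.stats import spearmanr

# In-text citation counts for the same five articles, counted in two different sections.
results_discussion = [42, 35, 28, 19, 11]
methods = [30, 38, 22, 15, 14]

rho, p_value = spearmanr(results_discussion, methods)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")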