Catalogue Search | MBRL
4,999 result(s) for "Natural Language Processing (NLP)"
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus
by Shu-Ching Yang, Liang-Ching Chen, Kuei-Hu Chang
in Information retrieval, Information; communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military, Natural language processing
2022
Within modern information, communication and technology (ICT), seeking highly efficient and accurate corpus-based approaches to process natural language data (NLD) is critical. Traditional corpus-based approaches for processing a corpus (i.e. the collected NLD) mainly focused on quantifying and ranking words to assist humans in extracting keywords. However, traditional corpus-based approaches cannot identify the meanings behind the words, so they cannot properly extract terminologies or their information. To address this issue, the main objective of this paper is to propose an integrated linguistic analysis approach that combines two corpus-based approaches and a rule-based natural language processing (NLP) approach to extract and identify terminologies and create a text database for extracting deeper domain-oriented information, using the terminologies as channels to retrieve core information from the target corpus. The military domain is an uncommon research field and its data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach in extracting terminologies and their information, the researchers adopt the US Army field manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that, from the perspectives of terminology identification, text database creation, and domain knowledge extraction, only the proposed approach can handle all these issues.
Journal Article
Understanding Students’ Perception of Sustainability: Educational NLP in the Analysis of Free Answers
by Hiroko Yamano, Nathan Hyungsok Choe, John Jongho Park
in Beliefs, opinions and attitudes, College students, Computational linguistics
2022
This study explored undergraduate students’ conceptions of sustainable development by asking about their definition of a sustainable world, current issues of sustainable development, and the necessary mindset and skillsets to build a sustainable world. We derived data from 107 participants’ open-ended answers, collected through an online survey at the beginning and the end of the sustainability class. Text mining with Natural Language Processing (NLP), principal component analysis (PCA), and co-occurrence network analysis were conducted to understand the changes in students’ conception of sustainable development. In addition, we applied the Linguistic Inquiry and Word Count (LIWC) dictionary to investigate the psychometric properties of students’ awareness and understanding related to sustainable development. This advanced analysis technique provided a rich understanding of university students’ perceptions of sustainable development compared to what the UN initially defined as sustainable development goals (SDGs). The results showed important insights into the benefits of sustainability experiences and knowledge that generate motivation to develop students’ competencies as change agents.
Journal Article
EFFECTIVENESS OF ZERO-SHOT MODELS IN AUTOMATIC ARABIC POEM GENERATION
2023
Text generation is one of the most challenging applications in artificial intelligence and natural-language processing. In recent years, text generation has gained much attention thanks to advances in deep-learning and language-modeling approaches. However, writing poetry is a challenging activity for humans that necessitates creativity and a high level of linguistic ability. Therefore, automatic poem generation is an important research issue that has attracted the interest of the Natural Language Processing (NLP) community. Several researchers have examined automatic poem generation using deep-learning approaches, but little work has focused on Arabic poetry. In this work, we show how we utilize various GPT-2 and GPT-3 models to automatically generate Arabic poems. BLEU scores and human evaluation are used to evaluate the results of four GPT-based models. Both BLEU scores and human evaluations indicate that fine-tuned GPT-2 outperforms the GPT-3 and fine-tuned GPT-3 models, with the GPT-3 model having the lowest value in terms of poeticness. To the best of the authors’ knowledge, this work is the first in the literature that employs and fine-tunes GPT-3 to generate Arabic poems.
Journal Article
Sustainable Urban Green Blue Space (UGBS) and Public Participation: Integrating Multisensory Landscape Perception from Online Reviews
2023
The integration of multisensory-based public subjective perception into planning, management, and policymaking is of great significance for the sustainable development and protection of UGBS. Online reviews are a suitable data source for this issue, as they include information about public sentiment, perception of the physical environment, and sensory description. This study adopts a deep learning method to obtain effective information from online reviews and finds that, across 105 major sites in Tokyo (23 districts), the overall level of public perception is not balanced. Rich multisensory experience promotes the perception level; hearing and somatosensory senses in particular have a higher positive prediction effect than vision, so overall perception can be improved by first optimizing these two senses. Even a single adverse sense, such as bad smell or noise, seriously affects the perception level. Optimizing the physical environment by adding natural elements for different senses is conducive to overall perception. Sensory maps can help to quickly find areas that require improvement. This study provides a new method for rapid multisensory analysis and complementary public participation for specific situations, which helps to increase the well-being of UGBS and bring its multi-functionality into play.
Journal Article
Local food experiences before and after COVID-19: a sentiment analysis of EWOM
by GASSIOT-MELIAN, Ariadna; POYOI, Pimsuporn; COROMINA, Lluis
in COVID-19, Decision making, gastronomy
2023
Purpose – To use Natural Language Processing (NLP) to explore how people feel and what they share online about their experiences with food. In addition, to learn how these experiences have evolved recently, differences before and during the COVID-19 crisis are explored. Methodology/Design/Approach – A total of 35,001 reviews of restaurants and local cuisine establishments near tourist attractions in the city of Ayutthaya, Thailand, were extracted from the Google Local Guide platform. Several NLP techniques were used to analyse the text data, including sentiment analysis, word cloud analysis, and the n-gram model. Findings – The results reveal travellers’ hidden sentiments toward dining experiences. Key attributes of experience sharing related to food activities in online reviews were identified both before and after COVID-19. From a theoretical perspective, the findings are relevant for researchers to recognise tourists’ behaviour in sharing local food experiences. From a practical perspective, decision makers will have a better understanding of tourist behaviour to develop and implement appropriate strategies. Originality of the research – This study is the first to analyse and interpret online reviews on the Google Maps platform by applying text mining and sentiment analysis in gastronomic tourism research, especially in the context of COVID-19.
Journal Article
Examining the Effect of the Ratio of Biomedical Domain to General Domain Data in Corpus in Biomedical Literature Mining
by Katsuhiko Ogasawara, Feng Han, Ziheng Zhang
in Biology (General), biomedical literature mining (BLM), biomedical literature mining (BLM); natural language processing (NLP); Word2vec
2022
Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems. The objective of this study is to examine how changes in the ratio of the biomedical domain to general domain data in the corpus affect the extraction of similar biomedical terms using Word2vec. We downloaded abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and then Word2vec models were trained on these corpora. The cosine similarities between the biomedical terms obtained from the Word2vec models were then compared in each model. The results indicated that the models trained with both BW and PMC data outperformed the model trained only with medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of the biomedical domain to general domain data was 3:7 to 5:5. This study allows NLP researchers to apply Word2vec based on more information and increase the similarity of extracted biomedical terms to improve their effectiveness in NLP applications, such as biomedical information extraction.
Journal Article
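The abstract above compares biomedical terms by the cosine similarity of their Word2vec vectors. As a rough illustration of that one step only, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the three-dimensional `toy_vectors` are assumptions made up for this example; they are not taken from the paper, whose vectors would come from trained Word2vec models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_vectors = {
    "insulin": [0.9, 0.1, 0.3],
    "glucose": [0.8, 0.2, 0.4],
    "keyboard": [0.1, 0.9, 0.0],
}

if __name__ == "__main__":
    for other in ("glucose", "keyboard"):
        score = cosine_similarity(toy_vectors["insulin"], toy_vectors[other])
        print(f"insulin vs {other}: {score:.3f}")
```

With these made-up vectors, the related pair (insulin, glucose) scores higher than the unrelated pair (insulin, keyboard), which is the kind of comparison the abstract describes across its 11 corpora.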
Regularisation of neural networks by enforcing Lipschitz continuity
by Pfahringer, Bernhard; Cree, Michael J; Frank, Eibe
in Computation, Mathematical models, Neural networks
2021
We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant—for multiple p-norms—of a feed forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.
Journal Article
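The abstract above mentions computing an upper bound on the Lipschitz constant of a feed-forward network for multiple p-norms. The abstract does not spell out the technique, so the sketch below shows only the standard product-of-operator-norms bound, under two assumptions that are ours: every layer is a plain linear map given as a list of rows, and every activation is 1-Lipschitz (as ReLU is). The function names are hypothetical, and only p = 1 and p = infinity are handled because those induced norms have simple closed forms.

```python
def operator_norm(weights, p):
    """Induced p-norm of a matrix given as a list of rows, for p = 1 or p = inf."""
    if p == 1:
        # Induced 1-norm: maximum absolute column sum.
        n_cols = len(weights[0])
        return max(sum(abs(row[j]) for row in weights) for j in range(n_cols))
    if p == float("inf"):
        # Induced infinity-norm: maximum absolute row sum.
        return max(sum(abs(x) for x in row) for row in weights)
    raise ValueError("this sketch only handles p = 1 and p = inf")


def lipschitz_upper_bound(layers, p):
    """Product of per-layer operator norms.

    Assumes 1-Lipschitz activations between layers, so only the
    weight matrices contribute to the bound.
    """
    bound = 1.0
    for weights in layers:
        bound *= operator_norm(weights, p)
    return bound


if __name__ == "__main__":
    # Toy two-layer network: 2 inputs -> 2 hidden units -> 1 output.
    toy_layers = [
        [[1.0, -2.0], [3.0, 4.0]],
        [[0.5, 0.5]],
    ]
    print(lipschitz_upper_bound(toy_layers, 1))
    print(lipschitz_upper_bound(toy_layers, float("inf")))
```

The two norms give different bounds for the same weights, which is one simple way to see why the choice of norm matters for the resulting constraint.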
The class imbalance problem in deep learning
by Japkowicz, Nathalie; Corizzo, Roberto; Krawczyk, Bartosz
in Artificial Intelligence, class imbalance, Computer Science
2024
Deep learning has recently unleashed the ability for machine learning (ML) to make unparalleled strides. It did so by confronting and successfully addressing, at least to a certain extent, the knowledge bottleneck that paralyzed ML and artificial intelligence for decades. The community is currently basking in deep learning’s success, but a question that comes to mind is: have all of the issues previously affecting machine learning systems been solved by deep learning, or do some issues remain for which deep learning is not a bulletproof solution? This question, in the context of class imbalance, is the motivation for this paper. The imbalance problem was first recognized almost three decades ago and has remained a critical challenge, at least for traditional learning approaches. Our goal is to investigate whether the tight dependency between class imbalances, concept complexities, dataset size and classifier performance, known to exist in traditional learning systems, is alleviated in any way in deep learning approaches and to what extent, if any, network depth and regularization can help. To answer these questions we conduct a survey of the recent literature focused on deep learning and the class imbalance problem as well as a series of controlled experiments on both artificial and real-world domains. This allows us to formulate lessons learned about the impact of class imbalance on deep learning models, as well as pose open challenges that should be tackled by researchers in this field.
Journal Article
Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods
2021
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
Journal Article
Classifier calibration: a survey on how to assess and improve predicted class probabilities
by Perello-Nieto, Miquel; Song, Hao; Santos-Rodriguez, Raul
in Artificial Intelligence, Calibration, Classification
2023
This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and for some types of context change. Calibration research has a rich history which predates the birth of machine learning as an academic field by decades. However, a recent increase in interest in calibration has led to new methods and to the extension from the binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details of the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.
Journal Article
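The abstract above refers to proper scoring rules and other evaluation metrics for calibration. As a hedged illustration of what such metrics look like in the binary case, the sketch below computes the Brier score (a proper scoring rule) and a simple equal-width-bin expected calibration error. The function names, the default of 10 bins, and the binning scheme are assumptions chosen for this example; the survey itself may define or recommend different variants.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted P(y=1) and the 0/1 label."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins over the predicted positive probability.

    For each non-empty bin, compare the mean predicted probability with the
    observed fraction of positives, weighted by the share of samples in the bin.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_positive = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - frac_positive)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions and labels, invented for illustration only.
    toy_probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 0, 0, 1]
    print(brier_score(toy_probs, toy_labels))
    print(expected_calibration_error(toy_probs, toy_labels, n_bins=5))
```

A lower value is better for both metrics; the Brier score also rewards sharp predictions, while the binned error measures only the gap between stated confidence and observed frequency.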