Catalogue Search | MBRL

Studying user income through language, behaviour and affect in social media

by Lampos, Vasileios , Volkova, Svitlana , Bachrach, Yoram in Affect , Age differences , Artificial intelligence

2015

Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. We present the first extensive study where user behaviour on Twitter is used to build a predictive model of income. We apply non-linear methods for regression, i.e. Gaussian Processes, achieving strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income, some of which reflect common belief e.g. higher perceived education and intelligence indicates higher earnings, known differences e.g. gender and age differences, however, others show novel findings e.g. higher income users express more fear and anger, whereas lower income users express more of the time emotion and opinions.

Journal Article

Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective

by Preoţiuc-Pietro, Daniel , Lampos, Vasileios , Tsarapatsanis, Dimitrios in Artificial Intelligence , Classification , Councils

2016

Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics. Our models can predict the court’s decisions with a strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis.

Journal Article

Unsupervised Quality Estimation for Neural Machine Translation

by Specia, Lucia , Fomicheva, Marina , Yankovskaya, Lisa in Annotations , Black boxes , Computation

2020

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation, and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Different from most of the current work that treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By utilizing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivaling state-of-the-art supervised QE models. To evaluate our approach we collect the first dataset that enables work on both black-box and glass-box approaches to QE.

Journal Article

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

by Chrysostomou, George , Williams, Miles , Zhao, Zhixue in Documents , Hallucinations , Inference

2024

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate. Hallucinations are concerning because they erode reliability and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights, enabling more efficient sparse inference. Pruned models yield downstream task performance comparable to the original, making them ideal alternatives when operating on a limited budget. However, the effect that pruning has upon hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study across five summarization datasets, two state-of-the-art pruning methods, and five instruction-tuned LLMs. Surprisingly, we find that hallucinations are less prevalent from pruned LLMs than the original models. Our analysis suggests that pruned models tend to depend more on the source document for summary generation. This leads to a higher lexical overlap between the generated summary and the source document, which could be a reason for the reduction in hallucination risk.

Journal Article

Comparative Assessment of Supervisory Control Algorithms for a Plug-In Hybrid Electric Vehicle

by Aletras, Nikolaos , Samaras, Zissis , Ntziachristos, Leonidas in Algorithms , Automobiles , Carbon dioxide

2023

The study examines alternative on-board energy management system (EMS) supervisory control algorithms for plug-in hybrid electric vehicles. The optimum fuel consumption was sought between an equivalent consumption minimization strategy (ECMS) algorithm and a back-engineered commercial rule-based (RB) one, under different operating conditions. The RB algorithm was first validated with experimental data. A method to assess different algorithms under identical states of charge variations, vehicle distance travelled, and wheel power demand criteria is first demonstrated. Implementing this method to evaluate the two algorithms leads to fuel consumption corrections of up to 8%, compared to applying no correction. We argue that such a correction should always be used in relevant studies. Overall, results show that the ECMS algorithm leads to lower fuel consumption than the RB one in most driving conditions. The difference maximizes at low average speeds (<40 km/h), where the RB leads to more frequent low load engine operation. The two algorithms lead to fuel consumption differences of 3.4% over the WLTC, while the maximum difference of 24.2% was observed for a driving cycle with low average speed (18.4 km/h). Further to fuel consumption performance optimization, the ECMS algorithm also appears superior in terms of adaptability to different driving cycles.

Journal Article

Optimization-Based Energy Management Algorithm for 2-Stroke Hybrid Ship with Controllable Pitch Propeller

by Kefalas, Nikolaos , Aletras, Nikolaos , Samaras, Zissis in 2-stroke marine engines , Adaptive algorithms , Algorithms

2024

This paper examines the fuel consumption savings of a hybrid ship powertrain with 2-stroke main engine by implementing a novel adaptive equivalent consumption minimization strategy that utilizes a controllable pitch propeller. A non-hybrid powertrain model was developed as a demonstrator and real-world data were used for fuel consumption and efficiency maps. The baseline powertrain model was extended to a hybrid by introducing a shaft generator, a battery, a controllable pitch propeller, and the supervisory control algorithm. The potential benefits of the proposed powertrain are examined over different operation phases including port stay, open sea sailing, and port approach. The result showed that the energy efficiency gains can reach up to 6% under the open sea sailing phase. Furthermore, the controllable pitch propeller offers additional energy efficiency benefits of 2% under the port approach phase, utilizing the proposed algorithm. If the proposed powertrain is produced and the implemented algorithm is adopted, this could lead to substantial carbon dioxide emissions and fuel consumption savings at sea.

Journal Article

Identifying Twitter users who repost unreliable news sources with linguistic information

by Mu, Yida , Aletras, Nikolaos in Classification , Computational Linguistics , Digital media

2020

Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories.

Journal Article

Applied Research in the Production of a Genre Film: Production Design and Realization of a School Movie

by Aletras, Nikolaos

2021

The production of a film for research purposes (film-based research) is a growing qualitative scientific research method, applied in many scientific fields. The researcher of the film-based research is an active part of the process, aiming at formulating questions and disputes rather than seeking answers. The object of the research is located in the cinematic production of a fiction film belonging to the genre of School Movies, in combination with the application of the auteur theory. The results of the present study highlight the importance of the personality of the researcher, who should balance between scientific principles and the needs and requirements of film production.

Journal Article

On the Impact of Calibration Data in Post-training Quantization and Pruning

by Williams, Miles , Aletras, Nikolaos in Calibration , Large language models , Neural networks

2024

Quantization and pruning form the foundation of compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated remarkable performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples that are used to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of quantization and pruning methods, datasets, tasks, and models. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.

Paper

Vocabulary-level Memory Efficiency for Language Model Fine-tuning

by Williams, Miles , Aletras, Nikolaos in Embedding , Parameters

2025

The extensive memory footprint of language model (LM) fine-tuning poses a challenge for both researchers and practitioners. LMs use an embedding matrix to represent extensive vocabularies, forming a substantial proportion of the model parameters. While previous work towards memory-efficient fine-tuning has focused on minimizing the number of trainable parameters, reducing the memory footprint of the embedding matrix has yet to be explored. We first demonstrate that a significant proportion of the vocabulary remains unused during fine-tuning. We then propose a simple yet effective approach that leverages this finding to minimize memory usage. We show that our approach provides substantial reductions in memory usage across a wide range of models and tasks. Notably, our approach does not impact downstream task performance, while allowing more efficient use of computational resources.

Paper
