Catalogue Search | MBRL

Large language models as a substitute for human experts in annotating political text

by Heseltine, Michael , Clemm von Hohenberg, Bernhard in Humans , Language modeling , Large language models

2024

Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a ‘hybrid’ coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.

Journal Article

Survey on categorical data for neural networks

by Hancock, John T. , Khoshgoftaar, Taghi M. in Algorithms , Automation , Big Data

2020

This survey investigates current techniques for representing qualitative data for use as input to neural networks. Techniques for using qualitative data in neural networks are well known. However, researchers continue to discover new variations or entirely new methods for working with categorical data in neural networks. Our primary contribution is to cover these representation techniques in a single work. Practitioners working with big data often have a need to encode categorical values in their datasets in order to leverage machine learning algorithms. Moreover, the size of data sets we consider as big data may cause one to reject some encoding techniques as impractical, due to their running time complexity. Neural networks take vectors of real numbers as inputs. One must use a technique to map qualitative values to numerical values before using them as input to a neural network. These techniques are known as embeddings, encodings, representations, or distributed representations. Another contribution this work makes is to provide references for the source code of various techniques, where we are able to verify the authenticity of the source code. We cover recent research in several domains where researchers use categorical data in neural networks. Some of these domains are natural language processing, fraud detection, and clinical document automation. This study provides a starting point for research in determining which techniques for preparing qualitative data for use with neural networks are best. It is our intention that the reader should use these implementations as a starting point to design experiments to evaluate various techniques for working with qualitative data in neural networks. The third contribution we make in this work is a new perspective on techniques for using categorical data in neural networks. We organize techniques for using categorical data in neural networks into three categories. 
We find three distinct patterns in techniques that identify a technique as determined, algorithmic, or automated. The fourth contribution we make is to identify several opportunities for future research. The form of the data that one uses as an input to a neural network is crucial for using neural networks effectively. This work is a tool for researchers to find the most effective technique for working with categorical data in neural networks, in big data settings. To the best of our knowledge this is the first in-depth look at techniques for working with categorical data in neural networks.

Journal Article

Information Theory

by Körner, János , Csiszár, Imre in Coding theory , Information theory , Textbooks

2011, 2012

Csiszár and Körner's book is widely regarded as a classic in the field of information theory, providing deep insights and expert treatment of the key theoretical issues. It includes in-depth coverage of the mathematics of reliable information transmission, both in two-terminal and multi-terminal network scenarios. Updated and considerably expanded, this new edition presents unique discussions of information theoretic secrecy and of zero-error information theory, including the deep connections of the latter with extremal combinatorics. The presentations of all core subjects are self-contained, even the advanced topics, which helps readers to understand the important connections between seemingly different problems. Finally, 320 end-of-chapter problems, together with helpful solving hints, allow readers to develop a full command of the mathematical techniques. It is an ideal resource for graduate students and researchers in electrical and electronic engineering, computer science and applied mathematics.

eBook

Gifted microbes for genome mining and natural product discovery

by Baltz, Richard H in Actinobacteria , Actinobacteria - genetics , Alphaproteobacteria - genetics

2017

Actinomycetes are historically important sources for secondary metabolites (SMs) with applications in human medicine, animal health, and plant crop protection. It is now clear that actinomycetes and other microorganisms with large genomes have the capacity to produce many more SMs than was anticipated from standard fermentation studies. Indeed ~90% of SM gene clusters (SMGCs) predicted from genome sequencing are cryptic under conventional fermentation and analytical analyses. Previous studies have suggested that among the actinomycetes with large genomes, some have the coding capacity to produce many more SMs than others, and that strains with the largest genomes tend to be the most gifted. These contentions have been evaluated more quantitatively by antiSMASH 3.0 analyses of microbial genomes, and the results indicate that many actinomycetes with large genomes are gifted for SM production, encoding 20–50 SMGCs, and devoting 0.8–3.0 Mb of coding capacity to SM production. Several Proteobacteria and Firmicutes with large genomes encode 20–30 SMGCs and devote 0.8–1.3 Mb of DNA to SM production, whereas cultured bacteria and archaea with small genomes devote insignificant coding capacity to SM production. Fully sequenced genomes of uncultured bacteria and archaea have small genomes nearly devoid of SMGCs.

Journal Article

Einführung in multivariate Analyseverfahren: Materialien zum ZHSF-Herbstseminar

by Kohler, Ulrich in Codierung

2001

eBook

Optimisation of distributed manufacturing flexible job shop scheduling by using hybrid genetic algorithms

by Liu, Tung-Kuan , Chang, Hao-Chin in Advanced manufacturing technologies , Algorithms , Benchmarks

2017

In contrast to traditional job-shop scheduling problems, various complex constraints must be considered in distributed manufacturing environments; therefore, developing a novel scheduling solution is necessary. This paper proposes a hybrid genetic algorithm (HGA) for solving the distributed and flexible job-shop scheduling problem (DFJSP). Compared with previous studies on HGAs, the HGA approach proposed in this study uses the Taguchi method to optimize the parameters of a genetic algorithm (GA). Furthermore, a novel encoding mechanism is proposed to solve invalid job assignments, where a GA is employed to solve complex flexible job-shop scheduling problems (FJSPs). In addition, various crossover and mutation operators are adopted for increasing the probability of finding the optimal solution and diversity of chromosomes and for refining a makespan solution. To evaluate the performance of the proposed approach, three classic DFJSP benchmarks and three virtual DFJSPs were adapted from classical FJSP benchmarks. The experimental results indicate that the proposed approach is considerably robust, outperforming previous algorithms after 50 runs.

Journal Article

On constrained spectral clustering and its applications

by Qian, Buyue , Wang, Xiang , Davidson, Ian in Algorithms , Artificial Intelligence , Chemistry and Earth Sciences

2014

Constrained clustering has been well-studied for algorithms such as K-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link (ML) and Cannot-Link (CL) constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in ML and CL constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding a large number of constraints: transfer learning via constraints.

Journal Article

Polarization-resolved and polarization-multiplexed spike encoding properties in photonic neuron based on VCSEL-SA

by Hao, Yue , Guo, Xingxing , Zhang, Yahui in 639/624/1020/1093 , 639/766/400/584 , 639/766/530/2803

2018

The spike encoding properties of two polarization-resolved modes in a vertical-cavity surface-emitting laser with an embedded saturable absorber (VCSEL-SA) are investigated numerically, based on the spin-flip model combined with the Yamada model. The results show that the external input optical pulse (EIOP) can be encoded into spikes in X-polarization (XP) mode, Y-polarization (YP) mode, or both XP and YP modes. Furthermore, the numerical bifurcation diagrams show that a lower (higher) strength of EIOP is beneficial for generating tonic (phasic) spikes; a small amplitude anisotropy contributes to wide (narrow) tonic spiking range in XP (YP) mode; a large current leads to low thresholds of EIOP strength for both XP and YP modes. However, the spike encoding properties are hardly affected by the phase anisotropy. The encoding rate is shown to be improved by increasing EIOP strength. Moreover, dual-channel polarization-multiplexed spike encoding can also be achieved in a single VCSEL-SA. To the best of our knowledge, such single-channel polarization-resolved and dual-channel polarization-multiplexed spike encoding schemes have not yet been reported. Hence, this work is valuable for ultrafast photonic neuromorphic systems and brain-inspired information processing.

Journal Article

Synaptic computation

by Abbott, L. F. , Regehr, Wade G. in Animals , Biological and medical sciences , Cognition & reasoning

2004

Neurons are often considered to be the computational engines of the brain, with synapses acting solely as conveyers of information. But the diverse types of synaptic plasticity and the range of timescales over which they operate suggest that synapses have a more active role in information processing. Long-term changes in the transmission properties of synapses provide a physiological substrate for learning and memory, whereas short-term changes support a variety of computations. By expressing several forms of synaptic plasticity, a single neuron can convey an array of different signals to the neural circuit in which it operates.

Journal Article

Qualitative Content Analysis: From Kracauer's Beginnings to Today's Challenges

by Kuckartz, Udo in Argumentation , case analysis , category formation

2019

At the beginning of the 1950s, when communication research was in its heyday, KRACAUER introduced the term "qualitative content analysis". Today, qualitative content analysis is among the most frequently used methods in social research in Germany. Building on KRACAUER's argumentation, I propose three areas for further development: first, a more strongly qualitative analysis following the formation of categories and the coding of the data; second, a case orientation that complements category-based analysis, which is characteristic of qualitative research but has so far played hardly any role in qualitative content analysis; third, stronger engagement with the international methods discussion, in which qualitative content analysis is still little known. I also reflect on methodological aspects, focus in a concluding outlook on the topic of standards and quality criteria, and argue for the development of methodological rigor.

Journal Article
