Catalogue Search | MBRL
316 result(s) for "Codierung"
Large language models as a substitute for human experts in annotating political text
by Heseltine, Michael; Clemm von Hohenberg, Bernhard
in Humans, Language modeling, Large language models
2024
Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a ‘hybrid’ coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.
Journal Article
Survey on categorical data for neural networks
2020
This survey investigates current techniques for representing qualitative data for use as input to neural networks. Techniques for using qualitative data in neural networks are well known. However, researchers continue to discover new variations or entirely new methods for working with categorical data in neural networks. Our primary contribution is to cover these representation techniques in a single work. Practitioners working with big data often have a need to encode categorical values in their datasets in order to leverage machine learning algorithms. Moreover, the size of data sets we consider as big data may cause one to reject some encoding techniques as impractical, due to their running time complexity. Neural networks take vectors of real numbers as inputs. One must use a technique to map qualitative values to numerical values before using them as input to a neural network. These techniques are known as embeddings, encodings, representations, or distributed representations. Another contribution this work makes is to provide references for the source code of various techniques, where we are able to verify the authenticity of the source code. We cover recent research in several domains where researchers use categorical data in neural networks. Some of these domains are natural language processing, fraud detection, and clinical document automation. This study provides a starting point for research in determining which techniques for preparing qualitative data for use with neural networks are best. It is our intention that the reader should use these implementations as a starting point to design experiments to evaluate various techniques for working with qualitative data in neural networks. The third contribution we make in this work is a new perspective on techniques for using categorical data in neural networks. We organize techniques for using categorical data in neural networks into three categories. 
We find three distinct patterns in techniques that identify a technique as determined, algorithmic, or automated. The fourth contribution we make is to identify several opportunities for future research. The form of the data that one uses as an input to a neural network is crucial for using neural networks effectively. This work is a tool for researchers to find the most effective technique for working with categorical data in neural networks, in big data settings. To the best of our knowledge this is the first in-depth look at techniques for working with categorical data in neural networks.
Journal Article
Information Theory
2011, 2012
Csiszár and Körner's book is widely regarded as a classic in the field of information theory, providing deep insights and expert treatment of the key theoretical issues. It includes in-depth coverage of the mathematics of reliable information transmission, both in two-terminal and multi-terminal network scenarios. Updated and considerably expanded, this new edition presents unique discussions of information theoretic secrecy and of zero-error information theory, including the deep connections of the latter with extremal combinatorics. The presentations of all core subjects are self contained, even the advanced topics, which helps readers to understand the important connections between seemingly different problems. Finally, 320 end-of-chapter problems, together with helpful solving hints, allow readers to develop a full command of the mathematical techniques. It is an ideal resource for graduate students and researchers in electrical and electronic engineering, computer science and applied mathematics.
Gifted microbes for genome mining and natural product discovery
2017
Actinomycetes are historically important sources for secondary metabolites (SMs) with applications in human medicine, animal health, and plant crop protection. It is now clear that actinomycetes and other microorganisms with large genomes have the capacity to produce many more SMs than was anticipated from standard fermentation studies. Indeed ~90 % of SM gene clusters (SMGCs) predicted from genome sequencing are cryptic under conventional fermentation and analytical analyses. Previous studies have suggested that among the actinomycetes with large genomes, some have the coding capacity to produce many more SMs than others, and that strains with the largest genomes tend to be the most gifted. These contentions have been evaluated more quantitatively by antiSMASH 3.0 analyses of microbial genomes, and the results indicate that many actinomycetes with large genomes are gifted for SM production, encoding 20–50 SMGCs, and devoting 0.8–3.0 Mb of coding capacity to SM production. Several Proteobacteria and Firmicutes with large genomes encode 20–30 SMGCs and devote 0.8–1.3 Mb of DNA to SM production, whereas cultured bacteria and archaea with small genomes devote insignificant coding capacity to SM production. Fully sequenced genomes of uncultured bacteria and archaea have small genomes nearly devoid of SMGCs.
Journal Article
Optimisation of distributed manufacturing flexible job shop scheduling by using hybrid genetic algorithms
by Liu, Tung-Kuan; Chang, Hao-Chin
in Advanced manufacturing technologies, Algorithms, Benchmarks
2017
In contrast to traditional job-shop scheduling problems, various complex constraints must be considered in distributed manufacturing environments; therefore, developing a novel scheduling solution is necessary. This paper proposes a hybrid genetic algorithm (HGA) for solving the distributed and flexible job-shop scheduling problem (DFJSP). Compared with previous studies on HGAs, the HGA approach proposed in this study uses the Taguchi method to optimize the parameters of a genetic algorithm (GA). Furthermore, a novel encoding mechanism is proposed to solve invalid job assignments, where a GA is employed to solve complex flexible job-shop scheduling problems (FJSPs). In addition, various crossover and mutation operators are adopted for increasing the probability of finding the optimal solution and diversity of chromosomes and for refining a makespan solution. To evaluate the performance of the proposed approach, three classic DFJSP benchmarks and three virtual DFJSPs were adapted from classical FJSP benchmarks. The experimental results indicate that the proposed approach is considerably robust, outperforming previous algorithms after 50 runs.
Journal Article
On constrained spectral clustering and its applications
by Qian, Buyue; Wang, Xiang; Davidson, Ian
in Algorithms, Artificial Intelligence, Chemistry and Earth Sciences
2014
Constrained clustering has been well-studied for algorithms such as K-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link (ML) and Cannot-Link (CL) constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in ML and CL constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding a large number of constraints: transfer learning via constraints.
Journal Article
Polarization-resolved and polarization-multiplexed spike encoding properties in photonic neuron based on VCSEL-SA
by Hao, Yue; Guo, Xingxing; Zhang, Yahui
in 639/624/1020/1093, 639/766/400/584, 639/766/530/2803
2018
The spike encoding properties of two polarization-resolved modes in vertical-cavity surface-emitting laser with an embedded saturable absorber (VCSEL-SA) are investigated numerically, based on the spin-flip model combined with the Yamada model. The results show that the external input optical pulse (EIOP) can be encoded into spikes in X-polarization (XP) mode, Y-polarization (YP) mode, or both XP and YP modes. Furthermore, the numerical bifurcation diagrams show that a lower (higher) strength of EIOP is beneficial for generating tonic (phasic) spikes; a small amplitude anisotropy contributes to wide (narrow) tonic spiking range in XP (YP) mode; a large current leads to low thresholds of EIOP strength for both XP and YP modes. However, the spike encoding properties are hardly affected by the phase anisotropy. The encoding rate is shown to be improved by increasing EIOP strength. Moreover, dual-channel polarization-multiplexed spike encoding can also be achieved in a single VCSEL-SA. To the best of our knowledge, such single channel polarization-resolved and dual-channel polarization-multiplexed spike encoding schemes have not yet been reported. Hence, this work is valuable for ultrafast photonic neuromorphic systems and brain-inspired information processing.
Journal Article
Synaptic computation
by Abbott, L. F.; Regehr, Wade G.
in Animals, Biological and medical sciences, Cognition & reasoning
2004
Neurons are often considered to be the computational engines of the brain, with synapses acting solely as conveyers of information. But the diverse types of synaptic plasticity and the range of timescales over which they operate suggest that synapses have a more active role in information processing. Long-term changes in the transmission properties of synapses provide a physiological substrate for learning and memory, whereas short-term changes support a variety of computations. By expressing several forms of synaptic plasticity, a single neuron can convey an array of different signals to the neural circuit in which it operates.
Journal Article
Qualitative Content Analysis: From Kracauer's Beginnings to Today's Challenges
2019
In the early 1950s, when communication research was in its heyday, KRACAUER introduced the term "qualitative content analysis". Today, qualitative content analysis is among the most frequently used methods in social research in Germany. Building on KRACAUER's argument, I propose three areas for further development: first, a more strongly qualitative analysis following the formation of categories and the coding of the data; second, a case orientation that complements category-based analysis, which is characteristic of qualitative research but has so far played hardly any role in qualitative content analysis; third, stronger engagement with the international methodological discussion, in which qualitative content analysis is still little known. I also reflect on methodological aspects, focus in a concluding outlook on the topic of standards and quality criteria, and argue for the development of methodological rigor.
Journal Article