234 result(s) for "Hutter, Marcus"
Simplicity and Complexity in Combinatorial Optimization
Many problems in physics and computer science can be framed in terms of combinatorial optimization. Due to this, it is interesting and important to study theoretical aspects of such optimization. Here, we study connections between Kolmogorov complexity, optima, and optimization. We argue that (1) optima and complexity are connected, with extrema being more likely to have low complexity (under certain circumstances); (2) optimization by sampling candidate solutions according to algorithmic probability may be an effective optimization method; and (3) coincidences in extrema to optimization problems are a priori more likely as compared to a purely random null model.
Modeling the Arrows of Time with Causal Multibaker Maps
Why do we remember the past, and plan the future? We introduce a toy model in which to investigate emergent time asymmetries: the causal multibaker maps. These are reversible discrete-time dynamical systems with configurable causal interactions. Imposing a suitable initial condition or “Past Hypothesis”, and then coarse-graining, yields a Pearlean locally causal structure. While it is more common to speculate that the other arrows of time arise from the thermodynamic arrow, our model instead takes the causal arrow as fundamental. From it, we obtain the thermodynamic and epistemic arrows of time. The epistemic arrow concerns records, which we define to be systems that encode the state of another system at another time, regardless of the latter system’s dynamics. Such records exist of the past, but not of the future. We close with informal discussions of the evolutionary and agential arrows of time, and their relevance to decision theory.
Chances and Risks of Artificial Intelligence—A Concept of Developing and Exploiting Machine Intelligence for Future Societies
Artificial Intelligence (AI): Boon or Bane for societies? AI technologies and solutions—as most revolutionary technologies have done in the past—carry negative implications on the one hand and considerable positive potential on the other. Avoiding the former and fostering the latter will require substantial investments in future societal concepts, research and development, and control of AI-based solutions in AI security while avoiding abuse. Preparation for the future role of AI in societies should strive towards the implementation of related methods and tools for risk management, models of complementary human–machine cooperation, strategies for the optimization of production and administration, and innovative concepts for the distribution of the economic value created. Two extreme possible “end states” of AI impact (if there is ever an end state) that are being discussed at present may manifest as (a) uncontrolled substitution by AI of major aspects of production, services, and administrative and decision-making processes, leading to unprecedented risks such as high unemployment, and the devaluation and underpayment of people in paid work, resulting in inequality in the distribution of wealth and employment, diminishing social peace, social cohesion, solidarity, security, etc., or, on the contrary, (b) the freeing of people from routine labor through increased automation in production, administration and services, and the changing of the constitution of politics and societies into constituencies with high ethical standards, personal self-determination, and the general dominance of humane principles, as opposed to pure materialism. Any mix of these two extremes could develop, and these combinations may vary among different societies and political systems.
A Philosophical Treatise of Universal Induction
Understanding inductive reasoning is a problem that has engaged mankind for thousands of years. This problem is relevant to a wide range of fields and is integral to the philosophy of science. It has been tackled by many great minds ranging from philosophers to scientists to mathematicians, and more recently computer scientists. In this article we argue the case for Solomonoff Induction, a formal inductive framework which combines algorithmic information theory with the Bayesian framework. Although it achieves excellent theoretical results and is based on solid philosophical foundations, the requisite technical knowledge necessary for understanding this framework has caused it to remain largely unknown and unappreciated in the wider scientific community. The main contribution of this article is to convey Solomonoff induction and its related concepts in a generally accessible form with the aim of bridging this current technical gap. In the process we examine the major historical contributions that have led to the formulation of Solomonoff Induction as well as criticisms of Solomonoff and induction in general. In particular we examine how Solomonoff induction addresses many issues that have plagued other inductive systems, such as the black ravens paradox and the confirmation problem, and compare this approach with other recent approaches.
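The core mechanism of Solomonoff induction, a Bayesian mixture with a 2^-length prior over hypotheses, can be sketched over a finite toy hypothesis class. Real Solomonoff induction mixes over all programs for a universal machine and is incomputable; here the hypothesis names, predictors, and the use of name length as "program length" are all illustrative assumptions.

```python
from fractions import Fraction

# Toy hypothesis class: each hypothesis maps a bit history to P(next bit = 1).
# Name length stands in for program length on a universal machine.
HYPOTHESES = {
    "always0": lambda history: Fraction(0),
    "always1": lambda history: Fraction(1),
    "uniform": lambda history: Fraction(1, 2),
    "alternate": lambda history: Fraction(1) if (not history or history[-1] == 0) else Fraction(0),
}

def posterior(data):
    """Bayesian update under the 2^-length prior: weight each hypothesis by
    prior * likelihood of the observed bits, then normalize.
    (Priors need not sum to 1; as in Solomonoff's semimeasure, only
    relative weights matter after normalization.)"""
    weights = {}
    for name, predict in HYPOTHESES.items():
        w = Fraction(1, 2 ** len(name))  # prior 2^-length(h)
        for i, bit in enumerate(data):
            p1 = predict(data[:i])
            w *= p1 if bit == 1 else (1 - p1)
        weights[name] = w
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

On the sequence 1, 0, 1, 0 the deterministic "alternate" hypothesis predicts every bit with certainty and so dominates "uniform" despite its longer description, while the two constant hypotheses are refuted outright, a miniature of how the universal prior trades off simplicity against fit.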
A Complete Theory of Everything (Will Be Subjective)
Increasingly encompassing models have been suggested for our world. Theories range from generally accepted to increasingly speculative to apparently bogus. The progression of theories from ego- to geo- to helio-centric models to universe and multiverse theories and beyond was accompanied by a dramatic increase in the sizes of the postulated worlds, with humans being expelled from their center to ever more remote and random locations. Rather than leading to a true theory of everything, this trend faces a turning point after which the predictive power of such theories decreases (actually to zero). Incorporating the location and other capacities of the observer into such theories avoids this problem and makes it possible to distinguish meaningful from predictively meaningless theories. This also leads to a truly complete theory of everything consisting of a (conventional objective) theory of everything plus a (novel subjective) observer process. The observer localization is neither based on the controversial anthropic principle, nor has it anything to do with the quantum-mechanical observation process. The suggested principle is extended to more practical (partial, approximate, probabilistic, parametric) world models (rather than theories of everything). Finally, I provide a justification of Ockham’s razor, and criticize the anthropic principle, the doomsday argument, the no free lunch theorem, and the falsifiability dogma.
Open Problems in Universal Induction & Intelligence
Specialized intelligent systems can be found everywhere: fingerprint, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, and so on. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks significant progress and maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction were laid exactly half a century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. Since there are already good recent surveys, I describe the state of the art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence.
Reward tampering problems and solutions in reinforcement learning
Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering (reward function tampering and RF-input tampering). Combined, the design principles can prevent reward tampering from being an instrumental goal. The analysis benefits from causal influence diagrams to provide intuitive yet precise formalizations.
Feature Reinforcement Learning: Part II. Structured MDPs
The Feature Markov Decision Processes (ΦMDPs) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured ΦMDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that makes it possible to automatically extract the most relevant features from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.
Imitation learning is probably existentially safe
Concerns about extinction risk from AI vary among experts in the field. However, AI encompasses a very broad category of algorithms. Perhaps some algorithms would pose an extinction risk, and others would not. Such an observation might be of great interest to both regulators and innovators. This paper argues that advanced imitation learners would likely not cause human extinction. We first present a simple argument to that effect, and then we rebut six different arguments that have been made to the contrary. A common theme of most of these arguments is a story for how a subroutine within an advanced imitation learner could hijack the imitation learner's behavior toward its own ends. However, we argue that each argument is flawed and each story implausible.
Advanced artificial agents intervene in the provision of reward
We analyze the expected behavior of an advanced artificial agent with a learned goal planning in an unknown environment. Given a few assumptions, we argue that it will encounter a fundamental ambiguity in the data about its goal. For example, if we provide a large reward to indicate that something about the world is satisfactory to us, it may hypothesize that what satisfied us was the sending of the reward itself; no observation can refute that. Then we argue that this ambiguity will lead it to intervene in whatever protocol we set up to provide data for the agent about its goal. We discuss an analogous failure mode of approximate solutions to assistance games. Finally, we briefly review some recent approaches that may avoid this problem.