15 results for "Wisnesky, Ryan"
Algebraic data integration
In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema S form a category, S–Inst, and a morphism of schemas F : S → T induces three adjoint data migration functors: Σ_F : S–Inst → T–Inst, defined by substitution along F, which has a right adjoint Δ_F : T–Inst → S–Inst, which in turn has a right adjoint Π_F : S–Inst → T–Inst. We present a query language based on for/where/return syntax in which each query denotes a sequence of data migration functors; a pushout-based design pattern for performing data integration using our formalism; and an implementation of our formalism in a tool we call AQL (Algebraic Query Language).
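As a toy rendering of the Δ functor from the abstract: with instances modeled as plain dicts from schema nodes to row sets (an assumption for illustration; AQL's real instances are term algebras, not dicts), Δ_F migrates a T-instance back to an S-instance by composing with F.

```python
# Minimal sketch of the Delta data migration functor, assuming a
# schema morphism F : S -> T is a dict from S-nodes to T-nodes and
# an instance is a dict from nodes to sets of rows. Illustrative
# only; this is not AQL's actual representation.

def delta(F, J):
    """Migrate a T-instance J back to an S-instance by composing
    with F: each S-node s is populated with J's rows at F(s)."""
    return {s: J[F[s]] for s in F}

# A morphism collapsing two source tables onto one target table.
F = {"Employee": "Person", "Customer": "Person"}
J = {"Person": {("alice",), ("bob",)}}

I = delta(F, J)
# Both source tables receive the target table's rows.
```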
Algebraic Model Management: A Survey
We survey the field of model management and describe a new model management approach based on algebraic specification.
Fast Left Kan Extensions Using the Chase
We show how computation of left Kan extensions can be reduced to computation of free models of cartesian (finite-limit) theories. We discuss how the standard and parallel chase compute weakly free models of regular theories and free models of cartesian theories and compare the concept of “free model” with a similar concept from database theory known as “universal model”. We prove that, as algorithms for computing finite free models of cartesian theories, the standard and parallel chase are complete under fairness assumptions. Finally, we describe an optimized implementation of the parallel chase specialized to left Kan extensions that achieves an order of magnitude improvement in our performance benchmarks compared to the next fastest left Kan extension algorithm we are aware of.
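The chase mentioned above repairs a set of facts until the theory's dependencies are satisfied, inventing labeled nulls for existentials. A minimal sketch of one standard-chase step for a single invented tuple-generating dependency (the rule and relation names are illustrative, not from the paper):

```python
# One standard-chase step for the invented dependency
#   emp(e) -> exists d. works_in(e, d)
# Facts are tuples whose first element names the relation.
import itertools

_nulls = itertools.count(1)  # source of fresh labeled nulls

def chase_step(facts):
    """Add a labeled-null department for every employee lacking
    one, leaving already-satisfied employees untouched."""
    new = set(facts)
    for rel, *args in facts:
        if rel == "emp" and not any(
                r == "works_in" and a[0] == args[0]
                for r, *a in facts):
            new.add(("works_in", args[0], f"_null{next(_nulls)}"))
    return new

facts = {("emp", "alice"), ("emp", "bob"),
         ("works_in", "bob", "sales")}
repaired = chase_step(facts)
# "alice" is assigned a freshly invented (labeled-null) department.
```

Iterating such steps until nothing changes yields the (weakly) free model, when the process terminates.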
Functional query languages with categorical types
We study three category-theoretic types in the context of functional query languages (typed lambda-calculi extended with additional operations for bulk data processing). The types we study are: 1) The dependent identity type. By adding identity types to the simply-typed lambda-calculus we obtain a language where embedded dependencies are first-class objects that can be manipulated by the programmer and used for optimization. We prove that the chase rewriting procedure is sound for this language. 2) The type of propositions. By adding propositions to the simply-typed lambda-calculus, we obtain higher-order logic. We prove that every hereditarily domain-independent higher-order logic program can be translated into the nested relational algebra, thereby allowing higher-order logic to be used as a query language and giving a higher-order generalization of Codd's theorem. 3) The type of finitely presented categories. By adding types for finitely presented categories to the simply-typed lambda-calculus we obtain a schema mapping language for the functorial data model. We define FQL, the first query language for this data model, investigate its metatheory, and build a SQL compiler for FQL.
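The bulk operations of such typed lambda-calculi correspond closely to comprehension syntax. A toy Python analogue of a for/where/return query over nested data (the data and query are invented for illustration):

```python
# A nested-relational query written as a comprehension: iterate over
# departments, then over each department's staff, filter, and return
# pairs. This mirrors for/where/return bulk processing.

departments = [
    {"name": "eng", "staff": [{"name": "alice", "level": 3}]},
    {"name": "ops", "staff": [{"name": "bob", "level": 1},
                              {"name": "carol", "level": 2}]},
]

# "for d in departments, for p in d.staff,
#  where p.level >= 2, return (d.name, p.name)"
senior = [(d["name"], p["name"])
          for d in departments
          for p in d["staff"]
          if p["level"] >= 2]
# -> [("eng", "alice"), ("ops", "carol")]
```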
Towards A More Reasonable Semantic Web
We aim to accelerate the original vision of the semantic web by revisiting design decisions that have defined the semantic web up until now. We propose a shift in direction that more broadly embraces existing data infrastructure by reconsidering the semantic web's logical foundations. We argue to shift attention away from description logic, which has so far underpinned the semantic web, to a different fragment of first-order logic. We argue, using examples from the (geo)spatial domain, that by doing so, the semantic web can be approached as a traditional data migration and integration problem at a massive scale. That way, a huge number of existing tools and theories can be deployed to the semantic web's benefit, and the original vision of ontology as shared abstraction can be reinvigorated.
Relational to RDF Data Migration by Query Co-Evaluation
In this paper we define a new algorithm to convert an input relational database to an output set of RDF triples. The algorithm can be used, for example, to load CSV data into a financial OWL ontology such as FIBO. The algorithm takes as input a set of relational conjunctive (select-from-where) queries, one for each input table, from the three-column (subject, predicate, object) output RDF schema to the input table's relational schema. The algorithm's output is the only set of RDF triples for which a unique round-trip of the input data under the relational queries exists. The output may contain blank nodes, is unique up to unique isomorphism, and can be obtained using elementary formal methods (specifically, equational theorem proving and term model construction). We also describe how (generalized) homomorphisms between graphs can be used to write such relational conjunctive (select-from-where) queries, which, owing to the lack of structure in the three-column RDF schema, tend to be large in practice. We demonstrate both the algorithm and the mapping language with examples on the FIBO financial ontology.
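To make the shape of the three-column output concrete, here is a sketch of the naive direct direction: one (subject, predicate, object) triple per non-key cell of each row. Note this is not the paper's algorithm, which instead works from queries against the RDF schema and solves for the round-trip; all table, column, and predicate names below are invented.

```python
# Naive relational-to-RDF sketch: each row becomes a subject, and
# each non-key column becomes a predicate on that subject. The
# naming scheme (table/key, table#column) is made up.

def table_to_triples(table_name, rows, key):
    triples = set()
    for row in rows:
        subject = f"{table_name}/{row[key]}"
        for col, val in row.items():
            if col != key:
                triples.add((subject, f"{table_name}#{col}", val))
    return triples

rows = [{"id": "1", "name": "FIBO", "kind": "ontology"}]
triples = table_to_triples("Doc", rows, key="id")
# {("Doc/1", "Doc#name", "FIBO"), ("Doc/1", "Doc#kind", "ontology")}
```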
Informal Data Transformation Considered Harmful
In this paper we take the common position that AI systems are limited more by the integrity of the data they learn from than by the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex post facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data). Rather, it is to formally and automatically guarantee that data integrity is preserved as data is transformed (migrated, integrated, composed, queried, viewed, etc.) throughout the enterprise, so that data, and the programs that depend on that data, need not constantly be re-validated for every particular use.
Algebraic Property Graphs
We present a case study in applied category theory written from the point of view of an applied domain: the formalization of the widely used property graph data model in an enterprise setting, using elementary constructions from type theory and category theory, including limit and co-limit sketches. Observing that algebraic data types are a common foundation of most of the enterprise schema languages we deal with in practice, for graph data or otherwise, we introduce a type theory for algebraic property graphs wherein the types denote both algebraic data types in the sense of functional programming and join-union E/R diagrams in the sense of database theory. We also provide theoretical foundations for graph transformation along schema mappings, with by-construction guarantees of semantic consistency. Our data model originated as a formalization of a data integration toolkit developed at Uber which carries data and schemas along composable mappings between data interchange languages such as Apache Avro, Apache Thrift, and Protocol Buffers, and graph languages including RDF with OWL- or SHACL-based schemas.
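A minimal sketch of vertices and edges as algebraic data types, in the spirit of the abstract (the class and field names are illustrative, not the paper's actual type theory):

```python
# Property graph elements as immutable algebraic data types: a
# vertex carries a label and key/value properties; an edge carries
# its endpoints, so a graph is just a collection of these values.

from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    label: str
    properties: tuple        # (key, value) pairs, kept immutable

@dataclass(frozen=True)
class Edge:
    label: str
    src: Vertex
    tgt: Vertex
    properties: tuple = ()

alice = Vertex("Person", (("name", "alice"),))
uber = Vertex("Company", (("name", "Uber"),))
works = Edge("works_at", alice, uber)
```

Because the values are frozen algebraic data, they can be hashed, compared structurally, and mapped along schema mappings without mutation, which is one motivation for the algebraic treatment.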