Algebraic Model Management: A Survey
We survey the field of model management and describe a new model management approach based on algebraic specification.
Discover
Workspaces
Network
Opportunities
Account
Researcher profile
Ryan Wisnesky contributes to research discovery and scholarly infrastructure.
Trust snapshot
Actions
Identity and collaboration
Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.
Log in to claimDirect collaboration
Claim this author entity first to unlock direct invitations.
Research graph
Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.
BZPEER is loading the nearby papers, people, topics and institutions for this page.
Published work
We survey the field of model management and describe a new model management approach based on algebraic specification.
We present a case study in applied category theory written from the point of view of an applied domain: the formalization of the widely-used property graphs data model in an enterprise setting using elementary constructions from type theory and category theory, including limit and co-limit sketches. Observing that algebraic data types are a common foundation of most of the enterprise schema languages we deal with in practice, for graph data or otherwise, we introduce a type theory for algebraic property graphs wherein the types denote both algebraic data types in the sense of functional programming and join-union E/R diagrams in the sense of database theory. We also provide theoretical foundations for graph transformation along schema mappings with by-construction guarantees of semantic consistency. Our data model originated as a formalization of a data integration toolkit developed at Uber which carries data and schemas along composable mappings between data interchange languages such as Apache Avro, Apache Thrift, and Protocol Buffers, and graph languages including RDF with OWL or SHACL-based schemas.
We show how computation of left Kan extensions can be reduced to computation of free models of cartesian (finite-limit) theories. We discuss how the standard and parallel chase compute weakly free models of regular theories and free models of cartesian theories, and compare the concept of "free model" with a similar concept from database theory known as "universal model". We prove that, as algorithms for computing finite free models of cartesian theories, the standard and parallel chase are complete under fairness assumptions. Finally, we describe an optimized implementation of the parallel chase specialized to left Kan extensions that achieves an order of magnitude improvement in our performance benchmarks compared to the next fastest left Kan extension algorithm we are aware of.
In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.