Source author record

Avraham Shinnar

Avraham Shinnar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Programming Languages, Machine Learning, Databases, Artificial Intelligence, cs.CY, Logic in Computer Science, math.PR

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

An Empirical Study of Modular Bias Mitigators and Ensembles

There are several bias mitigators that can reduce algorithmic bias in machine learning models but, unfortunately, the effect of mitigators on fairness is often not stable when measured across different data splits. A popular approach to train more stable models is ensemble learning. Ensembles, such as bagging, boosting, voting, or stacking, have been successful at making predictive performance more stable. One might therefore ask whether we can combine the advantages of bias mitigators and ensembles. To explore this question, we first need bias mitigators and ensembles to work together. We built an open-source library enabling the modular composition of 10 mitigators, 4 ensembles, and their corresponding hyperparameters. Based on this library, we empirically explored the space of combinations on 13 datasets, including datasets commonly used in fairness literature plus datasets newly curated by our library. Furthermore, we distilled the results into a guidance diagram for practitioners. We hope this paper will contribute towards improving stability in bias mitigation.

Preprint, 2022, arXiv

Formalization of a Stochastic Approximation Theorem

Stochastic approximation algorithms are iterative procedures which are used to approximate a target value in an environment where the target is unknown and direct observations are corrupted by noise. These algorithms are useful, for instance, for root-finding and function minimization when the target function or model is not directly known. Originally introduced in a 1951 paper by Robbins and Monro, the field of stochastic approximation has grown enormously and has come to influence application domains from adaptive signal processing to artificial intelligence. As an example, the Stochastic Gradient Descent algorithm, which is ubiquitous in various subdomains of Machine Learning, is based on stochastic approximation theory. In this paper, we give a formal proof (in the Coq proof assistant) of a general convergence theorem due to Aryeh Dvoretzky, which implies the convergence of important classical methods such as the Robbins-Monro and the Kiefer-Wolfowitz algorithms. In the process, we build a comprehensive Coq library of measure-theoretic probability theory and stochastic processes.
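
To make the root-finding setting in this abstract concrete, here is a minimal Python sketch of a Robbins-Monro style iteration. It is an illustration only and is unrelated to the paper's Coq development. The test function noisy_linear, the step sizes 1/n, the step count, and the seed are all assumptions chosen for the example.

```python
import random


def robbins_monro(noisy_g, x0, steps=5000, seed=0):
    """Approximate a root of g using only noisy evaluations of g.

    Uses the update x_{n+1} = x_n - a_n * Y_n with step sizes a_n = 1/n,
    a classical choice whose sum diverges while the sum of squares converges.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = x0
    for n in range(1, steps + 1):
        y = noisy_g(x, rng)  # noisy observation of g at the current point
        x -= (1.0 / n) * y
    return x


def noisy_linear(x, rng):
    # Hypothetical target: g(x) = 2x - 6, whose root is x = 3,
    # observed with additive standard Gaussian noise.
    return 2.0 * x - 6.0 + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_linear, x0=0.0)
print(f"estimated root: {estimate:.3f}")
```

The estimate should land close to 3 even though no single observation of g is exact, because the shrinking step sizes average the noise out over many iterations.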

Preprint, 2022, arXiv

Translating Canonical SQL to Imperative Code in Coq

SQL is by far the most widely used and implemented query language. Yet, on some key features, such as correlated queries and NULL value semantics, many implementations diverge or contain bugs. We leverage recent advances in the formalization of SQL and query compilers to develop DBCert, the first mechanically verified compiler from SQL queries written in a canonical form to imperative code. Building DBCert required several new contributions which are described in this paper. First, we specify and mechanize a complete translation from SQL to the Nested Relational Algebra which can be used for query optimization. Second, we define Imp, a small imperative language sufficient to express SQL and which can target several execution languages including JavaScript. Finally, we develop a mechanized translation from the nested relational algebra to Imp, using the nested relational calculus as an intermediate step.

Preprint, 2020, arXiv

Lale: Consistent Automated Machine Learning

Automated machine learning makes it easier for data scientists to develop pipelines by searching over possible choices for hyperparameters, algorithms, and even pipeline topologies. Unfortunately, the syntax for automated machine learning tools is inconsistent with manual machine learning, with each other, and with error checks. Furthermore, few tools support advanced features such as topology search or higher-order operators. This paper introduces Lale, a library of high-level Python interfaces that simplifies and unifies automated machine learning in a consistent way.

Preprint, 2020, arXiv

Type Safety with JSON Subschema

JSON is a popular data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. JSON Schema is a language for declaring the structure of valid JSON data. There are validators that can decide whether a JSON document is valid with respect to a schema. Unfortunately, like all instance-based testing, these validators can only show the presence and never the absence of a bug. This paper presents a complementary technique: JSON subschema checking, which can be used for static type checking with JSON Schema. Deciding whether one schema is a subschema of another is non-trivial because of the richness of the JSON Schema specification language. Given a pair of schemas, our approach first canonicalizes and simplifies both schemas, then decides the subschema question on the canonical forms, dispatching simpler subschema queries to type-specific checkers. We apply an implementation of our subschema checking algorithm to 8,548 pairs of real-world JSON schemas from different domains, demonstrating that it can decide the subschema question for most schema pairs and is always correct for schema pairs that it can decide. We hope that our work will bring more static guarantees to hard-to-debug domains, such as cloud computing and artificial intelligence.
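
The canonicalize-then-check idea in this abstract can be illustrated on a very small fragment of JSON Schema. The Python sketch below is not the paper's algorithm or its implementation. It assumes schemas that use only "type" (integer or number), "minimum", and "maximum", ignores keywords such as exclusiveMinimum and multipleOf, and assumes that an integer-valued number such as 1.0 counts as an integer. The function names canonicalize and is_subschema are hypothetical.

```python
import math


def canonicalize(schema):
    """Reduce a numeric schema to (integer_only, lo, hi) with inclusive bounds."""
    kind = schema.get("type")
    if kind not in ("integer", "number"):
        raise ValueError("this sketch only handles numeric schemas")
    lo = schema.get("minimum", -math.inf)
    hi = schema.get("maximum", math.inf)
    integer_only = kind == "integer"
    if integer_only:
        # Tighten finite bounds to the nearest integers inside the range.
        if math.isfinite(lo):
            lo = math.ceil(lo)
        if math.isfinite(hi):
            hi = math.floor(hi)
    return integer_only, lo, hi


def is_subschema(sub, sup):
    """Return True if every value accepted by `sub` is also accepted by `sup`."""
    sub_int, sub_lo, sub_hi = canonicalize(sub)
    sup_int, sup_lo, sup_hi = canonicalize(sup)
    if sub_lo > sub_hi:
        return True  # `sub` accepts nothing, so it is trivially contained
    if sup_int and not sub_int:
        # A real-valued range fits inside an integer schema only if it
        # collapses to a single integer point.
        if sub_lo != sub_hi or not float(sub_lo).is_integer():
            return False
    return sup_lo <= sub_lo and sub_hi <= sup_hi


small_ints = {"type": "integer", "minimum": 1, "maximum": 5}
non_negative = {"type": "number", "minimum": 0}
print(is_subschema(small_ints, non_negative))
print(is_subschema(non_negative, small_ints))
```

Unlike a validator, which tests one document at a time, the check compares the two schemas directly, so a positive answer covers every document the first schema accepts.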

Preprint, 2013, arXiv

Semantics of (Resilient) X10

We present a formal small-step structural operational semantics for a large fragment of X10, unifying past work. The fragment covers multiple places, mutable objects on the heap, sequencing, and the "try/catch", "async", "finish", and "at" constructs. This model accurately captures the behavior of a large class of concurrent, multi-place X10 programs. Further, we introduce a formal model of resilience in X10. During execution of an X10 program, a place may fail for many reasons. Resilient X10 permits the program to continue executing, losing the data at the failed place and most of the control state, and repairing the global control state in such a way that two key semantic principles hold: the Invariant Happens Before Principle and the Failure Masking Principle. These principles permit an X10 programmer to write clean code that continues to work in the presence of place failure. The given semantics have additionally been mechanized in Coq.

Preprint, 2012, arXiv

M3R: Increased performance for in-memory Hadoop jobs

Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets -- while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
