Researcher profile

Amir Shaikhha

Amir Shaikhha contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2025arXiv

Hojabr: Towards a Theory of Everything for AI and Data Analytics

Modern data analytics pipelines increasingly combine relational queries, graph processing, and tensor computation within a single application, but existing systems remain fragmented across paradigms, execution models, and research communities. This fragmentation results in repeated optimization efforts, limited interoperability, and strict separation between logical abstractions and physical execution strategies. We propose Hojabr as a unified declarative intermediate language to address this problem. Hojabr integrates relational algebra, tensor algebra, and constraint-based reasoning within a single higher-order algebraic framework, in which joins, aggregations, tensor contractions, and recursive computations are expressed uniformly. Physical choices, such as join algorithms, execution models, and sparse versus dense tensor representations, are handled as constraint-specialization decisions rather than as separate formalisms. Hojabr supports bidirectional translation with existing declarative languages, enabling programs to be both lowered into Hojabr for analysis and optimization and lifted back into their original declarative form. By making semantic, structural, and algebraic properties explicit, and by supporting extensibility across the compilation stack, Hojabr enables systematic reasoning and reuse of optimization techniques across database systems, machine learning frameworks, and compiler infrastructures.

preprint2022arXiv

Functional Collection Programming with Semi-Ring Dictionaries

This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads, by putting together optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. SDQL can be competitive with or outperforms a host of systems that are state of the art in their own domain: in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.

preprint2022arXiv

Hinted Dictionaries: Efficient Functional Ordered Sets and Maps

This article introduces hinted dictionaries for expressing efficient ordered sets and maps functionally. As opposed to the traditional ordered dictionaries with logarithmic operations, hinted dictionaries can achieve better performance by using cursor-like objects referred to as hints. Hinted dictionaries unify the interfaces of imperative ordered dictionaries (e.g., C++ maps) and functional ones (e.g., Adams' sets). We show that such dictionaries can use sorted arrays, unbalanced trees, and balanced trees as their underlying representations. Throughout the article, we use Scala to present the different components of hinted dictionaries. We also provide a C++ implementation to evaluate the effectiveness of hinted dictionaries. Hinted dictionaries provide superior performance for set-set operations in comparison with the standard library of C++. Also, they show a competitive performance in comparison with the SciPy library for sparse vector operations.

preprint2021arXiv

BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation

Cardinality estimation (CardEst) is an essential component in query optimizers and a fundamental problem in DBMS. A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment. However, no existing CardEst method can fulfill the three criteria at the same time. Traditional methods often have significant algorithm drawbacks such as large estimation errors. Recently proposed deep learning based methods largely improve the estimation accuracy but their performance can be greatly affected by data and often difficult for system deployment. In this paper, we revitalize the Bayesian networks (BN) for CardEst by incorporating the techniques of probabilistic programming languages. We present BayesCard, the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcomes their drawbacks, i.e. low structure learning and inference efficiency. This makes BayesCard a perfect candidate for commercial DBMS deployment. Our experimental results on several single-table and multi-table benchmarks indicate BayesCard's superiority over existing state-of-the-art CardEst methods: BayesCard achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates. Meanwhile, BayesCard keeps stable performance when varying data with different settings. We also deploy BayesCard into PostgreSQL. On the IMDB benchmark workload, it improves the end-to-end query time by 13.3%, which is very close to the optimal result of 14.2% using an oracle of true cardinality.

preprint2020arXiv

Multi-layer Optimizations for End-to-End Data Analytics

We consider the problem of training machine learning models over multi-relational data. The mainstream approach is to first construct the training dataset using a feature extraction query over input database and then use a statistical software package of choice to train the model. In this paper we introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language, which captures a subset of Python commonly used in Jupyter notebooks for rapid prototyping of machine learning applications. The program is subject to several layers of IFAQ optimizations, such as algebraic transformations, loop transformations, schema specialization, data layout optimizations, and finally compilation into efficient low-level C++ code specialized for the given workload and data. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.