Researcher profile

Benny Kimelfeld

Benny Kimelfeld contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
9 topics
4 close collaborators

Published work

12 published items

preprint | 2026 | arXiv

Database Views as Explanations for Relational Deep Learning

In recent years, there has been significant progress in the development of deep learning models over relational databases, including architectures based on heterogeneous graph neural networks (hetero-GNNs) and heterogeneous graph transformers. In effect, such architectures state how the database records and links (e.g., foreign-key references) translate into a large, complex numerical expression, involving numerous learnable parameters. This complexity makes it hard to explain, in human-understandable terms, how a model uses the available data to arrive at a given prediction. We present a novel framework for explaining machine-learning models over relational databases, where explanations are view definitions that highlight focused parts of the database that mostly contribute to the model's prediction. We establish such global abductive explanations by adapting the classic notion of determinacy by Nash, Segoufin, and Vianu (2010). In addition to tuning the tradeoff between determinacy and conciseness, the framework allows controlling the level of granularity by adopting different fragments of view definitions, such as ones highlighting whole columns, foreign keys between tables, relevant groups of tuples, and so on. We investigate the realization of the framework in the case of hetero-GNNs, and develop a model-specific approach via the notion of learnable masks. For comparison, we propose model-agnostic heuristic baselines and show that our approach is both more efficient and achieves better explanation quality in most cases. Our extensive empirical evaluation on the RelBench collection across diverse domains and record-level tasks demonstrates both the usefulness of our explanations and the efficiency of their generation.

preprint | 2026 | arXiv

The Importance of Parameters in Ranking Functions

How important is the weight of a given column in determining the ranking of tuples in a table? To address such an explanation question about a ranking function, we investigate the computation of SHAP scores for column weights, adopting a recent framework by Grohe et al. [ICDT'24]. The exact definition of this score depends on three key components: (1) the ranking function in use, (2) an effect function that quantifies the impact of using alternative weights on the ranking, and (3) an underlying weight distribution. We analyze the computational complexity of different instantiations of this framework for a range of fundamental ranking and effect functions, focusing on probabilistically independent finite distributions for individual columns. For the ranking functions, we examine lexicographic orders and score-based orders defined by the summation, minimum, and maximum functions. For the effect functions, we consider global, top-k, and local perspectives: global measures quantify the divergence between the perturbed and original rankings, top-k measures inspect the change in the set of top-k answers, and local measures capture the impact on an individual tuple of interest. Although all cases admit an additive fully polynomial-time randomized approximation scheme (FPRAS), we establish the complexity of exact computation, identifying which cases are solvable in polynomial time and which are #P-hard. We further show that all complexity results, lower bounds and upper bounds, extend to a related task of computing the Shapley value of whole columns (regardless of their weight).

preprint | 2022 | arXiv

Computing the Shapley Value of Facts in Query Answering

The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley values that adopts tools from probabilistic query evaluation. In particular, we capture the dependence of query answers on input database facts using Boolean expressions (data provenance), and then transform it, via Knowledge Compilation, into a particular circuit form for which we devise an algorithm for computing the Shapley values. Our second practical solution is a faster yet inexact approach that transforms the provenance to a Conjunctive Normal Form and uses a heuristic to compute the Shapley values. Our experiments on TPC-H and IMDB demonstrate the practical effectiveness of our solutions.
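To make the notion concrete, here is a minimal sketch of the Shapley value of facts for a Boolean query, computed by brute force over all orderings of the facts. It only illustrates the definition and is not the paper's algorithm (which relies on provenance and Knowledge Compilation); the toy database, the query `q`, and the function name `shapley_values` are hypothetical, and all facts are assumed to be endogenous.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(facts, query):
    """Exact Shapley value of each fact, by enumerating every ordering.

    `query` maps a frozenset of facts to 0 or 1 (a Boolean query).
    Runs in factorial time, so it is only usable on tiny inputs.
    """
    facts = list(facts)
    totals = {f: Fraction(0) for f in facts}
    n_orders = 0
    for order in permutations(facts):
        n_orders += 1
        present = set()
        for f in order:
            before = query(frozenset(present))
            present.add(f)
            after = query(frozenset(present))
            # Marginal contribution of f when it joins the facts before it.
            totals[f] += after - before
    return {f: total / n_orders for f, total in totals.items()}


# Hypothetical toy database and Boolean query q() :- R(x), S(x, y).
FACTS = [("R", "a"), ("S", "a", "b"), ("S", "a", "c")]


def q(db):
    r_values = {t[1] for t in db if t[0] == "R"}
    return int(any(t[0] == "S" and t[1] in r_values for t in db))


values = shapley_values(FACTS, q)
for fact, value in values.items():
    print(fact, value)
```

In this toy instance the single `R` fact is needed by every derivation of the answer, so it receives a larger share than either interchangeable `S` fact, and the three values sum to the query result on the full database.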

preprint | 2021 | arXiv

Computing the Extremal Possible Ranks with Incomplete Preferences

Various voting rules are based on ranking the candidates by scores induced by aggregating voter preferences. A winner (respectively, unique winner) is a candidate who receives a score not smaller than (respectively, strictly greater than) the remaining candidates. Examples of such rules include the positional scoring rules and the Bucklin, Copeland, and Maximin rules. When voter preferences are known in an incomplete manner as partial orders, a candidate can be a possible/necessary winner based on the possibilities of completing the partial votes. Past research has studied in depth the computational problems of determining the possible and necessary winners and unique winners. These problems are all special cases of reasoning about the range of possible positions of a candidate under different tiebreakers. We investigate the complexity of determining this range, and particularly the extremal positions. Among our results, we establish that finding each of the minimal and maximal positions is NP-hard for each of the above rules, including all positional scoring rules, pure or not. Hence, none of the tractable variants of necessary/possible winner determination remain tractable for extremal position determination. Tractability can be retained when reasoning about the top-$k$ positions for a fixed $k$. Maximin is an exception: it is tractable to decide whether the maximal rank is $k$ for $k=1$ (necessary winning), but the problem becomes intractable for all $k>1$.

preprint | 2021 | arXiv

Probabilistic Inference of Winners in Elections by Independent Random Voters

We investigate the problem of computing the probability of winning in an election where voter attendance is uncertain. More precisely, we study the setting where, in addition to a total ordering of the candidates, each voter is associated with a probability of attending the poll, and the attendances of different voters are probabilistically independent. We show that the probability of winning can be computed in polynomial time for the plurality and veto rules. However, it is computationally hard (#P-hard) for various other rules, including $k$-approval and $k$-veto for $k>1$, Borda, Condorcet, and Maximin. For some of these rules, it is even hard to find a multiplicative approximation since it is already hard to determine whether this probability is nonzero. In contrast, we devise a fully polynomial-time randomized approximation scheme (FPRAS) for the complement probability, namely the probability of losing, for every positional scoring rule (with polynomial scores), as well as for the Condorcet rule.
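The setting can be illustrated with a small brute-force sketch for the plurality rule. It enumerates every attendance pattern, so it takes exponential time and is not the polynomial-time algorithm the abstract refers to. The function name `win_probability` and the example voters are hypothetical, and "winning" is assumed here to mean co-winning (a score not smaller than that of any other candidate).

```python
from fractions import Fraction
from itertools import product


def win_probability(voters, candidate):
    """Probability that `candidate` is a plurality co-winner.

    `voters` is a list of (top_choice, attendance_probability) pairs;
    under plurality only each voter's top choice matters. Attendances
    are independent, so a pattern's probability is a simple product.
    """
    candidates = {top for top, _ in voters} | {candidate}
    total = Fraction(0)
    for attends in product([False, True], repeat=len(voters)):
        prob = Fraction(1)
        scores = dict.fromkeys(candidates, 0)
        for (top, p), here in zip(voters, attends):
            prob *= p if here else 1 - p
            if here:
                scores[top] += 1
        if scores[candidate] == max(scores.values()):
            total += prob
    return total


# Two hypothetical voters, each attending with probability 1/2.
example = [("a", Fraction(1, 2)), ("b", Fraction(1, 2))]
print(win_probability(example, "a"))
```

Candidate `a` fails to co-win only when the `b` voter attends and the `a` voter does not, which is one of the four equally likely attendance patterns.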

preprint | 2020 | arXiv

Algorithmic Techniques for Necessary and Possible Winners

We investigate the practical aspects of computing the necessary and possible winners in elections over incomplete voter preferences. In the case of the necessary winners, we show how to implement and accelerate the polynomial-time algorithm of Xia and Conitzer. In the case of the possible winners, where the problem is NP-hard, we give a natural reduction to Integer Linear Programming (ILP) for all positional scoring rules and implement it in a leading commercial optimization solver. Further, we devise optimization techniques to minimize the number of ILP executions and, oftentimes, avoid them altogether. We conduct a thorough experimental study that includes the construction of a rich benchmark of election data based on real and synthetic data. Our findings suggest that, the worst-case intractability of the possible winners notwithstanding, the algorithmic techniques presented here scale well and can be used to compute the possible winners in realistic scenarios.

preprint | 2020 | arXiv

Approximate Denial Constraints

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs (i.e., DCs that are "almost" satisfied) from data. Considering approximate constraints allows us to discover more accurate constraints in inconsistent databases, detect rules that are generally correct but may have a few exceptions, as well as avoid overfitting and obtain more general and less contrived constraints. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific definition of an approximate DC, but takes the semantics as input. Since there is more than one way to define an approximate DC and different definitions may produce very different results, we do not focus on one definition, but rather on a general family of approximation functions that satisfies some natural axioms defined in this paper and captures commonly used definitions of approximate constraints. We also show how our algorithm can be combined with sampling to return results with high accuracy while significantly reducing the running time.
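As an illustration of what "almost satisfied" can mean, the sketch below checks one common approximation semantics: a DC holds approximately if the fraction of ordered tuple pairs that violate it is at most a threshold. This is only one possible definition (the abstract stresses that the algorithm takes the semantics as input), and it is not ADCMiner itself. The table, the constraint, and the names `violation_rate` and `is_approximate_dc` are hypothetical.

```python
from itertools import permutations


def violation_rate(rows, violates):
    """Fraction of ordered pairs of distinct rows that violate the DC."""
    pairs = list(permutations(rows, 2))
    if not pairs:
        return 0.0
    bad = sum(1 for t, u in pairs if violates(t, u))
    return bad / len(pairs)


def is_approximate_dc(rows, violates, epsilon):
    """True if the DC is violated by at most an epsilon fraction of pairs."""
    return violation_rate(rows, violates) <= epsilon


# Hypothetical table and DC: not (t.zip == u.zip and t.city != u.city),
# i.e. the functional dependency zip -> city written as a denial constraint.
ROWS = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "SF"},
]


def same_zip_different_city(t, u):
    return t["zip"] == u["zip"] and t["city"] != u["city"]


rate = violation_rate(ROWS, same_zip_different_city)
print(rate, is_approximate_dc(ROWS, same_zip_different_city, 0.5))
```

Here 4 of the 12 ordered pairs violate the constraint, so it is accepted with a threshold of 0.5 and rejected with a threshold of 0.1.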

preprint | 2020 | arXiv

Geosocial Location Classification: Associating Type to Places Based on Geotagged Social-Media Posts

Associating type to locations can be used to enrich maps and can serve a plethora of geospatial applications. An automatic method to do so could make the process less expensive in terms of human labor, and faster to react to changes. In this paper we study the problem of Geosocial Location Classification, where the type of a site, e.g., a building, is discovered based on social-media posts. Our goal is to correctly associate a set of messages posted in a small radius around a given location with the corresponding location type, e.g., school, church, restaurant or museum. We explore two approaches to the problem: (a) a pipeline approach, where each message is first classified, and then the location associated with the message set is inferred from the individual message labels; and (b) a joint approach where the individual messages are simultaneously processed to yield the desired location type. We tested the two approaches over a dataset of geotagged tweets. Our results demonstrate the superiority of the joint approach. Moreover, we show that due to the unique structure of the problem, where weakly-related messages are jointly processed to yield a single final label, linear classifiers outperform deep neural network alternatives.

preprint | 2020 | arXiv

Supporting Hard Queries over Probabilistic Preferences

Preference analysis is widely applied in various domains such as social choice and e-commerce. A recently proposed framework augments the relational database with a preference relation that represents uncertain preferences in the form of statistical ranking models, and provides methods to evaluate Conjunctive Queries (CQs) that express preferences among item attributes. In this paper, we explore the evaluation of queries that are more general and harder to compute. The main focus of this paper is on a class of CQs that cannot be evaluated by previous work. These queries are provably hard since they relate variables that represent items being compared. To overcome this hardness, we instantiate these variables with their domain values, rewrite hard CQs as unions of such instantiated queries, and develop several exact and approximate solvers to evaluate these unions of queries. We demonstrate that exact solvers that target specific common kinds of queries are far more efficient than general solvers. Further, we demonstrate that sophisticated approximate solvers making use of importance sampling can be orders of magnitude more efficient than exact solvers, while showing good accuracy. In addition to supporting provably hard CQs, we also present methods to evaluate an important family of count queries, and of top-k queries.

preprint | 2020 | arXiv

The Complexity of Determining the Necessary and Possible Top-k Winners in Partial Voting Profiles

When voter preferences are known in an incomplete (partial) manner, winner determination is commonly treated as the identification of the necessary and possible winners; these are the candidates who win in all completions or at least one completion, respectively, of the partial voting profile. In the case of a positional scoring rule, the winners are the candidates who receive the maximal total score from the voters. Yet, the outcome of an election might go beyond the absolute winners to the top-$k$ winners, as in the case of committee selection, primaries of political parties, and ranking in recruiting. We investigate the computational complexity of determining the necessary and possible top-$k$ winners over partial voting profiles. Our results apply to general classes of positional scoring rules and focus on the cases where $k$ is given as part of the input and where $k$ is fixed.
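The definitions of necessary and possible winners can be sketched by brute force for the simplest case, a single winner under a positional scoring rule. The code enumerates every completion of every partial vote, so it takes exponential time and only illustrates the definitions; it does not cover the top-$k$ variants the paper analyzes. The function names and the example profile are hypothetical, and a winner is assumed to be any candidate with maximal total score.

```python
from itertools import permutations, product


def completions(candidates, partial):
    """All linear orders extending `partial`, a set of (better, worse) pairs."""
    result = []
    for order in permutations(candidates):
        pos = {c: i for i, c in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in partial):
            result.append(order)
    return result


def winners(profile, score_vector):
    """Candidates with maximal total score under a positional scoring rule."""
    scores = dict.fromkeys(profile[0], 0)
    for order in profile:
        for rank, c in enumerate(order):
            scores[c] += score_vector[rank]
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}


def necessary_and_possible_winners(candidates, partial_profile, score_vector):
    necessary, possible = set(candidates), set()
    options = [completions(candidates, p) for p in partial_profile]
    for profile in product(*options):
        w = winners(profile, score_vector)
        necessary &= w  # must win in every completion
        possible |= w   # wins in at least one completion
    return necessary, possible


# Hypothetical partial profile over three candidates, plurality rule.
CANDIDATES = ["a", "b", "c"]
PARTIAL_PROFILE = [{("a", "b"), ("b", "c")}, {("a", "b")}, {("a", "c")}]
necessary, possible = necessary_and_possible_winners(
    CANDIDATES, PARTIAL_PROFILE, (1, 0, 0)
)
print(sorted(necessary), sorted(possible))
```

In this profile the first voter always ranks `a` on top and no other candidate can collect more than one point, so `a` is a necessary winner, while `b` and `c` are possible winners through three-way ties.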

preprint | 2020 | arXiv

ViS-Á-ViS : Detecting Similar Patterns in Annotated Literary Text

We present a web-based system called ViS-Á-ViS aiming to assist literary scholars in detecting repetitive patterns in an annotated textual corpus. Pattern detection is made possible using distant reading visualizations that highlight potentially interesting patterns. In addition, the system uses time-series alignment algorithms, and in particular, dynamic time warping (DTW), to detect patterns automatically. We present a case study where an ancient Hebrew poetry corpus was manually annotated with figurative language devices such as metaphors and similes and then loaded into the system. Preliminary results confirm the effectiveness of the system in analyzing the annotated data and in detecting literary patterns and similarities.
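For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the classic DTW distance between two numeric sequences. How the system encodes annotations as time series is not described in the abstract, so the numeric encoding in the example (for instance, counts of annotated devices per line) is purely an assumption, as is the function name `dtw_distance`.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance with absolute-difference cost.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both, which lets one sequence stretch.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical annotation counts per line for two poems.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

The first pair aligns perfectly once the repeated value is allowed to stretch, which is the property that makes DTW suitable for comparing patterns of different lengths.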

preprint | 2019 | arXiv

The Impact of Negation on the Complexity of the Shapley Value in Conjunctive Queries

The Shapley value is a conventional and well-studied function for determining the contribution of a player to the coalition in a cooperative game. Among its applications in a plethora of domains, it has recently been proposed to use the Shapley value for quantifying the contribution of a tuple to the result of a database query. In particular, we have a thorough understanding of the tractability frontier for the class of Conjunctive Queries (CQs) and aggregate functions over CQs. It has also been established that a tractable (randomized) multiplicative approximation exists for every union of CQs. Nevertheless, all of these results are based on the monotonicity of CQs. In this work, we investigate the implication of negation on the complexity of Shapley computation, in both the exact and approximate senses. We generalize a known dichotomy to account for negated atoms. We also show that negation fundamentally changes the complexity of approximation. We do so by drawing a connection to the problem of deciding whether a tuple is "relevant" to a query, and by analyzing its complexity.