Researcher profile

Yael Amsterdamer

Yael Amsterdamer contributes to research discovery and scholarly infrastructure.

Published work

4 published items

Preprint, 2022, arXiv

Multi-Document Keyphrase Extraction: Dataset, Baselines and Review

Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.

Preprint, 2022, arXiv

Selecting Sub-tables for Data Exploration

We present a framework for creating small, informative sub-tables of large data tables to facilitate the first step of data science: data exploration. Given a large data table T, the goal is to create a sub-table of small, fixed dimensions, by selecting a subset of T's rows and projecting them over a subset of T's columns. The question is: which rows and columns should be selected to yield an informative sub-table? We formalize the notion of "informativeness" based on two complementary metrics: cell coverage, which measures how well the sub-table captures prominent association rules in T, and diversity. Since computing optimal sub-tables using these metrics is shown to be infeasible, we give an efficient algorithm which indirectly accounts for association rules using table embedding. The resulting framework can be used for visualizing the complete sub-table, as well as for displaying the results of queries over the sub-table, enabling the user to quickly understand the results and determine subsequent queries. Experimental results show that we can efficiently compute high-quality sub-tables as measured by our metrics, as well as by feedback from user studies.

Preprint, 2022, arXiv

Worst-case Analysis for Interactive Evaluation of Boolean Provenance

In recent work, we have introduced a framework for fine-grained consent management in databases, which combines Boolean data provenance with the field of interactive Boolean evaluation. In turn, interactive Boolean evaluation aims at unveiling the underlying truth value of a Boolean expression by frugally probing the truth values of individual variables. The required number of probes depends on the Boolean provenance structure and on the (a priori unknown) probe answers. Prior work has analyzed and aimed to optimize the expected number of probes, where the expectation is with respect to a probability distribution over probe answers. This paper gives a novel worst-case analysis for the problem, inspired by the decision tree depth of Boolean functions. Specifically, we introduce a notion of evasive provenance expressions, namely, expressions where one may need to probe all variables in the worst case. We show that read-once expressions are evasive, and identify an additional class of expressions (acyclic monotone 2-DNF) for which evasiveness may be decided in PTIME. As for the more general question of finding the optimal strategy, we show that it is coNP-hard in general. We are still able to identify a sub-class of provenance expressions that is "far from evasive", namely, where an optimal worst-case strategy probes only log(n) out of the n variables in the expression, and show that we can find this optimal strategy in polynomial time.
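
As a toy illustration of the evasiveness notion described in this abstract, the sketch below computes the worst-case number of probes (the decision tree depth) of a small Boolean function by exhaustive search and checks whether it equals the number of variables. This is not the paper's algorithm: it is an exponential-time brute force that only works for a handful of variables, and the names `worst_case_probes` and `is_evasive` are hypothetical, chosen here for illustration.

```python
from functools import lru_cache
from itertools import product


def worst_case_probes(f, n):
    """Fewest probes that always suffice to determine f over n variables.

    f takes a list of n booleans. Brute force, exponential in n
    (an illustrative assumption, not the method from the paper).
    """

    @lru_cache(maxsize=None)
    def depth(assignment):
        # assignment: tuple of length n; None marks an unprobed variable.
        free = [i for i, v in enumerate(assignment) if v is None]

        # Collect the values f takes over all completions of the assignment.
        values = set()
        for bits in product((False, True), repeat=len(free)):
            full = list(assignment)
            for i, b in zip(free, bits):
                full[i] = b
            values.add(bool(f(full)))
            if len(values) > 1:
                break

        # If f is already determined, no further probes are needed.
        if len(values) == 1:
            return 0

        # We pick the best variable to probe; the adversary picks the answer.
        return min(
            1 + max(
                depth(assignment[:i] + (b,) + assignment[i + 1:])
                for b in (False, True)
            )
            for i in free
        )

    return depth((None,) * n)


def is_evasive(f, n):
    """True if, in the worst case, all n variables must be probed."""
    return worst_case_probes(f, n) == n


# A read-once expression: each variable appears exactly once.
read_once = lambda v: (v[0] and v[1]) or v[2]

# A multiplexer-style function: x0 selects which of x1, x2 matters.
mux = lambda v: (v[0] and v[1]) or ((not v[0]) and v[2])

print(is_evasive(read_once, 3))
print(worst_case_probes(mux, 3))
```

The read-once example is consistent with the abstract's claim that read-once expressions are evasive, while the multiplexer-style function shows a case where probing one variable first makes another variable irrelevant.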

Preprint, 2020, arXiv

Evaluating Interactive Summarization: an Expansion-Based Framework

Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work, but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for expansion-based interactive summarization, which considers the accumulating information along an interactive session. Our framework includes a procedure of collecting real user sessions and evaluation measures relying on standards, but adapted to reflect interaction. All of our solutions are intended to be released publicly as a benchmark, allowing comparison of future developments in interactive summarization. We demonstrate the use of our framework by evaluating and comparing baseline implementations that we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis of these systems motivate our design choices and support the viability of our framework.