Researcher profile

Eli Upfal

Eli Upfal contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Balanced allocation: considerations from large scale service environments

We study d-way balanced allocation, which assigns each incoming job to the lightest loaded among d randomly chosen servers. While prior work has extensively studied the performance of the basic scheme, there has been less published work on adapting this technique to many aspects of large-scale systems. Based on our experience in building and running planet-scale cloud applications, we extend the understanding of d-way balanced allocation along the following dimensions: (i) Bursts: Events such as breaking news can produce bursts of requests that may temporarily exceed the servicing capacity of the system. Thus, we explore what happens during a burst and how long it takes for the system to recover from such bursts. (ii) Priorities: Production systems need to handle jobs with a mix of priorities (e.g., user facing requests may be high priority while other requests may be low priority). We extend d-way balanced allocation to handle multiple priorities. (iii) Noise: Production systems are often typically distributed and thus d-way balanced allocation must work with stale or incorrect information. Thus we explore the impact of noisy information and their interactions with bursts and priorities. We explore the above using both extensive simulations and analytical arguments. Specifically we show, (i) using simulations, that d-way balanced allocation quickly recovers from bursts and can gracefully handle priorities and noise; and (ii) that analysis of the underlying generative models complements our simulations and provides insight into our simulation results.

preprint2022arXiv

How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics?

Wikipedia relies on an extensive review process to verify that the content of each individual page is unbiased and presents a neutral point of view. Less attention has been paid to possible biases in the hyperlink structure of Wikipedia, which has a significant influence on the user's exploration process when visiting more than one page. The evaluation of hyperlink bias is challenging because it depends on the global view rather than the text of individual pages. In this paper, we focus on the influence of the interconnect topology between articles describing complementary aspects of polarizing topics. We introduce a novel measure of exposure to diverse information to quantify users' exposure to different aspects of a topic throughout an entire surfing session, rather than just one click ahead. We apply this measure to six polarizing topics (e.g., gun control and gun right), and we identify cases in which the network topology significantly limits the exposure of users to diverse information on the topic, encouraging users to remain in a knowledge bubble. Our findings demonstrate the importance of evaluating Wikipedia's network structure in addition to the extensive review of individual articles.

preprint2021arXiv

RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions

The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The structural bias of a graph is the sum of the radii of highly-polarized bubbles. We study the problem of decreasing the structural bias through edge insertions. Healing all nodes with high polarized bubble radius is hard to approximate within a logarithmic factor, so we focus on finding the best $k$ edges to insert to maximally reduce the structural bias. We present RePBubLik, an algorithm that leverages a variant of the random walk closeness centrality to select the edges to insert. RePBubLik obtains, under mild conditions, a constant-factor approximation. It reduces the structural bias faster than existing edge-recommendation methods, including some designed to reduce the polarization of a graph.

preprint2010arXiv

An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology.

preprint2010arXiv

MADMX: A Novel Strategy for Maximal Dense Motif Extraction

We develop, analyze and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don't care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don't cares in a motif, defined as the ratio of solid (i.e., different from don't care) characters to the total length of the motif. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX