Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
10 works
0 followers
14 topics
4 close collaborators

Published work

10 published items

preprint · 2026 · arXiv

A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse

We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As in [arXiv:2504.14370, arXiv:2511.05295], we aim for \emph{breadth}, but impose an additional requirement of timeliness: higher-ranked strings should be generated earlier. A string is then only credited if it is generated before a deadline, where its deadline is defined by a function that maps a string's rank in the target language to the time by which it must be produced. This is in keeping with a central consideration in machine learning, where inductive bias favors ``simpler'' or ``more plausible'' outputs, all else being equal. We show that timely generation is impossible in a strong sense for eventually consistent generators -- the protagonists of most prior related work. Under what is perhaps the mildest natural relaxation of consistency, a hallucination rate that vanishes over time, we show that we can circumvent our impossibility result. In particular, we can achieve optimal density with respect to any superlinear deadline function. We also show this is tight by ruling out timely generation with linear deadlines and vanishing hallucination rate.
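The crediting rule in this abstract can be made concrete with a small bookkeeping sketch. It is an illustration only, not the paper's construction: the names `credited_strings`, `hallucination_rate`, `rank_of` and `deadline` are hypothetical, time steps are assumed to start at 1, and "before a deadline" is assumed to mean "at or before".

```python
from typing import Callable, Dict, List, Set


def credited_strings(
    outputs: List[str],
    rank_of: Dict[str, int],
    deadline: Callable[[int], int],
) -> Set[str]:
    """Return the target-language strings that meet their deadline.

    outputs[t - 1] is the string emitted at time step t (1-based, an assumption).
    rank_of maps each string of the target language to its rank (1 = highest).
    A string of rank r is credited if it first appears at a time t <= deadline(r).
    """
    credited: Set[str] = set()
    seen: Set[str] = set()
    for t, s in enumerate(outputs, start=1):
        if s in seen:
            continue  # only the first appearance of a string matters
        seen.add(s)
        if s in rank_of and t <= deadline(rank_of[s]):
            credited.add(s)
    return credited


def hallucination_rate(outputs: List[str], rank_of: Dict[str, int]) -> float:
    """Fraction of emitted strings that lie outside the target language."""
    if not outputs:
        return 0.0
    return sum(1 for s in outputs if s not in rank_of) / len(outputs)


# Toy run: "x" is a hallucination, and "b" is emitted late.
rank_of = {"a": 1, "b": 2, "c": 3}
outputs = ["a", "x", "c", "b"]
tight = credited_strings(outputs, rank_of, lambda r: r)      # deadline(r) = r
loose = credited_strings(outputs, rank_of, lambda r: 2 * r)  # deadline(r) = 2r
```

With the tighter deadline, "b" (rank 2, emitted at step 4) loses its credit, while the looser deadline credits all three strings.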

preprint · 2012 · arXiv

Finding Endogenously Formed Communities

A central problem in e-commerce is determining overlapping communities among individuals or objects in the absence of external identification or tagging. We address this problem by introducing a framework that captures the notion of communities or clusters determined by the relative affinities among their members. To this end we define what we call an affinity system, which is a set of elements, each with a vector characterizing its preference for all other elements in the set. We define a natural notion of (potentially overlapping) communities in an affinity system, in which the members of a given community collectively prefer each other to anyone else outside the community. Thus these communities are endogenously formed in the affinity system and are "self-determined" or "self-certified" by its members. We provide a tight polynomial bound on the number of self-determined communities as a function of the robustness of the community. We present a polynomial-time algorithm for enumerating these communities. Moreover, we obtain a local algorithm with a strong stochastic performance guarantee that can find a community in time nearly linear in the size of the community. Social networks fit particularly naturally within the affinity system framework -- if we can appropriately extract the affinities from the relatively sparse yet rich information from social networks, our analysis then yields a set of efficient algorithms for enumerating self-determined communities in social networks. In the context of social networks we also connect our analysis with results about $(α,β)$-clusters introduced by Mishra, Schreiber, Stanton, and Tarjan. In contrast with the polynomial bound we prove on the number of communities in the affinity system model, we show that there exists a family of networks with a superpolynomial number of $(α,β)$-clusters.

preprint · 2011 · arXiv

Efficient Clustering with Limited Distance Information

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one-versus-all queries that, given a point s in S, return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
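The landmark idea can be sketched as follows. This is a deliberately simplified stand-in, not the paper's algorithm: it picks landmarks by farthest-first traversal and assigns each point to its nearest landmark, which is an assumption made here for illustration. The names `landmark_clustering` and `one_vs_all` are hypothetical. Each landmark costs exactly one one-versus-all query.

```python
from typing import Callable, Dict, List, Tuple


def landmark_clustering(
    n: int,
    k: int,
    one_vs_all: Callable[[int], List[float]],
    first: int = 0,
) -> Tuple[List[int], List[int]]:
    """Cluster n points into k groups using k one-versus-all queries.

    one_vs_all(i) returns the distances from point i to all n points.
    Landmarks are chosen farthest-first (a simplification); every point
    is then labelled with the index of its nearest landmark.
    """
    landmarks = [first]
    rows: Dict[int, List[float]] = {first: one_vs_all(first)}
    # min_dist[i] = distance from point i to its closest landmark so far
    min_dist = list(rows[first])
    while len(landmarks) < k:
        nxt = max(range(n), key=lambda i: min_dist[i])
        landmarks.append(nxt)
        rows[nxt] = one_vs_all(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, rows[nxt])]
    labels = [
        min(range(len(landmarks)), key=lambda j: rows[landmarks[j]][i])
        for i in range(n)
    ]
    return landmarks, labels


# Toy instance: two well-separated groups on the real line.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
queries_used = 0


def line_query(i: int) -> List[float]:
    """One-versus-all query for the toy instance; counts how often it is called."""
    global queries_used
    queries_used += 1
    return [abs(p - points[i]) for p in points]


landmarks, labels = landmark_clustering(len(points), 2, line_query)
```

On this toy input the two groups are recovered with two queries, while the full distance matrix is never materialised.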

preprint · 2011 · arXiv

Metric uniformization and spectral bounds for graphs

We present a method for proving upper bounds on the eigenvalues of the graph Laplacian. A main step involves choosing an appropriate "Riemannian" metric to uniformize the geometry of the graph. In many interesting cases, the existence of such a metric is shown by examining the combinatorics of special types of flows. This involves proving new inequalities on the crossing number of graphs. In particular, we use our method to show that for any positive integer k, the kth smallest eigenvalue of the Laplacian on an n-vertex, bounded-degree planar graph is O(k/n). This bound is asymptotically tight for every k, as it is easily seen to be achieved for square planar grids. We also extend this spectral result to graphs with bounded genus, and graphs which forbid fixed minors. Previously, such spectral upper bounds were only known for the case k=2.
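The tightness remark about square grids can be checked numerically. The sketch below relies on two standard facts rather than anything specific to the paper: the path on m vertices has Laplacian eigenvalues 2 - 2cos(πj/m) for j = 0..m-1, and the m × m grid is the Cartesian product of two such paths, so its eigenvalues are all pairwise sums. The function names are hypothetical.

```python
import math
from typing import List


def grid_laplacian_eigenvalues(m: int) -> List[float]:
    """Sorted Laplacian eigenvalues of the m x m grid graph (n = m * m vertices)."""
    path = [2.0 - 2.0 * math.cos(math.pi * j / m) for j in range(m)]
    return sorted(a + b for a in path for b in path)


def max_scaled_eigenvalue(m: int) -> float:
    """Largest value of lambda_k * n / k over k = 1..n (k is 1-based)."""
    n = m * m
    eig = grid_laplacian_eigenvalues(m)
    return max(lam * n / k for k, lam in enumerate(eig, start=1))


# If lambda_k = O(k / n), the scaled quantity stays bounded as the grid grows.
for m in (5, 10, 20, 40):
    print(m, round(max_scaled_eigenvalue(m), 3))
```

The printed values stay below a fixed constant as m grows, which is consistent with an O(k/n) upper bound. For k = 2 the scaled value stays bounded away from zero, which is consistent with the grid achieving the bound up to constants.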

preprint · 2011 · arXiv

Non-Conservative Diffusion and its Application to Social Network Analysis

The random walk is fundamental to modeling dynamic processes on networks. Metrics based on the random walk have been used in many applications from image processing to Web page ranking. However, how appropriate are random walks to modeling and analyzing social networks? We argue that unlike a random walk, which conserves the quantity diffusing on a network, many interesting social phenomena, such as the spread of information or disease on a social network, are fundamentally non-conservative. When an individual infects her neighbor with a virus, the total amount of infection increases. We classify diffusion processes as conservative and non-conservative and show how these differences impact the choice of metrics used for network analysis, as well as our understanding of network structure and behavior. We show that Alpha-Centrality, which mathematically describes non-conservative diffusion, leads to new insights into the behavior of spreading processes on networks. We give a scalable approximate algorithm for computing the Alpha-Centrality in a massive graph. We validate our approach on real-world online social networks of Digg. We show that a non-conservative metric, such as Alpha-Centrality, produces better agreement with empirical measures of influence than conservative metrics, such as PageRank. We hope that our investigation will inspire further exploration into the realms of conservative and non-conservative metrics in social network analysis.
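A minimal sketch of a non-conservative centrality follows. It assumes the fixed-point form x = αAᵀx + e usually attributed to Bonacich and Lloyd. The paper's exact normalisation and its scalable approximation algorithm are not reproduced here, and the function name and parameters are hypothetical. The iteration converges only when α is below the reciprocal of the spectral radius of A.

```python
from typing import List, Optional


def alpha_centrality(
    adj: List[List[float]],
    alpha: float,
    e: Optional[List[float]] = None,
    iterations: int = 100,
) -> List[float]:
    """Iterate x <- e + alpha * A^T x for a fixed number of rounds.

    adj[j][i] is the weight of the edge j -> i. Each node keeps its own
    score and also passes a copy, damped by alpha, along its out-edges,
    so the total amount of "influence" grows instead of being conserved.
    """
    n = len(adj)
    base = [1.0] * n if e is None else list(e)
    x = list(base)
    for _ in range(iterations):
        x = [
            base[i] + alpha * sum(adj[j][i] * x[j] for j in range(n))
            for i in range(n)
        ]
    return x


# Directed path 0 -> 1 -> 2: influence accumulates downstream.
path = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
scores = alpha_centrality(path, alpha=0.5)  # [1.0, 1.5, 1.75]
```

In a conservative random walk the scores would be redistributed and their sum would stay fixed. Here the sum grows from 3 to 4.25, which is the non-conservative behaviour the abstract describes.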

preprint · 2010 · arXiv

A Complexity View of Markets with Social Influence

In this paper, inspired by the work of Megiddo on the formation of preferences and strategic analysis, we consider an early market model studied in the field of economic theory, in which each trader's utility may be influenced by the bundles of goods obtained by her social neighbors. The goal of this paper is to understand and characterize the impact of social influence on the complexity of computing and approximating market equilibria. We present complexity-theoretic and algorithmic results for approximating market equilibria in this model with focus on two concrete influence models based on the traditional linear utility functions. Recall that an Arrow-Debreu market equilibrium in a conventional exchange market with linear utility functions can be computed in polynomial time by convex programming. Our complexity results show that even a bounded-degree, planar influence network can significantly increase the difficulty of equilibrium computation even in markets with only a constant number of goods. Our algorithmic results suggest that finding an approximate equilibrium in markets with hierarchical influence networks might be easier than that in markets with arbitrary neighborhood structures. By demonstrating a simple market with a constant number of goods and a bounded-degree, planar influence graph whose equilibrium is PPAD-hard to approximate, we also provide a counterexample to a common belief, which we refer to as the myth of a constant number of goods, that equilibria in markets with a constant number of goods are easy to compute or easy to approximate.

preprint · 2010 · arXiv

Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs

We introduce a new approach to computing an approximately maximum s-t flow in a capacitated, undirected graph. This flow is computed by solving a sequence of electrical flow problems. Each electrical flow is given by the solution of a system of linear equations in a Laplacian matrix, and thus may be approximately computed in nearly-linear time. Using this approach, we develop the fastest known algorithm for computing approximately maximum s-t flows. For a graph having n vertices and m edges, our algorithm computes a (1-ε)-approximately maximum s-t flow in time \tilde{O}(mn^{1/3} ε^{-11/3}). A dual version of our approach computes a (1+ε)-approximately minimum s-t cut in time \tilde{O}(m+n^{4/3}ε^{-8/3}), which is the fastest known algorithm for this problem as well. Previously, the best dependence on m and n was achieved by the algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute approximately maximum s-t flows in time \tilde{O}(m\sqrt{n}ε^{-1}), and approximately minimum s-t cuts in time \tilde{O}(m+n^{3/2}ε^{-3}).
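The inner step, a single electrical flow obtained from a Laplacian system, can be sketched as below. This is only that one step under simplifying assumptions: the graph is assumed to be connected, resistances are given directly, and the system is solved by dense Gaussian elimination rather than a nearly-linear-time solver. How the paper derives resistances from capacities across iterations is not shown. The function name `electrical_flow` is hypothetical.

```python
from typing import List, Tuple


def electrical_flow(
    n: int, edges: List[Tuple[int, int, float]], s: int, t: int
) -> List[float]:
    """Unit s-t electrical flow; edges are (u, v, resistance) triples.

    Solves L * phi = b with phi[t] fixed to 0, where L is the Laplacian
    weighted by conductances 1 / resistance, then applies Ohm's law.
    Assumes the graph is connected.
    """
    lap = [[0.0] * n for _ in range(n)]
    for u, v, r in edges:
        c = 1.0 / r
        lap[u][u] += c
        lap[v][v] += c
        lap[u][v] -= c
        lap[v][u] -= c
    keep = [i for i in range(n) if i != t]  # ground vertex t
    m = len(keep)
    # Augmented reduced system: one unit of current enters at s.
    aug = [[lap[i][j] for j in keep] + [1.0 if i == s else 0.0] for i in keep]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, m):
            f = aug[row][col] / aug[col][col]
            for c2 in range(col, m + 1):
                aug[row][c2] -= f * aug[col][c2]
    sol = [0.0] * m
    for row in range(m - 1, -1, -1):  # back substitution
        acc = aug[row][m] - sum(aug[row][c2] * sol[c2] for c2 in range(row + 1, m))
        sol[row] = acc / aug[row][row]
    phi = [0.0] * n
    for idx, i in enumerate(keep):
        phi[i] = sol[idx]
    # Flow on (u, v) is the potential drop divided by the resistance.
    return [(phi[u] - phi[v]) / r for u, v, r in edges]


# Direct edge 0-2 in parallel with the two-edge path 0-1-2, unit resistances.
flows = electrical_flow(3, [(0, 2, 1.0), (0, 1, 1.0), (1, 2, 1.0)], s=0, t=2)
```

On this triangle the direct edge carries 2/3 of the current and the two-edge path carries 1/3, as expected for resistances 1 and 2 in parallel.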

preprint · 2010 · arXiv

Spectral Sparsification of Graphs

We introduce a new notion of graph sparsification based on spectral similarity of graph Laplacians: spectral sparsification requires that the Laplacian quadratic form of the sparsifier approximate that of the original. This is equivalent to saying that the Laplacian of the sparsifier is a good preconditioner for the Laplacian of the original. We prove that every graph has a spectral sparsifier of nearly linear size. Moreover, we present an algorithm that produces spectral sparsifiers in time $\tilde{O}(m)$, where $m$ is the number of edges in the original graph. This construction is a key component of a nearly-linear time algorithm for solving linear equations in diagonally-dominant matrices. Our sparsification algorithm makes use of a nearly-linear time algorithm for graph partitioning that satisfies a strong guarantee: if the partition it outputs is very unbalanced, then the larger part is contained in a subgraph of high conductance.
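The quadratic-form comparison behind this definition can be illustrated on a tiny example. The sketch below does not construct a sparsifier. It only evaluates x^T L x = Σ w_uv (x_u - x_v)^2 for two hand-picked graphs on random test vectors, which gives evidence about the approximation but no proof, since the definition quantifies over all x. The function names are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def quadratic_form(edges: List[Edge], x: List[float]) -> float:
    """Laplacian quadratic form: sum of w * (x[u] - x[v])^2 over all edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


def sampled_ratio_range(
    g: List[Edge], h: List[Edge], n: int, trials: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Min and max of (x^T L_H x) / (x^T L_G x) over random Gaussian vectors x."""
    rng = random.Random(seed)
    lo, hi = float("inf"), 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        qg = quadratic_form(g, x)
        if qg == 0.0:
            continue  # x is constant on G's components; the ratio is undefined
        ratio = quadratic_form(h, x) / qg
        lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi


# G: complete graph K4 with unit weights (6 edges).
k4 = [(u, v, 1.0) for u in range(4) for v in range(u + 1, 4)]
# H: the 4-cycle with every edge reweighted to 4/3 (4 edges).
c4 = [(0, 1, 4 / 3), (1, 2, 4 / 3), (2, 3, 4 / 3), (3, 0, 4 / 3)]
lo, hi = sampled_ratio_range(k4, c4, n=4)
```

For this pair the ratio provably lies in [2/3, 4/3]: the reweighted cycle has nonzero Laplacian eigenvalues 8/3 and 16/3, against 4 for K4 on the same eigenspaces. The four-edge graph therefore approximates the six-edge one to within a factor of 1 ± 1/3.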

preprint · 2007 · arXiv

A bounded-degree network formation game

Motivated by applications in peer-to-peer and overlay networks we define and study the \emph{Bounded Degree Network Formation} (BDNF) game. In an $(n,k)$-BDNF game, we are given $n$ nodes, a bound $k$ on the out-degree of each node, and a weight $w_{vu}$ for each ordered pair $(v,u)$ representing the traffic rate from node $v$ to node $u$. Each node $v$ uses up to $k$ directed links to connect to other nodes with an objective to minimize its average distance, using weights $w_{vu}$, to all other destinations. We study the existence of pure Nash equilibria for $(n,k)$-BDNF games. We show that if the weights are arbitrary, then a pure Nash wiring may not exist. Furthermore, it is NP-hard to determine whether a pure Nash wiring exists for a given $(n,k)$-BDNF instance. A major focus of this paper is on uniform $(n,k)$-BDNF games, in which all weights are 1. We describe how to construct a pure Nash equilibrium wiring given any $n$ and $k$, and establish that in all pure Nash wirings the cost of individual nodes cannot differ by more than a factor of nearly 2, whereas the diameter cannot exceed $O(\sqrt{n \log_k n})$. We also analyze best-response walks on the configuration space defined by the uniform game, and show that starting from any initial configuration, strong connectivity is reached within $Θ(n^2)$ rounds. Convergence to a pure Nash equilibrium, however, is not guaranteed. We present simulation results that suggest that loop-free best-response walks always exist, but may not be polynomially bounded. We also study a special family of \emph{regular} wirings, the class of Abelian Cayley graphs, in which all nodes imitate the same wiring pattern, and show that if $n$ is sufficiently large no such regular wiring can be a pure Nash equilibrium.
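For the uniform game (all weights 1), a node's cost and a brute-force best response can be sketched as follows. Two assumptions are made here for illustration: unreachable destinations are charged a fixed `penalty`, and the node is assumed to use exactly k links, since extra links never increase its distances. The names `node_cost` and `best_response` are hypothetical, and the enumeration is exponential in k, so this is only suitable for tiny instances.

```python
from collections import deque
from itertools import combinations
from typing import List, Sequence, Tuple


def node_cost(wiring: Sequence[Sequence[int]], v: int, penalty: int) -> int:
    """Sum of hop distances from v to every other node (uniform weights).

    wiring[u] lists the out-neighbours of u. Minimising this sum is the
    same as minimising the average distance. Unreachable nodes cost `penalty`.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:  # breadth-first search along directed links
        u = queue.popleft()
        for w in wiring[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.get(u, penalty) for u in range(len(wiring)) if u != v)


def best_response(
    wiring: Sequence[Sequence[int]], v: int, k: int, penalty: int
) -> Tuple[Tuple[int, ...], int]:
    """Try every set of k out-links for v and keep the cheapest one."""
    others = [u for u in range(len(wiring)) if u != v]
    best_links: Tuple[int, ...] = ()
    best_cost = -1
    for links in combinations(others, min(k, len(others))):
        trial: List[Sequence[int]] = list(wiring)
        trial[v] = links
        cost = node_cost(trial, v, penalty)
        if best_cost < 0 or cost < best_cost:
            best_links, best_cost = links, cost
    return best_links, best_cost


# (4, 1)-uniform game wired as the directed cycle 0 -> 1 -> 2 -> 3 -> 0.
cycle = [(1,), (2,), (3,), (0,)]
links, cost = best_response(cycle, v=0, k=1, penalty=4)
```

In the directed cycle, node 0 cannot improve by rewiring its single link: any other target cuts it off from at least one node, so its best response keeps the link to node 1 at cost 1 + 2 + 3 = 6.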

preprint · 1999 · arXiv

Regression Depth and Center Points

We show that, for any set of n points in d dimensions, there exists a hyperplane with regression depth at least ceiling(n/(d+1)), as had been conjectured by Rousseeuw and Hubert. Dually, for any arrangement of n hyperplanes in d dimensions there exists a point that cannot escape to infinity without crossing at least ceiling(n/(d+1)) hyperplanes. We also apply our approach to related questions on the existence of partitions of the data into subsets such that a common plane has nonzero regression depth in each subset, and to the computational complexity of regression depth problems.