Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

InfoFlow: A Framework for Multi-Layer Transformer Analysis

While the approximation properties of single-layer Transformer architectures have been studied in recent works, a rigorous theoretical understanding of the multi-layer setting remains limited. In this work, we establish that multi-layer Transformers possess fundamentally different approximation capabilities from single-layer ones: for certain retrieval tasks, any single-layer Transformer requires least $Ω(\varepsilon^{-k})$ parameters to achieve precision $\varepsilon$, where $k$ grows linearly with sequence length $T$, whereas a two-layer Transformer with a single head per layer achieves the same approximation precision with at most $O (\varepsilon^{-1})$ parameters. To understand this separation, we identify two structural mechanisms underlying multi-layer approximation. Specifically, softmax attention can only efficiently retrieve the token attaining the maximum attention score, incurring exponential-in-length parameter cost for $k$-th largest retrieval with $k \geq 2$. Moreover, the parameter cost of decoding coupled information scales with the size of the retrieved token set. Motivated by these findings, we propose InfoFlow, a framework for multi-layer Transformers. The framework tracks an information set of accessible input positions at each token and layer, assigning an explicit approximation rate to each mode of information propagation. This abstraction recovers known approximation bounds, remains consistent with experimental observations on trained networks, and yields concrete predictions in settings where direct theoretical analysis is currently intractable. Our results provide a principled framework for reasoning about the approximation efficiency of multi-layer Transformers.

preprint2022arXiv

Cross-Technology Communication for the Internet of Things: A Survey

The ever-developing Internet of Things (IoT) brings the prosperity of wireless sensing and control applications. In many scenarios, different wireless technologies coexist in the shared frequency medium as well as the physical space. Such wireless coexistence may lead to serious cross-technology interference (CTI) problems, e.g. channel competition, signal collision, throughput degradation. Compared with traditional methods like interference avoidance, tolerance, and concurrency mechanism, direct and timely information exchange among heterogeneous devices is therefore a fundamental requirement to ensure the usability, inter-operability, and reliability of the IoT. Under this circumstance, Cross-Technology Communication (CTC) technique thus becomes a hot topic in both academic and industrial fields, which aims at directly exchanging data among heterogeneous devices that follow different standards. This paper comprehensively summarizes the CTC techniques and reveals that the key challenge for CTC lies in the heterogeneity of IoT devices, including the incompatibility of technical standards and the asymmetry of connection capability. Based on the above finding, we present a taxonomy of the existing CTC works (packet-level CTCs and physical-level CTCs) and compare the existing CTC techniques in terms of throughput, reliability, hardware modification, and concurrency.

preprint2022arXiv

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{θ\in R^d}\ \sum_{i=1}^{n}f_{i}(θ), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $θ$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $ε$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /ε))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/ε))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.

preprint2022arXiv

Improved Lower Bounds for Submodular Function Minimization

We provide a generic technique for constructing families of submodular functions to obtain lower bounds for submodular function minimization (SFM). Applying this technique, we prove that any deterministic SFM algorithm on a ground set of $n$ elements requires at least $Ω(n \log n)$ queries to an evaluation oracle. This is the first super-linear query complexity lower bound for SFM and improves upon the previous best lower bound of $2n$ given by [Graur et al., ITCS 2020]. Using our construction, we also prove that any (possibly randomized) parallel SFM algorithm, which can make up to $\mathsf{poly}(n)$ queries per round, requires at least $Ω(n / \log n)$ rounds to minimize a submodular function. This improves upon the previous best lower bound of $\tildeΩ(n^{1/3})$ rounds due to [Chakrabarty et al., FOCS 2021], and settles the parallel complexity of query-efficient SFM up to logarithmic factors due to a recent advance in [Jiang, SODA 2021].

preprint2022arXiv

Low-cost Relevance Generation and Evaluation Metrics for Entity Resolution in AI

Entity Resolution (ER) in voice assistants is a prime component during run time that resolves entities in users request to real world entities. ER involves two major functionalities 1. Relevance generation and 2. Ranking. In this paper we propose a low cost relevance generation framework by generating features using customer implicit and explicit feedback signals. The generated relevance datasets can serve as test sets to measure ER performance. We also introduce a set of metrics that accurately measures the performance of ER systems in various dimensions. They provide great interpretability to deep dive and identifying root cause of ER issues, whether the problem is in relevance generation or ranking.

preprint2022arXiv

Resolving Matrix Spencer Conjecture Up to Poly-logarithmic Rank

We give a simple proof of the matrix Spencer conjecture up to poly-logarithmic rank: given symmetric $d \times d$ matrices $A_1,\ldots,A_n$ each with $\|A_i\|_{\mathsf{op}} \leq 1$ and rank at most $n/\log^3 n$, one can efficiently find $\pm 1$ signs $x_1,\ldots,x_n$ such that their signed sum has spectral norm $\|\sum_{i=1}^n x_i A_i\|_{\mathsf{op}} = O(\sqrt{n})$. This result also implies a $\log n - Ω( \log \log n)$ qubit lower bound for quantum random access codes encoding $n$ classical bits with advantage $\gg 1/\sqrt{n}$. Our proof uses the recent refinement of the non-commutative Khintchine inequality in [Bandeira, Boedihardjo, van Handel, 2022] for random matrices with correlated Gaussian entries.

preprint2022arXiv

Smoothed Analysis of the Komlós Conjecture

The well-known Komlós conjecture states that given $n$ vectors in $\mathbb{R}^d$ with Euclidean norm at most one, there always exists a $\pm 1$ coloring such that the $\ell_{\infty}$ norm of the signed-sum vector is a constant independent of $n$ and $d$. We prove this conjecture in a smoothed analysis setting where the vectors are perturbed by adding a small Gaussian noise and when the number of vectors $n =ω(d\log d)$. The dependence of $n$ on $d$ is the best possible even in a completely random setting. Our proof relies on a weighted second moment method, where instead of considering uniformly randomly colorings we apply the second moment method on an implicit distribution on colorings obtained by applying the Gram-Schmidt walk algorithm to a suitable set of vectors. The main technical idea is to use various properties of these colorings, including subgaussianity, to control the second moment.

preprint2020arXiv

A Faster Interior Point Method for Semidefinite Programming

Semidefinite programs (SDPs) are a fundamental class of optimization problems with important recent applications in approximation algorithms, quantum complexity, robust learning, algorithmic rounding, and adversarial deep learning. This paper presents a faster interior point method to solve generic SDPs with variable size $n \times n$ and $m$ constraints in time \begin{align*} \widetilde{O}(\sqrt{n}( mn^2 + m^ω+ n^ω) \log(1 / ε) ), \end{align*} where $ω$ is the exponent of matrix multiplication and $ε$ is the relative accuracy. In the predominant case of $m \geq n$, our runtime outperforms that of the previous fastest SDP solver, which is based on the cutting plane method of Jiang, Lee, Song, and Wong [JLSW20]. Our algorithm's runtime can be naturally interpreted as follows: $\widetilde{O}(\sqrt{n} \log (1/ε))$ is the number of iterations needed for our interior point method, $mn^2$ is the input size, and $m^ω+ n^ω$ is the time to invert the Hessian and slack matrix in each iteration. These constitute natural barriers to further improving the runtime of interior point methods for solving generic SDPs.

preprint2020arXiv

An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games and its Applications

Given a separation oracle for a convex set $K \subset \mathbb{R}^n$ that is contained in a box of radius $R$, the goal is to either compute a point in $K$ or prove that $K$ does not contain a ball of radius $ε$. We propose a new cutting plane algorithm that uses an optimal $O(n \log (κ))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $κ= nR/ε$. $\bullet$ This improves upon Vaidya&#39;s $O( \text{SO} \cdot n \log (κ) + n^{ω+1} \log (κ))$ time algorithm [Vaidya, FOCS 1989a] in terms of polynomial dependence on $n$, where $ω< 2.373$ is the exponent of matrix multiplication and $\text{SO}$ is the time for oracle evaluation. $\bullet$ This improves upon Lee-Sidford-Wong&#39;s $O( \text{SO} \cdot n \log (κ) + n^3 \log^{O(1)} (κ))$ time algorithm [Lee, Sidford and Wong, FOCS 2015] in terms of dependence on $κ$. For many important applications in economics, $κ= Ω(\exp(n))$ and this leads to a significant difference between $\log(κ)$ and $\mathrm{poly}(\log (κ))$. We also provide evidence that the $n^2$ time per evaluation cannot be improved and thus our running time is optimal. A bottleneck of previous cutting plane methods is to compute leverage scores, a measure of the relative importance of past constraints. Our result is achieved by a novel multi-layered data structure for leverage score maintenance, which is a sophisticated combination of diverse techniques such as random projection, batched low-rank update, inverse maintenance, polynomial interpolation, and fast rectangular matrix multiplication. Interestingly, our method requires a combination of different fast rectangular matrix multiplication algorithms.

preprint2020arXiv

Online Discrepancy Minimization for Stochastic Arrivals

In the stochastic online vector balancing problem, vectors $v_1,v_2,\ldots,v_T$ chosen independently from an arbitrary distribution in $\mathbb{R}^n$ arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to $\mathsf{polylog}(nT)$ factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC&#39; 20). In particular, for the Komlós problem where $\|v_t\|_2\leq 1$ for each $t$, our algorithm achieves $\tilde{O}(1)$ discrepancy with high probability, improving upon the previous $\tilde{O}(n^{3/2})$ bound. For Tusnády&#39;s problem of minimizing the discrepancy of axis-aligned boxes, we obtain an $O(\log^{d+4} T)$ bound for arbitrary distribution over points. Previous techniques only worked for product distributions and gave a weaker $O(\log^{2d+1} T)$ bound. We also consider the Banaszczyk setting, where given a symmetric convex body $K$ with Gaussian measure at least $1/2$, our algorithm achieves $\tilde{O}(1)$ discrepancy with respect to the norm given by $K$ for input distributions with sub-exponential tails. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties. For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multi-color discrepancy.

preprint2020arXiv

Online Vector Balancing and Geometric Discrepancy

We consider an online vector balancing question where $T$ vectors, chosen from an arbitrary distribution over $[-1,1]^n$, arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the discrepancy small as possible. A concrete example is the online interval discrepancy problem where T points are sampled uniformly in [0,1], and the goal is to immediately color them $\pm$ such that every sub-interval remains nearly balanced. As random coloring incurs $Ω(T^{1/2})$ discrepancy, while the offline bounds are $Θ(\sqrt{n \log (T/n)})$ for vector balancing and $1$ for interval balancing, a natural question is whether one can (nearly) match the offline bounds in the online setting for these problems. One must utilize the stochasticity as in the worst-case scenario it is known that discrepancy is $Ω(T^{1/2})$ for any online algorithm. Bansal and Spencer recently show an $O(\sqrt{n}\log T)$ bound when each coordinate is independent. When there are dependencies among the coordinates, the problem becomes much more challenging, as evidenced by a recent work of Jiang, Kulkarni, and Singla that gives a non-trivial $O(T^{1/\log\log T})$ bound for online interval discrepancy. Although this beats random coloring, it is still far from the offline bound. In this work, we introduce a new framework for online vector balancing when the input distribution has dependencies across coordinates. This lets us obtain a $poly(n, \log T)$ bound for online vector balancing under arbitrary input distributions, and a $poly(\log T)$ bound for online interval discrepancy. Our framework is powerful enough to capture other well-studied geometric discrepancy problems; e.g., a $poly(\log^d (T))$ bound for the online $d$-dimensional Tusnády&#39;s problem. A key new technical ingredient is an {anti-concentration} inequality for sums of pairwise uncorrelated random variables.