Source author record

Ali Vakilian

Ali Vakilian appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Artificial Intelligence Discrete Mathematics cs.CY

Catalog footprint

What is connected

10works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Learning with Conflicts of Interest

Financial, social, and political factors often prevent the interests of the owners of ML systems and services and their users from being perfectly aligned. ML systems often produce biased information that can influence users to make decisions that are not in their best interest. Current solution approaches require ML systems to implement protocols to mitigate their biases. However, ML system owners usually do not have any incentive to implement these protocols and often argue that it limits their freedom of expression or business. We believe that a successful solution to this problem must recognize the conflict of interest between the ML systems and their users, and use this information to protect users against information that adversely influences their decisions while allowing users to safely benefit from these systems. To this end, we propose a game-theoretic framework that models the interaction between ML systems and users with conflicts of interest. We present scalable algorithms with theoretical guarantees that maximize the amount of desired information and actions and minimize the amount of biased and manipulative actions in interaction with ML systems.

preprint2022arXiv

Fair Representation Clustering with Several Protected Classes

We study the problem of fair $k$-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation $k$-median problem, we are given a set of points $X$ in a metric space. Each point $x\in X$ belongs to one of $\ell$ groups. Further, we are given fair representation parameters $α_j$ and $β_j$ for each group $j\in [\ell]$. We say that a $k$-clustering $C_1, \cdots, C_k$ fairly represents all groups if the number of points from group $j$ in cluster $C_i$ is between $α_j |C_i|$ and $β_j |C_i|$ for every $j\in[\ell]$ and $i\in [k]$. The goal is to find a set $\mathcal{C}$ of $k$ centers and an assignment $ϕ: X\rightarrow \mathcal{C}$ such that the clustering defined by $(\mathcal{C}, ϕ)$ fairly represents all groups and minimizes the $\ell_1$-objective $\sum_{x\in X} d(x, ϕ(x))$. We present an $O(\log k)$-approximation algorithm that runs in time $n^{O(\ell)}$. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both $k$ and $\ell$. We also consider an important special case of the problem where $α_j = β_j = \frac{f_j}{f}$ and $f_j, f \in \mathbb{N}$ for all $j\in [\ell]$. For this special case, we present an $O(\log k)$-approximation algorithm that runs in $(kf)^{O(\ell)}\log n + poly(n)$ time.

preprint2022arXiv

Faster Fundamental Graph Algorithms via Learned Predictions

We consider the question of speeding up classic graph algorithms with machine-learned predictions. In this model, algorithms are furnished with extra advice learned from past or similar instances. Given the additional information, we aim to improve upon the traditional worst-case run-time guarantees. Our contributions are the following: (i) We give a faster algorithm for minimum-weight bipartite matching via learned duals, improving the recent result by Dinitz, Im, Lavastida, Moseley and Vassilvitskii (NeurIPS, 2021); (ii) We extend the learned dual approach to the single-source shortest path problem (with negative edge lengths), achieving an almost linear runtime given sufficiently accurate predictions which improves upon the classic fastest algorithm due to Goldberg (SIAM J. Comput., 1995); (iii) We provide a general reduction-based framework for learning-based graph algorithms, leading to new algorithms for degree-constrained subgraph and minimum-cost $0$-$1$ flow, based on reductions to bipartite matching and the shortest path problem. Finally, we give a set of general learnability theorems, showing that the predictions required by our algorithms can be efficiently learned in a PAC fashion.

preprint2022arXiv

Improved Approximation Algorithms for Individually Fair Clustering

We consider the $k$-clustering problem with $\ell_p$-norm cost, which includes $k$-median, $k$-means and $k$-center, under an individual notion of fairness proposed by Jung et al. [2020]: given a set of points $P$ of size $n$, a set of $k$ centers induces a fair clustering if every point in $P$ has a center among its $n/k$ closest neighbors. Mahabadi and Vakilian [2020] presented a $(p^{O(p)},7)$-bicriteria approximation for fair clustering with $\ell_p$-norm cost: every point finds a center within distance at most $7$ times its distance to its $(n/k)$-th closest neighbor and the $\ell_p$-norm cost of the solution is at most $p^{O(p)}$ times the cost of an optimal fair solution. In this work, for any $\varepsilon>0$, we present an improved $(16^p +\varepsilon,3)$-bicriteria for this problem. Moreover, for $p=1$ ($k$-median) and $p=\infty$ ($k$-center), we present improved cost-approximation factors $7.081+\varepsilon$ and $3+\varepsilon$ respectively. To achieve our guarantees, we extend the framework of [Charikar et al., 2002, Swamy, 2016] and devise a $16^p$-approximation algorithm for the facility location with $\ell_p$-norm cost under matroid constraint which might be of an independent interest. Besides, our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al. [2019], which is essentially the median matroid problem [Krishnaswamy et al., 2011].

preprint2022arXiv

Individual Preference Stability for Clustering

In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.

preprint2022arXiv

Multi Stage Screening: Enforcing Fairness and Maximizing Efficiency in a Pre-Existing Pipeline

Consider an actor making selection decisions using a series of classifiers, which we term a sequential screening process. The early stages filter out some applicants, and in the final stage an expensive but accurate test is applied to the individuals that make it to the final stage. Since the final stage is expensive, if there are multiple groups with different fractions of positives at the penultimate stage (even if a slight gap), then the firm may naturally only choose to the apply the final (interview) stage solely to the highest precision group which would be clearly unfair to the other groups. Even if the firm is required to interview all of those who pass the final round, the tests themselves could have the property that qualified individuals from some groups pass more easily than qualified individuals from others. Thus, we consider requiring Equality of Opportunity (qualified individuals from each each group have the same chance of reaching the final stage and being interviewed). We then examine the goal of maximizing quantities of interest to the decision maker subject to this constraint, via modification of the probabilities of promotion through the screening process at each stage based on performance at the previous stage. We exhibit algorithms for satisfying Equal Opportunity over the selection process and maximizing precision (the fraction of interview that yield qualified candidates) as well as linear combinations of precision and recall (recall determines the number of applicants needed per hire) at the end of the final stage. We also present examples showing that the solution space is non-convex, which motivate our exact and (FPTAS) approximation algorithms for maximizing the linear combination of precision and recall. Finally, we discuss the `price of' adding additional restrictions, such as not allowing the decision maker to use group membership in its decision process.

preprint2020arXiv

(Learned) Frequency Estimation Algorithms under Zipfian Distribution

\begin{abstract} The frequencies of the elements in a data stream are an important statistical measure and the task of estimating them arises in many applications within data analysis and machine learning. Two of the most popular algorithms for this problem, Count-Min and Count-Sketch, are widely used in practice. In a recent work [Hsu et al., ICLR'19], it was shown empirically that augmenting Count-Min and Count-Sketch with a machine learning algorithm leads to a significant reduction of the estimation error. The experiments were complemented with an analysis of the expected error incurred by Count-Min (both the standard and the augmented version) when the input frequencies follow a Zipfian distribution. Although the authors established that the learned version of Count-Min has lower estimation error than its standard counterpart, their analysis of the standard Count-Min algorithm was not tight. Moreover, they provided no similar analysis for Count-Sketch. In this paper we resolve these problems. First, we provide a simple tight analysis of the expected error incurred by Count-Min. Second, we provide the first error bounds for both the standard and the augmented version of Count-Sketch. These bounds are nearly tight and again demonstrate an improved performance of the learned version of Count-Sketch. In addition to demonstrating tight gaps between the aforementioned algorithms, we believe that our bounds for the standard versions of Count-Min and Count-Sketch are of independent interest. In particular, it is a typical practice to set the number of hash functions in those algorithms to $Θ(\log n)$. In contrast, our results show that to minimize the \emph{expected} error, the number of hash functions should be a constant, strictly greater than $1$.

preprint2020arXiv

Individual Fairness for $k$-Clustering

We give a local search based algorithm for $k$-median and $k$-means (and more generally for any $k$-clustering with $\ell_p$ norm cost function) from the perspective of individual fairness. More precisely, for a point $x$ in a point set $P$ of size $n$, let $r(x)$ be the minimum radius such that the ball of radius $r(x)$ centered at $x$ has at least $n/k$ points from $P$. Intuitively, if a set of $k$ random points are chosen from $P$ as centers, every point $x\in P$ expects to have a center within radius $r(x)$. An individually fair clustering provides such a guarantee for every point $x\in P$. This notion of fairness was introduced in [Jung et al., 2019] where they showed how to get an approximately feasible $k$-clustering with respect to this fairness condition. In this work, we show how to get a bicriteria approximation for fair $k$-clustering: The $k$-median ($k$-means) cost of our solution is within a constant factor of the cost of an optimal fair $k$-clustering, and our solution approximately satisfies the fairness condition (also within a constant factor). Further, we complement our theoretical bounds with empirical evaluation.

preprint2016arXiv

Towards Tight Bounds for the Streaming Set Cover Problem

We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m\geq n$) we give a $O(1/δ)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^δ)$ space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [DIMV14] that uses exponentially larger number of passes. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to $1$. Specifically, we show that any algorithm that computes set cover exactly using $({1 \over 2δ}-1)$ passes must use $\tildeΩ(mn^δ)$ space in the regime of $m=O(n)$. Furthermore, we consider the problem in the geometric setting where the elements are points in $\mathbb{R}^2$ and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal $\tilde{O}(n)$ space to find a logarithmic approximation in $O(1/δ)$ passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., $Ω(mn)$) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a space bound that is sublinear in the input size. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors.

preprint2013arXiv

Connected Domatic Packings in Node-capacitated Graphs

A set of vertices in a graph is a dominating set if every vertex outside the set has a neighbor in the set. A dominating set is connected if the subgraph induced by its vertices is connected. The connected domatic partition problem asks for a partition of the nodes into connected dominating sets. The connected domatic number of a graph is the size of a largest connected domatic partition and it is a well-studied graph parameter with applications in the design of wireless networks. In this note, we consider the fractional counterpart of the connected domatic partition problem in \emph{node-capacitated} graphs. Let $n$ be the number of nodes in the graph and let $k$ be the minimum capacity of a node separator in $G$. Fractionally we can pack at most $k$ connected dominating sets subject to the capacities on the nodes, and our algorithms construct packings whose sizes are proportional to $k$. Some of our main contributions are the following: \begin{itemize} \item An algorithm for constructing a fractional connected domatic packing of size $Ω(k)$ for node-capacitated planar and minor-closed families of graphs. \item An algorithm for constructing a fractional connected domatic packing of size $Ω(k / \ln{n})$ for node-capacitated general graphs. \end{itemize}

Ali Vakilian

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Learning with Conflicts of Interest

Fair Representation Clustering with Several Protected Classes

Faster Fundamental Graph Algorithms via Learned Predictions

Improved Approximation Algorithms for Individually Fair Clustering

Individual Preference Stability for Clustering

Multi Stage Screening: Enforcing Fairness and Maximizing Efficiency in a Pre-Existing Pipeline

(Learned) Frequency Estimation Algorithms under Zipfian Distribution

Individual Fairness for $k$-Clustering

Towards Tight Bounds for the Streaming Set Cover Problem

Connected Domatic Packings in Node-capacitated Graphs