Source author record

Chenglin Fan

Chenglin Fan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · Computational Geometry · Machine Learning

Catalog footprint

What is connected

8 works

3 topics

4 close collaborators


preprint · 2026 · arXiv

DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum

We study differentially private (DP) training with Muon, a matrix-valued optimizer that updates hidden-layer weights using momentum followed by Newton--Schulz orthogonalization. While DP-SGD is well understood, the interaction between per-example clipping, Gaussian noise, momentum, and nonlinear orthogonalization in Muon has not been systematically analyzed. We formulate DP-Muon, a private Muon procedure that clips per-example matrix gradients, adds Gaussian noise to the clipped lot average, and then applies momentum and Newton--Schulz orthogonalization as post-processing. We prove that DP-Muon inherits the privacy guarantee certified by the corresponding same-lot subsampled Gaussian accountant, with no additional privacy cost from Muon-specific post-processing. On the optimization side, we establish finite-horizon and vanishing stationarity guarantees under per-matrix clipping, with bounds that separate optimization error, clipping residual, privacy noise, and Newton--Schulz approximation error. We further show that the DP-induced bias in Muon arises not in the linear momentum buffer itself, but after the nonlinear Newton--Schulz map, where Gaussian noise induces a matrix-valued heat-smoothing bias. This motivates DP-MuonBC, a bias-corrected variant that removes the leading output-level bias term while preserving the same privacy guarantee. Experiments on E2E and DART show that Muon-style matrix updates improve private fine-tuning, and that DP-MuonBC further improves utility without increasing the privacy budget.
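The pipeline described in this abstract (per-example clipping, Gaussian noise on the clipped lot, momentum, then Newton-Schulz orthogonalization as post-processing) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the Frobenius-norm clipping rule, the noise scale `noise_multiplier * clip_norm`, and the classical cubic Newton-Schulz iteration are all assumptions made for the example, and no privacy accounting or bias correction (DP-MuonBC) is included.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fro_norm(a):
    return math.sqrt(sum(x * x for row in a for x in row))


def clip(g, clip_norm):
    # Assumed clipping rule: rescale so the Frobenius norm is at most clip_norm.
    norm = fro_norm(g)
    return scale(g, min(1.0, clip_norm / norm)) if norm > 0 else g


def newton_schulz(m, steps=10):
    # Classical cubic iteration X <- 1.5 X - 0.5 X X^T X (an assumption here;
    # practical Muon implementations may use different coefficients).
    norm = fro_norm(m)
    if norm == 0:
        return m
    x = scale(m, 1.0 / norm)  # singular values now lie in [0, 1]
    for _ in range(steps):
        x = add(scale(x, 1.5),
                scale(matmul(matmul(x, transpose(x)), x), -0.5))
    return x


def dp_muon_step(per_example_grads, momentum, clip_norm,
                 noise_multiplier, beta, rng):
    """One hypothetical DP-Muon step for a single weight matrix."""
    lot_size = len(per_example_grads)
    rows, cols = len(momentum), len(momentum[0])
    total = [[0.0] * cols for _ in range(rows)]
    for g in per_example_grads:
        total = add(total, clip(g, clip_norm))
    sigma = noise_multiplier * clip_norm
    # Gaussian noise is added once to the clipped sum, then averaged.
    noisy_avg = [[(total[i][j] + rng.gauss(0.0, sigma)) / lot_size
                  for j in range(cols)] for i in range(rows)]
    # Momentum and orthogonalization only touch the already-noised
    # quantity, so they act as post-processing.
    momentum = add(scale(momentum, beta), noisy_avg)
    return newton_schulz(momentum), momentum
```

The returned update direction would then be scaled by a learning rate and subtracted from the weights; the momentum buffer is carried to the next step.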

preprint · 2022 · arXiv

$k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy

When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of a metric embedding tree structure of the data. From the tree, we propose a novel and efficient search algorithm for good initial centers that can subsequently be used by the local search algorithm. Our proposed HST initialization can produce initial centers achieving lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. The HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error from applying DP local search after our private HST initialization improves previous results on the approximation error, and approaches the lower bound within a small factor. Experiments justify the theory and demonstrate the effectiveness of our proposed method. Our approach can also be extended to the $k$-means problem.

preprint · 2022 · arXiv

Breaking the Linear Error Barrier in Differentially Private Graph Distance Release

Releasing all pairwise shortest path (APSP) distances between vertices on general graphs under weight Differential Privacy (DP) is known to be a challenging task. In the previous attempt of Sealfon (2016), which adds Laplace noise to each edge weight or to each output distance to achieve DP with some fixed budget, the maximal absolute error among all published pairwise distances is, with high probability, roughly $O(n)$, where $n$ is the number of nodes. It was shown that this error could be reduced for some special graphs, which, however, is hard for general graphs. Therefore, whether the approximation error can be reduced to sublinear in $n$ is posed as an interesting open problem. We break the linear barrier on the distance approximation error of the previous result by proposing an algorithm that releases a constructed synthetic graph privately. Computing all pairwise distances on the constructed graph only introduces $\tilde O(n^{1/2})$ error in answering all pairwise shortest path distances for a fixed privacy parameter. Our method is based on a novel graph diameter (link length) augmentation via constructing "shortcuts" for the paths. By adding a set of shortcut edges to the original graph, we show that any node pair has a shortest path with link length $\tilde O(n^{1/2})$. Then, by adding noise with some positive mean to the edge weights, we show that the new graph is differentially private and can be published to answer all pairwise shortest path distances with $\tilde O(n^{1/2})$ approximation error using standard APSP computation. Additionally, we consider graphs with a small feedback vertex set number. A feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles, and the feedback vertex set number of a graph, $k$, is the size of a smallest feedback vertex set. We propose a DP algorithm with error rate $\tilde O(k)$.
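As a point of reference, the baseline this abstract improves on (Laplace noise on every edge weight, followed by a standard APSP computation) might look like the sketch below. It is not the paper's shortcut-based algorithm. The function names are hypothetical, the noise scale `1/epsilon` assumes neighboring weight functions differ by at most 1 in $\ell_1$ norm, and clamping negative noisy weights to zero is an added post-processing choice so that Floyd-Warshall stays well defined.

```python
import math
import random


def private_edge_weights(edges, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each edge weight (hypothetical helper).

    edges maps an undirected edge (u, v) to its private weight.
    """
    scale = 1.0 / epsilon
    noisy = {}
    for edge, weight in edges.items():
        # A Laplace variate is the difference of two i.i.d. exponentials.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Clamping is post-processing and does not affect the DP guarantee.
        noisy[edge] = max(0.0, weight + noise)
    return noisy


def all_pairs_distances(n, edges):
    """Floyd-Warshall on an undirected graph with vertices 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

Because a shortest path may use up to $n-1$ edges, the per-edge noise can accumulate along it, which is the intuition behind the roughly linear error of this baseline.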

preprint · 2022 · arXiv

Distances Release with Differential Privacy in Tree and Grid Graph

Data about individuals may contain private and sensitive information. Differential privacy (DP) was proposed to address the problem of protecting the privacy of each individual while keeping useful information about a population. Sealfon (2016) introduced a private graph model in which the graph topology is assumed to be public while the weight information is assumed to be private. That model can express hidden congestion patterns in a known transportation system. In this paper, we revisit the problem of privately releasing approximate distances between all pairs of vertices in (Sealfon 2016). Our goal is to minimize the additive error, namely the difference between the released distance and the actual distance under the private setting. We propose improved solutions to that problem for several cases. For the problem of privately releasing all-pairs distances, we show that for a tree with depth $h$, we can release all-pairs distances with additive error $O(\log^{1.5} h \cdot \log^{1.5} V)$ for a fixed privacy parameter, where $V$ is the number of vertices in the tree, which improves the previous error bound $O(\log^{2.5} V)$, since the size of $h$ can be as small as $O(\log V)$. Our result implies that a $\log V$ factor is saved, and the additive error on a tree can be smaller than the error on an array/path. Additionally, for the grid graph with arbitrary edge weights, we also propose a method to release all-pairs distances with additive error $\tilde O(V^{3/4})$ for fixed privacy parameters. On the application side, many cities like Manhattan are composed of horizontal streets and vertical avenues, which can be modeled as a grid graph.

preprint · 2022 · arXiv

Fitting Metrics and Ultrametrics with Minimum Disagreements

Given $x \in (\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the $\ell_0$ distance between $x$ and the metric cone; i.e., modify the minimum number of entries of $x$ to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an $O(\log n)$-approximation algorithm for MVD, exponentially improving the previous best approximation ratio of $O(OPT^{1/3})$ of Fan et al. [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of ULTRAMETRIC VIOLATION DISTANCE (UMVD), where the goal is to compute the $\ell_0$ distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The UMVD can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad et al. [FOCS, 2021] from the $\ell_1$ norm to the $\ell_0$ norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new $O(1)$-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an $O(\log n \log \log n)$ approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of $x$ forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
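To make the MVD input concrete, the sketch below checks whether a vector of pairwise distances already lies in the metric cone by listing the triples that violate the triangle inequality. It is only a brute-force feasibility check with a hypothetical function name and input encoding; it does not compute the $\ell_0$ distance itself, and the number of violated triples is not the MVD value.

```python
from itertools import combinations


def violated_triangles(x, n):
    """Return all triples (i, j, k), i < j < k, violating the triangle inequality.

    x maps each pair (i, j) with i < j to a nonnegative distance
    (an assumed encoding of the vector indexed by pairs of [n]).
    """
    bad = []
    for i, j, k in combinations(range(n), 3):
        a, b, c = x[(i, j)], x[(j, k)], x[(i, k)]
        # Each side must be at most the sum of the other two.
        if a > b + c or b > a + c or c > a + b:
            bad.append((i, j, k))
    return bad


# Example: changing the single entry x[(0, 2)] from 3.0 to 2.0
# repairs the only violated triple.
x = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 3.0}
broken = violated_triangles(x, 3)
x[(0, 2)] = 2.0
repaired = violated_triangles(x, 3)
```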

preprint · 2020 · arXiv

Linear Expected Complexity for Directional and Multiplicative Voronoi Diagrams

While the standard unweighted Voronoi diagram in the plane has linear worst-case complexity, many of its natural generalizations do not. This paper considers two such previously studied generalizations, namely multiplicative and semi Voronoi diagrams. These diagrams both have quadratic worst-case complexity, though here we show that their expected complexity is linear for certain natural randomized inputs. Specifically, we argue that the expected complexity is linear for: (1) semi Voronoi diagrams when the visible direction is randomly sampled, and (2) for multiplicative diagrams when either weights are sampled from a constant-sized set, or the more challenging case when weights are arbitrary but locations are sampled from a square.

preprint · 2015 · arXiv

Complexity and Algorithms for the Discrete Fréchet Distance Upper Bound with Imprecise Input

We study the problem of computing the upper bound of the discrete Fréchet distance for imprecise input, and prove that the problem is NP-hard. This solves an open problem posed in 2010 by Ahn et al. If shortcuts are allowed, we show that the upper bound of the discrete Fréchet distance with shortcuts for imprecise input can be computed in polynomial time and we present several efficient algorithms.
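For readers unfamiliar with the underlying measure, the discrete Fréchet distance between two precisely known point sequences can be computed with the classical quadratic-time dynamic program, sketched below. This covers only the precise-input case; the imprecise-input upper bound and the shortcut variants studied in the paper are not implemented, and the function name is illustrative.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    table[i][j] holds the distance for the prefixes p[:i+1] and q[:j+1].
    """
    n, m = len(p), len(q)
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                # Advance on p, on q, or on both, keeping the cheapest history.
                best_prev = min(table[i - 1][j],
                                table[i - 1][j - 1],
                                table[i][j - 1])
                table[i][j] = max(best_prev, d)
    return table[n - 1][m - 1]
```

For two parallel three-point segments one unit apart, such as `[(0, 0), (1, 0), (2, 0)]` and `[(0, 1), (1, 1), (2, 1)]`, the function returns 1.0.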

preprint · 2014 · arXiv

On the Chain Pair Simplification Problem

The problem of efficiently computing and visualizing the structural resemblance between a pair of protein backbones in 3D has led Bereg et al. to pose the Chain Pair Simplification problem (CPS). In this problem, given two polygonal chains $A$ and $B$ of lengths $m$ and $n$, respectively, one needs to simplify them simultaneously, such that each of the resulting simplified chains, $A'$ and $B'$, is of length at most $k$ and the discrete Fréchet distance between $A'$ and $B'$ is at most $\delta$, where $k$ and $\delta$ are given parameters. In this paper we study the complexity of CPS under the discrete Fréchet distance (CPS-3F), i.e., where the quality of the simplifications is also measured by the discrete Fréchet distance. Since CPS-3F was posed in 2008, its complexity has remained open. However, it was believed to be NP-complete, since CPS under the Hausdorff distance (CPS-2H) was shown to be NP-complete. We first prove that the weighted version of CPS-3F is indeed weakly NP-complete, even on the line, based on a reduction from the set partition problem. Then, we prove that CPS-3F is actually polynomially solvable, by presenting an $O(m^2n^2\min\{m,n\})$ time algorithm for the corresponding minimization problem. In fact, we prove a stronger statement, implying, for example, that if weights are assigned to the vertices of only one of the chains, then the problem remains polynomially solvable. We also study a few less rigid variants of CPS and present efficient solutions for them. Finally, we present some experimental results that suggest that (the minimization version of) CPS-3F is significantly better than previous algorithms for the motivating biological application.
