Source author record

Lek-Heng Lim

Lek-Heng Lim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.NA Computational Complexity Information Theory math.IT math.OC Numerical Analysis math.AG Machine Learning math.RA math.ST Statistics Theory math.AT math.FA math.PR Methodology quant-ph Systems and Control

Catalog footprint

What is connected

20works

17topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Distances between probability distributions of different dimensions

Comparing probability distributions is an indispensable and ubiquitous task in machine learning and statistics. The most common way to compare a pair of Borel probability measures is to compute a metric between them, and by far the most widely used notions of metric are the Wasserstein metric and the total variation metric. The next most common way is to compute a divergence between them, and in this case almost every known divergences such as those of Kullback--Leibler, Jensen--Shannon, Rényi, and many more, are special cases of the $f$-divergence. Nevertheless these metrics and divergences may only be computed, in fact, are only defined, when the pair of probability measures are on spaces of the same dimension. How would one quantify, say, a KL-divergence between the uniform distribution on the interval $[-1,1]$ and a Gaussian distribution on $\mathbb{R}^3$? We show that these common notions of metrics and divergences give rise to natural distances between Borel probability measures defined on spaces of different dimensions, e.g., one on $\mathbb{R}^m$ and another on $\mathbb{R}^n$ where $m, n$ are distinct, so as to give a meaningful answer to the previous question.

preprint2020arXiv

Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False

Stochastic optimization algorithms have become indispensable in modern machine learning. An unresolved foundational question in this area is the difference between with-replacement sampling and without-replacement sampling -- does the latter have superior convergence rate compared to the former? A groundbreaking result of Recht and Ré reduces the problem to a noncommutative analogue of the arithmetic-geometric mean inequality where $n$ positive numbers are replaced by $n$ positive definite matrices. If this inequality holds for all $n$, then without-replacement sampling indeed outperforms with-replacement sampling. The conjectured Recht-Ré inequality has so far only been established for $n = 2$ and a special case of $n = 3$. We will show that the Recht-Ré conjecture is false for general $n$. Our approach relies on the noncommutative Positivstellensatz, which allows us to reduce the conjectured inequality to a semidefinite program and the validity of the conjecture to certain bounds for the optimum values, which we show are false as soon as $n = 5$.

preprint2020arXiv

Symmetric Grothendieck inequality

We establish an analogue of the Grothendieck inequality where the rectangular matrix is replaced by a symmetric/Hermitian matrix and the bilinear form by a quadratic form. We call this the symmetric Grothendieck inequality; despite its name, it is a generalization -- the original Grothendieck inequality is a special case. While there are other proposals for such an inequality, ours differs in two important ways: (i) we have no additional requirement like positive semidefiniteness; (ii) our symmetric Grothendieck constant is universal, i.e., independent of the matrix and its dimensions. A consequence of our symmetric Grothendieck inequality is a "conic Grothendieck inequality" for any family of cones of symmetric matrices: The original Grothendieck inequality is a special case; as is the Nesterov $π/2$-Theorem, which corresponds to the cones of positive semidefinite matrices; as well as the Goemans-Williamson inequality, which corresponds to the cones of Laplacians. For yet other cones, e.g., of diagonally dominant matrices, we get new Grothendieck-like inequalities. A slight extension leads to a unified framework that treats any Grothendieck-like inequality as an inequality between two norms within a family of "Grothendieck norms" restricted to a family of cones. This allows us to place on equal footing the Goemans-Williamson inequality, Nesterov $π/2$-Theorem, Ben-Tal-Nemirovski-Roos $4/π$-Theorem, generalized Grothendieck inequality, order-$p$ Grothendieck inequality, rank-constrained positive semidefinite Grothendieck inequality; and in turn allows us to simplify proofs, extend results from real to complex, obtain new bounds or establish sharpness of existing ones. The symmetric Grothendieck inequality may also be applied to obtain polynomial-time approximation bounds for NP-hard combinatorial, integer, and nonconvex optimization problems.

preprint2020arXiv

Topology of deep neural networks

We study how the topology of a data set $M = M_a \cup M_b \subseteq \mathbb{R}^d$, representing two classes $a$ and $b$ in a binary classification problem, changes as it passes through the layers of a well-trained neural network, i.e., with perfect accuracy on training set and near-zero generalization error ($\approx 0.01\%$). The goal is to shed light on two mysteries in deep neural networks: (i) a nonsmooth activation function like ReLU outperforms a smooth one like hyperbolic tangent; (ii) successful neural network architectures rely on having many layers, even though a shallow network can approximate any function arbitrary well. We performed extensive experiments on the persistent homology of a wide range of point cloud data sets, both real and simulated. The results consistently demonstrate the following: (1) Neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simple one as it passes through the layers. No matter how complicated the topology of $M$ we begin with, when passed through a well-trained neural network $f : \mathbb{R}^d \to \mathbb{R}^p$, there is a vast reduction in the Betti numbers of both components $M_a$ and $M_b$; in fact they nearly always reduce to their lowest possible values: $β_k\bigl(f(M_i)\bigr) = 0$ for $k \ge 1$ and $β_0\bigl(f(M_i)\bigr) = 1$, $i =a, b$. Furthermore, (2) the reduction in Betti numbers is significantly faster for ReLU activation than hyperbolic tangent activation as the former defines nonhomeomorphic maps that change topology, whereas the latter defines homeomorphic maps that preserve topology. Lastly, (3) shallow and deep networks transform data sets differently -- a shallow network operates mainly through changing geometry and changes topology only in its final layers, a deep one spreads topological changes more evenly across all layers.

preprint2016arXiv

Algorithms for structured matrix-vector product of optimal bilinear complexity

We present explicit algorithms for computing structured matrix-vector products that are optimal in the sense of Strassen, i.e., using a provably minimum number of multiplications. These structures include Toeplitz/Hankel/circulant, symmetric, Toeplitz-plus-Hankel, sparse, and multilevel structures. The last category include \textsc{bttb}, \textsc{bhhb}, \textsc{bccb} but also any arbitrarily complicated nested structures built out of other structures.

preprint2016arXiv

Fast structured matrix computations: tensor rank and Cohn--Umans method

We discuss a generalization of the Cohn-Umans method, a potent technique developed for studying the bilinear complexity of matrix multiplication by embedding matrices into an appropriate group algebra. We investigate how the Cohn-Umans method may be used for bilinear operations other than matrix multiplication, with algebras other than group algebras, and we relate it to Strassen's tensor rank approach, the traditional framework for investigating bilinear complexity. To demonstrate the utility of the generalized method, we apply it to find the fastest algorithms for forming structured matrix-vector product, the basic operation underlying iterative algorithms for structured matrices. The structures we study include Toeplitz, Hankel, circulant, symmetric, skew-symmetric, f-circulant, block-Toeplitz-Toeplitz-block, triangular Toeplitz matrices, Toeplitz-plus-Hankel, sparse/banded/triangular. Except for the case of skew-symmetric matrices, for which we have only upper bounds, the algorithms derived using the generalized Cohn-Umans method in all other instances are the fastest possible in the sense of having minimum bilinear complexity. We also apply this framework to a few other bilinear operations including matrix-matrix, commutator, simultaneous matrix products, and briefly discuss the relation between tensor nuclear norm and numerical stability.

preprint2016arXiv

Nuclear Norm of Higher-Order Tensors

We establish several mathematical and computational properties of the nuclear norm for higher-order tensors. We show that like tensor rank, tensor nuclear norm is dependent on the choice of base field --- the value of the nuclear norm of a real 3-tensor depends on whether we regard it as a real 3-tensor or a complex 3-tensor with real entries. We show that every tensor has a nuclear norm attaining decomposition and every symmetric tensor has a symmetric nuclear norm attaining decomposition. There is a corresponding notion of nuclear rank that, unlike tensor rank, is upper semicontinuous. We establish an analogue of Banach's theorem for tensor spectral norm and Comon's conjecture for tensor rank --- for a symmetric tensor, its symmetric nuclear norm always equals its nuclear norm. We show that computing tensor nuclear norm is NP-hard in several sense. Deciding weak membership in the nuclear norm unit ball of 3-tensors is NP-hard, as is finding an $\varepsilon$-approximation of nuclear norm for 3-tensors. In addition, the problem of computing spectral or nuclear norm of a 4-tensor is NP-hard, even if we restrict the 4-tensor to be bi-Hermitian, bisymmetric, positive semidefinite, nonnegative valued, or all of the above. We discuss some simple polynomial-time approximation bounds. As an aside, we show that the nuclear $(p,q)$-norm of a matrix is NP-hard in general but can be computed in polynomial-time if $p=1$, $q = 1$, or $p=q=2$, with closed-form expressions for the nuclear $(1,q)$- and $(p,1)$-norms.

preprint2016arXiv

Schubert varieties and distances between subspaces of different dimensions

We resolve a basic problem on subspace distances that often arises in applications: How can the usual Grassmann distance between equidimensional subspaces be extended to subspaces of different dimensions? We show that a natural solution is given by the distance of a point to a Schubert variety within the Grassmannian. This distance reduces to the Grassmann distance when the subspaces are equidimensional and does not depend on any embedding into a larger ambient space. Furthermore, it has a concrete expression involving principal angles, and is efficiently computable in numerically stable ways. Our results are largely independent of the Grassmann distance --- if desired, it may be substituted by any other common distances between subspaces. Our approach depends on a concrete algebraic geometric view of the Grassmannian that parallels the differential geometric perspective that is well-established in applied and computational mathematics.

preprint2016arXiv

Semialgebraic Geometry of Nonnegative Tensor Rank

We study the semialgebraic structure of $D_r$, the set of nonnegative tensors of nonnegative rank not more than $r$, and use the results to infer various properties of nonnegative tensor rank. We determine all nonnegative typical ranks for cubical nonnegative tensors and show that the direct sum conjecture is true for nonnegative tensor rank. We show that nonnegative, real, and complex ranks are all equal for a general nonnegative tensor of nonnegative rank strictly less than the complex generic rank. In addition, such nonnegative tensors always have unique nonnegative rank-$r$ decompositions if the real tensor space is $r$-identifiable. We determine conditions under which a best nonnegative rank-$r$ approximation has a unique nonnegative rank-$r$ decomposition: for $r \le 3$, this is always the case; for general $r$, this is the case when the best nonnegative rank-$r$ approximation does not lie on the boundary of $D_r$. Many of our general identifiability results also apply to real tensors and real symmetric tensors.

preprint2016arXiv

The Computational Complexity of Duality

We show that for any given norm ball or proper cone, weak membership in its dual ball or dual cone is polynomial-time reducible to weak membership in the given ball or cone. A consequence is that the weak membership or membership problem for a ball or cone is NP-hard if and only if the corresponding problem for the dual ball or cone is NP-hard. In a similar vein, we show that computation of the dual norm of a given norm is polynomial-time reducible to computation of the given norm. This extends to convex functions satisfying a polynomial growth condition: for such a given function, computation of its Fenchel dual/conjugate is polynomial-time reducible to computation of the given function. Hence the computation of a norm or a convex function of polynomial-growth is NP-hard if and only if the computation of its dual norm or Fenchel dual is NP-hard. We discuss implications of these results on the weak membership problem for a symmetric convex body and its polar dual, the polynomial approximability of Mahler volume, and the weak membership problem for the epigraph of a convex function with polynomial growth and that of its Fenchel dual.

preprint2016arXiv

Uniqueness of Nonnegative Tensor Approximations

We show that for a nonnegative tensor, a best nonnegative rank-r approximation is almost always unique, its best rank-one approximation may always be chosen to be a best nonnegative rank-one approximation, and that the set of nonnegative tensors with non-unique best rank-one approximations form an algebraic hypersurface. We show that the last part holds true more generally for real tensors and thereby determine a polynomial equation so that a real or nonnegative tensor which does not satisfy this equation is guaranteed to have a unique best rank-one approximation. We also establish an analogue for real or nonnegative symmetric tensors. In addition, we prove a singular vector variant of the Perron--Frobenius Theorem for positive tensors and apply it to show that a best nonnegative rank-r approximation of a positive tensor can never be obtained by deflation. As an aside, we verify that the Euclidean distance (ED) discriminants of the Segre variety and the Veronese variety are hypersurfaces and give defining equations of these ED discriminants.

preprint2015arXiv

Learning Subspaces of Different Dimension

We introduce a Bayesian model for inferring mixtures of subspaces of different dimensions. The key challenge in such a mixture model is specification of prior distributions over subspaces of different dimensions. We address this challenge by embedding subspaces or Grassmann manifolds into a sphere of relatively low dimension and specifying priors on the sphere. We provide an efficient sampling algorithm for the posterior distribution of the model parameters. We illustrate that a simple extension of our mixture of subspaces model can be applied to topic modeling. We also prove posterior consistency for the mixture of subspaces model. The utility of our approach is demonstrated with applications to real and simulated data.

preprint2014arXiv

Every matrix is a product of Toeplitz matrices

We show that every n-by-n matrix is generically a product of [n/2] + 1 Toeplitz matrices and always a product of at most 2n+5 Toeplitz matrices. The same result holds true if the word "Toeplitz" is replaced by "Hankel", and the generic bound [n/2] + 1 is sharp. We will see that these decompositions into Toeplitz or Hankel factors are unusual: We may not in general replace the subspace of Toeplitz or Hankel matrices by an arbitrary (2n-1)-dimensional subspace of n-by-n matrices. Furthermore such decompositions do not exist if we require the factors to be symmetric Toeplitz, persymmetric Hankel, or circulant matrices, even if we allow an infinite number of factors. Lastly, we discuss how the Toeplitz and Hankel decompositions of a generic matrix may be computed by either (i) solving a system of linear and quadratic equations if the number of factors is required to be [n/2] + 1, or (ii) Gaussian elimination in O(n^3) time if the number of factors is allowed to be 2n.

preprint2014arXiv

Multilinear PageRank

In this paper, we first extend the celebrated PageRank modification to a higher-order Markov chain. Although this system has attractive theoretical properties, it is computationally intractable for many interesting problems. We next study a computationally tractable approximation to the higher-order PageRank vector that involves a system of polynomial equations called multilinear PageRank, which is a type of tensor PageRank vector. It is motivated by a novel "spacey random surfer" model, where the surfer remembers bits and pieces of history and is influenced by this information. The underlying stochastic process is an instance of a vertex-reinforced random walk. We develop convergence theory for a simple fixed-point method, a shifted fixed-point method, and a Newton iteration in a particular parameter regime. In marked contrast to the case of the PageRank vector of a Markov chain where the solution is always unique and easy to compute, there are parameter regimes of multilinear PageRank where solutions are not unique and simple algorithms do not converge. We provide a repository of these non-convergent cases that we encountered through exhaustive enumeration and randomly sampling that we believe is useful for future study of the problem.

preprint2013arXiv

Blind Multilinear Identification

We discuss a technique that allows blind recovery of signals or blind identification of mixtures in instances where such recovery or identification were previously thought to be impossible: (i) closely located or highly correlated sources in antenna array processing, (ii) highly correlated spreading codes in CDMA radio communication, (iii) nearly dependent spectra in fluorescent spectroscopy. This has important implications --- in the case of antenna array processing, it allows for joint localization and extraction of multiple sources from the measurement of a noisy mixture recorded on multiple sensors in an entirely deterministic manner. In the case of CDMA, it allows the possibility of having a number of users larger than the spreading gain. In the case of fluorescent spectroscopy, it allows for detection of nearly identical chemical constituents. The proposed technique involves the solution of a bounded coherence low-rank multilinear approximation problem. We show that bounded coherence allows us to establish existence and uniqueness of the recovered solution. We will provide some statistical motivation for the approximation problem and discuss greedy approximation bounds. To provide the theoretical underpinnings for this technique, we develop a corresponding theory of sparse separable decompositions of functions, including notions of rank and nuclear norm that specialize to the usual ones for matrices and operators but apply to also hypermatrices and tensors.

preprint2013arXiv

Most tensor problems are NP-hard

We prove that multilinear (tensor) analogues of many efficiently computable problems in numerical linear algebra are NP-hard. Our list here includes: determining the feasibility of a system of bilinear equations, deciding whether a 3-tensor possesses a given eigenvalue, singular value, or spectral norm; approximating an eigenvalue, eigenvector, singular vector, or the spectral norm; and determining the rank or best rank-1 approximation of a 3-tensor. Furthermore, we show that restricting these problems to symmetric tensors does not alleviate their NP-hardness. We also explain how deciding nonnegative definiteness of a symmetric 4-tensor is NP-hard and how computing the combinatorial hyperdeterminant of a 4-tensor is NP-, #P-, and VNP-hard. We shall argue that our results provide another view of the boundary separating the computational tractability of linear/convex problems from the intractability of nonlinear/nonconvex ones.

preprint2013arXiv

Multiarray Signal Processing: Tensor decomposition meets compressed sensing

We discuss how recently discovered techniques and tools from compressed sensing can be used in tensor decompositions, with a view towards modeling signals from multiple arrays of multiple sensors. We show that with appropriate bounds on a measure of separation between radiating sources called coherence, one could always guarantee the existence and uniqueness of a best rank-r approximation of the tensor representing the signal. We also deduce a computationally feasible variant of Kruskal's uniqueness condition, where the coherence appears as a proxy for k-rank. Problems of sparsest recovery with an infinite continuous dictionary, lowest-rank tensor representation, and blind source separation are treated in a uniform fashion. The decomposition of the measurement tensor leads to simultaneous localization and extraction of radiating sources, in an entirely deterministic manner.

preprint2013arXiv

Self-concordance is NP-hard

We give an elementary proof of a somewhat curious result, namely, that deciding whether a convex function is self-concordant is in general an intractable problem.

preprint2012arXiv

PARNES: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals

In this article, we propose an algorithm, NESTA-LASSO, for the LASSO problem, i.e., an underdetermined linear least-squares problem with a 1-norm constraint on the solution. We prove under the assumption of the restricted isometry property (RIP) and a sparsity condition on the solution, that NESTA-LASSO is guaranteed to be almost always locally linearly convergent. As in the case of the algorithm NESTA proposed by Becker, Bobin, and Candes, we rely on Nesterov's accelerated proximal gradient method, which takes O(e^{-1/2}) iterations to come within e > 0 of the optimal value. We introduce a modification to Nesterov's method that regularly updates the prox-center in a provably optimal manner, and the aforementioned linear convergence is in part due to this modification. In the second part of this article, we attempt to solve the basis pursuit denoising BPDN problem (i.e., approximating the minimum 1-norm solution to an underdetermined least squares problem) by using NESTA-LASSO in conjunction with the Pareto root-finding method employed by van den Berg and Friedlander in their SPGL1 solver. The resulting algorithm is called PARNES. We provide numerical evidence to show that it is comparable to currently available solvers.

preprint2011arXiv

Rank Aggregation via Nuclear Norm Minimization

The process of rank aggregation is intimately intertwined with the structure of skew-symmetric matrices. We apply recent advances in the theory and algorithms of matrix completion to skew-symmetric matrices. This combination of ideas produces a new method for ranking a set of items. The essence of our idea is that a rank aggregation describes a partially filled skew-symmetric matrix. We extend an algorithm for matrix completion to handle skew-symmetric data and use that to extract ranks for each item. Our algorithm applies to both pairwise comparison and rating data. Because it is based on matrix completion, it is robust to both noise and incomplete data. We show a formal recovery result for the noiseless case and present a detailed study of the algorithm on synthetic data and Netflix ratings.

Lek-Heng Lim

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

Distances between probability distributions of different dimensions

Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False

Symmetric Grothendieck inequality

Topology of deep neural networks

Algorithms for structured matrix-vector product of optimal bilinear complexity

Fast structured matrix computations: tensor rank and Cohn--Umans method

Nuclear Norm of Higher-Order Tensors

Schubert varieties and distances between subspaces of different dimensions

Semialgebraic Geometry of Nonnegative Tensor Rank

The Computational Complexity of Duality

Uniqueness of Nonnegative Tensor Approximations

Learning Subspaces of Different Dimension

Every matrix is a product of Toeplitz matrices

Multilinear PageRank

Blind Multilinear Identification

Most tensor problems are NP-hard

Multiarray Signal Processing: Tensor decomposition meets compressed sensing

Self-concordance is NP-hard

PARNES: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals

Rank Aggregation via Nuclear Norm Minimization