Source author record

Kenneth Lange

Kenneth Lange appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computation, math.OC, Applications, Methodology, math.NA, Genomics, Machine Learning, Data Structures and Algorithms, Populations and Evolution, Numerical Analysis

Catalog footprint

What is connected

21 works

10 topics

4 close collaborators

preprint, 2022, arXiv

A unified analysis of convex and non-convex lp-ball projection problems

The task of projecting onto $\ell_p$ norm balls is ubiquitous in statistics and machine learning, yet the availability of actionable algorithms for doing so is largely limited to the special cases of $p \in \left\{ 0, 1, 2, \infty \right\}$. In this paper, we introduce novel, scalable methods for projecting onto the $\ell_p$ ball for general $p>0$. For $p \geq 1$, we solve the univariate Lagrangian dual via a dual Newton method. We then carefully design a bisection approach for $p<1$, presenting theoretical and empirical evidence of zero or a small duality gap in the non-convex case. The success of our contributions is thoroughly assessed empirically, and applied to large-scale regularized multi-task learning and compressed sensing.
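
To make the idea of a univariate dual concrete, here is a minimal sketch for the well-known special case $p = 1$ only. It is not the paper's dual Newton method: plain bisection on the multiplier is swapped in for simplicity, and the function name `project_onto_l1_ball` and its parameters are hypothetical. The projection soft-thresholds the input at a level `lam` chosen so that the result lands on the boundary of the ball.

```python
import math

def project_onto_l1_ball(y, radius=1.0, tol=1e-12, max_iter=200):
    """Euclidean projection of the list y onto {x : sum |x_i| <= radius}."""
    if sum(abs(v) for v in y) <= radius:
        return list(y)  # already inside the ball, nothing to do

    def shrunk_norm(lam):
        # l1 norm of the soft-thresholded vector; nonincreasing in lam
        return sum(max(abs(v) - lam, 0.0) for v in y)

    # shrunk_norm(lo) > radius and shrunk_norm(hi) = 0 <= radius
    lo, hi = 0.0, max(abs(v) for v in y)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if shrunk_norm(mid) > radius:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    # soft-threshold each coordinate, keeping its sign
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]

x = project_onto_l1_ball([3.0, -1.0, 0.5], radius=2.0)
print(x)
```

Bisection is slower than a Newton iteration on the same scalar equation, but it only needs monotonicity of the shrunken norm, which keeps the sketch short.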

preprint, 2022, arXiv

Extensions to the Proximal Distance Method of Constrained Optimization

The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems.
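
The surrogate-minimization step described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration, not taken from the paper: the loss is $f(\boldsymbol{x}) = \frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, $\boldsymbol{D}$ is the forward-difference matrix, $S$ is the nonnegative orthant (so the constraint asks for a nondecreasing fit), $\rho$ is held fixed, and the helper names are hypothetical. Each iteration takes one steepest descent step on the surrogate with an exact line search, which avoids a linear system solve.

```python
def forward_diff(x):
    """Apply the fusion matrix D: (Dx)_i = x_{i+1} - x_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def forward_diff_transpose(z, n):
    """Apply D^T to a vector z of length n - 1."""
    out = [0.0] * n
    for i, v in enumerate(z):
        out[i] -= v
        out[i + 1] += v
    return out

def penalized_objective(x, y, rho):
    """0.5*||x - y||^2 + (rho/2)*dist(Dx, S)^2, S = nonnegative orthant."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
    dist2 = sum(min(d, 0.0) ** 2 for d in forward_diff(x))
    return fit + 0.5 * rho * dist2

def proximal_distance_sd(y, rho=10.0, iters=500):
    x = list(y)
    history = [penalized_objective(x, y, rho)]
    for _ in range(iters):
        dx = forward_diff(x)
        proj = [max(d, 0.0) for d in dx]            # P_S(D x_n)
        resid = [d - p for d, p in zip(dx, proj)]   # D x_n - P_S(D x_n)
        back = forward_diff_transpose(resid, len(x))
        # gradient of the surrogate at the current iterate
        grad = [(a - b) + rho * g for a, b, g in zip(x, y, back)]
        gg = sum(g * g for g in grad)
        if gg == 0.0:
            break  # stationary point of the penalized objective
        # exact line search: the surrogate is quadratic with Hessian I + rho*D^T D
        dg = forward_diff(grad)
        step = gg / (gg + rho * sum(d * d for d in dg))
        x = [a - step * g for a, g in zip(x, grad)]
        history.append(penalized_objective(x, y, rho))
    return x, history

x, history = proximal_distance_sd([3.0, 1.0, 2.0, 0.0, 4.0])
print(x, history[0], history[-1])
```

Because the surrogate majorizes the penalized objective and touches it at the current iterate, any decrease in the surrogate forces a decrease in the objective, so `history` should never increase. In practice one would also increase $\rho$ gradually; that schedule is omitted here.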

preprint, 2022, arXiv

Feature Selection for Vertex Discriminant Analysis

We revisit vertex discriminant analysis (VDA) from the perspective of proximal distance algorithms. By specifying sparsity sets as constraints that directly control the number of active features, VDA is able to fit multiclass classifiers with no more than $k$ active features. We combine our sparse VDA approach with repeated cross validation to fit classifiers across the full range of model sizes on a given dataset. Our numerical examples demonstrate that grappling with sparsity directly is an attractive approach to model building in high-dimensional settings. Applications to kernel-based VDA are also considered.

preprint, 2021, arXiv

Orthogonal Trace-Sum Maximization: Applications, Local Algorithms, and Global Optimality

This paper studies the problem of maximizing the sum of traces of matrix quadratic forms on a product of Stiefel manifolds. This orthogonal trace-sum maximization (OTSM) problem generalizes many interesting problems such as generalized canonical correlation analysis (CCA), Procrustes analysis, and cryo-electron microscopy of Nobel Prize fame. For these applications finding global solutions is highly desirable, but it has been unclear how to find even a stationary point, let alone test its global optimality. Through a close inspection of Ky Fan's classical result (1949) on the variational formulation of the sum of largest eigenvalues of a symmetric matrix, and a semidefinite programming (SDP) relaxation of the latter, we first provide a simple method to certify global optimality of a given stationary point of OTSM. This method only requires testing whether a symmetric matrix is positive semidefinite. A by-product of this analysis is an unexpected strong duality between Shapiro-Botha (1988) and Zhang-Singer (2017). After showing that a popular algorithm for generalized CCA and Procrustes analysis may generate oscillating iterates, we propose a simple fix that provably guarantees convergence to a stationary point. The combination of our algorithm and certificate reveals novel global optima of various instances of OTSM.

preprint, 2015, arXiv

MM Algorithms for Variance Components Models

Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticians and numerical analysts, maximum likelihood estimation and restricted maximum likelihood estimation of variance component models remain numerically challenging. Building on the minorization-maximization (MM) principle, this paper presents a novel iterative algorithm for variance components estimation. The MM algorithm is trivial to implement and competitive on large data problems. The algorithm readily extends to more complicated problems such as linear mixed models, multivariate response models possibly with missing data, maximum a posteriori estimation, penalized estimation, and generalized estimating equations (GEE). We establish the global convergence of the MM algorithm to a KKT point and demonstrate, both numerically and theoretically, that it converges faster than the classical EM algorithm when the number of variance components is greater than two and all covariance matrices are positive definite.

preprint, 2015, arXiv

The proximal distance algorithm

The MM principle is a device for creating optimization algorithms satisfying the ascent or descent property. The current survey emphasizes the role of the MM principle in nonlinear programming. For smooth functions, one can construct an adaptive interior point method based on scaled Bregman barriers. This algorithm does not follow the central path. For convex programming subject to nonsmooth constraints, one can combine an exact penalty method with distance majorization to create versatile algorithms that are effective even in discrete optimization. These proximal distance algorithms are highly modular and reduce to set projections and proximal mappings, both very well-understood techniques in optimization. We illustrate the possibilities in linear programming, binary piecewise-linear programming, nonnegative quadratic programming, $\ell_0$ regression, matrix completion, and inverse sparse covariance estimation.

preprint, 2014, arXiv

Fast Genome-Wide QTL Analysis Using Mendel

Pedigree GWAS (Option 29) in the current version of the Mendel software is an optimized subroutine for performing large-scale genome-wide QTL analysis. This analysis (a) works for random sample data, pedigree data, or a mix of both, (b) is highly efficient in both run time and memory requirement, (c) accommodates both univariate and multivariate traits, (d) works for autosomal and X-linked loci, (e) correctly deals with missing data in traits, covariates, and genotypes, (f) allows for covariate adjustment and constraints among parameters, (g) uses either a theoretical or a SNP-based empirical kinship matrix for additive polygenic effects, (h) allows extra variance components such as dominant polygenic effects and household effects, (i) detects and reports outlier individuals and pedigrees, and (j) allows for robust estimation via the $t$-distribution. The current paper assesses these capabilities on the Genetic Analysis Workshop 19 (GAW19) sequencing data. We analyzed simulated and real phenotypes for both family and random sample data sets. For instance, when jointly testing the 8 longitudinally measured systolic blood pressure (SBP) and diastolic blood pressure (DBP) traits, it takes Mendel 78 minutes on a standard laptop computer to read, quality check, and analyze a data set with 849 individuals and 8.3 million SNPs. Genome-wide eQTL analysis of 20,643 expression traits on 641 individuals with 8.3 million SNPs takes 30 hours using 20 parallel runs on a cluster. Mendel is freely available at http://www.genetics.ucla.edu/software.

preprint, 2014, arXiv

Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data

Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of both population and pedigree data. Even data sets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper we re-examine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (6 CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1357 individuals in 124 pedigrees) takes less than 2 minutes and 1.5 GB of memory. Complete multivariate QTL analysis of the three time-points of the longitudinal HDL multivariate trait takes less than 5 minutes and 1.5 GB of memory.

preprint, 2014, arXiv

Splitting Methods for Convex Clustering

Clustering is a fundamental problem in many scientific applications. Standard methods such as $k$-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of $k$-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient.

preprint, 2013, arXiv

Distance Majorization and Its Applications

The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton's method such as the interior point method are viable for small to medium-scale problems. However, modern applications in statistics, engineering, and machine learning are posing problems with potentially tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization (MM) principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.

preprint, 2013, arXiv

Stable Estimation of a Covariance Matrix Guided by Nuclear Norm Penalties

Estimation of covariance matrices or their inverses plays a central role in many statistical methods. For these methods to work reliably, estimated matrices must not only be invertible but also well-conditioned. In this paper we present an intuitive prior that shrinks the classic sample covariance estimator towards a stable target. We prove that our estimator is consistent and asymptotically efficient. Thus, it gracefully transitions towards the sample covariance matrix as the number of samples grows relative to the number of covariates. We also demonstrate the utility of our estimator in two standard situations -- discriminant analysis and EM clustering -- when the number of samples is dominated by or comparable to the number of covariates.

preprint, 2013, arXiv

Techniques for Solving Sudoku Puzzles

Solving Sudoku puzzles is one of the most popular pastimes in the world. Puzzles range in difficulty from easy to very challenging; the hardest puzzles tend to have the most empty cells. The current paper explains and compares three algorithms for solving Sudoku puzzles. Backtracking, simulated annealing, and alternating projections are generic methods for attacking combinatorial optimization problems. Our results favor backtracking. It infallibly solves a Sudoku puzzle or deduces that a unique solution does not exist. However, backtracking does not scale well in high-dimensional combinatorial optimization. Hence, it is useful to expose students in the mathematical sciences to the other two solution techniques in a concrete setting. Simulated annealing shares a common structure with MCMC (Markov chain Monte Carlo) and enjoys wide applicability. The method of alternating projections solves the feasibility problem in convex programming. Converting a discrete optimization problem into a continuous optimization problem opens up the possibility of handling combinatorial problems of much higher dimensionality.
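
Of the three techniques named above, backtracking is the easiest to show in a few lines. The following is a generic textbook sketch, not the authors' implementation; the function names and the sample puzzle are chosen for illustration, and empty cells are encoded as 0. It stops at the first solution it finds and does not check whether that solution is unique.

```python
def candidates(grid, r, c):
    """Digits that do not clash with the row, column, or 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """Fill the grid in place; return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d          # tentative assignment
                    if solve(grid):
                        return True
                grid[r][c] = 0              # undo and backtrack
                return False
    return True  # no empty cell left

PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

grid = [row[:] for row in PUZZLE]
if solve(grid):
    for row in grid:
        print(row)
```

The search always terminates because each recursive call fills one more cell, but its worst-case cost grows exponentially with the number of empty cells, which is the scaling limitation the abstract points to.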

preprint, 2012, arXiv

A Look at the Generalized Heron Problem through the Lens of Majorization-Minimization

In a recent issue of this journal, Mordukhovich et al. pose and solve an interesting non-differentiable generalization of the Heron problem in the framework of modern convex analysis. In the generalized Heron problem one is given $k+1$ closed convex sets in $\mathbb{R}^d$ equipped with its Euclidean norm and asked to find the point in the last set such that the sum of the distances to the first $k$ sets is minimal. In later work the authors generalize the Heron problem even further, relax its convexity assumptions, study its theoretical properties, and pursue subgradient algorithms for solving the convex case. Here, we revisit the original problem solely from the numerical perspective. By exploiting the majorization-minimization (MM) principle of computational statistics and rudimentary techniques from differential calculus, we are able to construct a very fast algorithm for solving the Euclidean version of the generalized Heron problem.

preprint, 2012, arXiv

Path Following in the Exact Penalty Method of Convex Programming

Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to $\infty$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. In practice, the kinks in the penalty and the unknown magnitude of the penalty constant prevent wide application of the exact penalty method in nonlinear programming. In this article, we examine a strategy of path following consistent with the exact penalty method. Instead of performing optimization at a single penalty constant, we trace the solution as a continuous function of the penalty constant. Thus, path following starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. For quadratic programming, the solution path is piecewise linear and takes large jumps from constraint to constraint. For a general convex program, the solution path is piecewise smooth, and path following operates by numerically solving an ordinary differential equation segment by segment. Our diverse applications to a) projection onto a convex set, b) nonnegative least squares, c) quadratically constrained quadratic programming, d) geometric programming, and e) semidefinite programming illustrate the mechanics and potential of path following. The final detour to image denoising demonstrates the relevance of path following to regularized estimation in inverse problems. In regularized estimation, one follows the solution path as the penalty constant decreases from a large value.
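
A one-dimensional toy problem, chosen here purely for illustration, shows the difference between the two penalties. Take the problem of minimizing $\frac{1}{2}(x-a)^2$ subject to $x \le 0$ with $a > 0$, whose constrained solution is $x = 0$. The closed-form minimizers below follow from setting the derivative (or subgradient) of each penalized objective to zero; the function names are hypothetical.

```python
def exact_penalty_min(a, rho):
    """Minimizer of 0.5*(x - a)**2 + rho*max(x, 0) (absolute value penalty)."""
    return a if a <= 0 else max(a - rho, 0.0)

def squared_penalty_min(a, rho):
    """Minimizer of 0.5*(x - a)**2 + 0.5*rho*max(x, 0)**2 (squared penalty)."""
    return a if a <= 0 else a / (1.0 + rho)

a = 2.0
for rho in (0.0, 0.5, 1.0, 2.0, 4.0):
    # columns: penalty constant, exact-penalty solution, squared-penalty solution
    print(rho, exact_penalty_min(a, rho), squared_penalty_min(a, rho))
```

For the exact penalty the solution path is the piecewise linear function $\max(a - \rho, 0)$: it starts at the unconstrained solution $a$, hits the constraint at the finite value $\rho = a$, and stays there. The squared penalty gives $a/(1+\rho)$, which reaches $0$ only in the limit.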

preprint, 2012, arXiv

Reconstructing DNA copy number by joint segmentation of multiple sequences

The variation in DNA copy number carries information on the modalities of genome evolution and misregulation of DNA replication in cancer cells; its study can help to localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand: this encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. We present an algorithm based on regularization approaches with significant computational advantages and competitive accuracy. We illustrate its applicability with simulated and real data sets.

preprint, 2011, arXiv

A Path Algorithm for Constrained Estimation

Many least squares problems involve affine equality and inequality constraints. Although there are a variety of methods for solving such problems, most statisticians find constrained estimation challenging. The current paper proposes a new path following algorithm for quadratic programming based on exact penalization. Similar penalties arise in $\ell_1$ regularization in model selection. Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to $\infty$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. The exact path following method starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. Path following in lasso penalized regression, in contrast, starts with a large value of the penalty constant and works its way downward. In both settings, inspection of the entire solution path is revealing. Just as with the lasso and generalized lasso, it is possible to plot the effective degrees of freedom along the solution path. For a strictly convex quadratic program, the exact penalty algorithm can be framed entirely in terms of the sweep operator of regression analysis. A few well-chosen examples illustrate the mechanics and potential of path following.

preprint, 2011, arXiv

Multicategory vertex discriminant analysis for high-dimensional data

In response to the challenges of data mining, discriminant analysis continues to evolve as a vital branch of statistics. Our recently introduced method of vertex discriminant analysis (VDA) is ideally suited to handle multiple categories and an excess of predictors over training cases. The current paper explores an elaboration of VDA that conducts classification and variable selection simultaneously. Adding lasso ($\ell_1$-norm) and Euclidean penalties to the VDA loss function eliminates unnecessary predictors. Lasso penalties apply to each predictor coefficient separately; Euclidean penalties group the collective coefficients of a single predictor. With these penalties in place, cyclic coordinate descent accelerates estimation of all coefficients. Our tests on simulated and benchmark real data demonstrate the virtues of penalized VDA in model building and prediction in high-dimensional settings.

preprint, 2011, arXiv

Reconstructing DNA copy number by penalized estimation and imputation

Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18--29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization--minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost.

preprint, 2011, arXiv

The MM Alternative to EM

The EM algorithm is a special case of a more general algorithm called the MM algorithm. Specific MM algorithms often have nothing to do with missing data. The first M step of an MM algorithm creates a surrogate function that is optimized in the second M step. In minimization, MM stands for majorize--minimize; in maximization, it stands for minorize--maximize. This two-step process always drives the objective function in the right direction. Construction of MM algorithms relies on recognizing and manipulating inequalities rather than calculating conditional expectations. This survey walks the reader through the construction of several specific MM algorithms. The potential of the MM algorithm in solving high-dimensional optimization and estimation problems is its most attractive feature. Our applications to random graph models, discriminant analysis and image restoration showcase this ability.
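
As a small illustration of the two M steps, consider computing a sample median by minimizing a sum of absolute deviations. This example is a standard textbook construction chosen here, not one of the survey's applications. To avoid division by zero, the sketch works with the smoothed loss $\sum_i \sqrt{(x-y_i)^2+\epsilon}$; the names `mm_median` and `eps` are hypothetical. Since the square root is concave, each term is majorized by a quadratic in $x$ (first M step), and the quadratic surrogate is minimized by a weighted mean (second M step).

```python
import math

def smoothed_abs_loss(x, data, eps):
    """Sum of smoothed absolute deviations sqrt((x - y)^2 + eps)."""
    return sum(math.sqrt((x - y) ** 2 + eps) for y in data)

def mm_median(data, eps=1e-12, iters=100):
    x = sum(data) / len(data)  # start from the sample mean
    history = [smoothed_abs_loss(x, data, eps)]
    for _ in range(iters):
        # majorize: each term is bounded above by a quadratic whose weight
        # is the reciprocal of the current smoothed residual
        weights = [1.0 / math.sqrt((x - y) ** 2 + eps) for y in data]
        # minimize: the quadratic surrogate is minimized by a weighted mean
        x = sum(w * y for w, y in zip(weights, data)) / sum(weights)
        history.append(smoothed_abs_loss(x, data, eps))
    return x, history

x, history = mm_median([1.0, 2.0, 3.0, 4.0, 100.0])
print(x)
```

The recorded `history` should be nonincreasing, which is the descent property that the MM construction guarantees; no conditional expectation or missing-data model is involved.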

preprint, 2010, arXiv

Graphics Processing Units and High-Dimensional Optimization

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and can dramatically accelerate many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.

preprint, 2010, arXiv

MM Algorithms for Geometric and Signomial Programming

This paper derives new algorithms for signomial programming, a generalization of geometric programming. The algorithms are based on a generic principle for optimization called the MM algorithm. In this setting, one can apply the geometric-arithmetic mean inequality and a supporting hyperplane inequality to create a surrogate function with parameters separated. Thus, unconstrained signomial programming reduces to a sequence of one-dimensional minimization problems. Simple examples demonstrate that the MM algorithm derived can converge to a boundary point or to one point of a continuum of minimum points. Conditions under which the minimum point is unique or occurs in the interior of parameter space are proved for geometric programming. Convergence to an interior point occurs at a linear rate. Finally, the MM framework easily accommodates equality and inequality constraints of signomial type. For the most important special case, constrained quadratic programming, the MM algorithm involves very simple updates.
