Source author record

Yangyang Xu

Yangyang Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC math.NA Machine Learning Numerical Analysis Computer Vision Distributed, Parallel, and Cluster Computing Data Structures and Algorithms Information Theory math.IT Artificial Intelligence Computation Computation and Language Computational Engineering, Finance, and Science eess.AS eess.IV math.CO math.FA Sound

Catalog footprint

What is connected

28works

18topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement

Training-free camera control for pretrained flow-matching video generators is a partial-observation inverse problem: a depth-warped guidance video supplies noisy evidence on a subset of latent sites, which the sampler must reconcile with the pretrained prior. Existing methods struggle to balance the trade-off between trajectory adherence and visual quality and the heuristic guidance-strength tuning lacks robustness. We propose \textbf{$h$-control}, which resolves this dilemma through a structural change to the sampler: each outer hard-replacement guidance step is augmented with an inner-loop \emph{block-conditional pseudo-Gibbs refinement} on the unobserved complement at the same noise level, with provable convergence to the partial-observation conditional data law. To accelerate convergence on high-dimensional video latents, we exploit their conditional locality, partitioning the unobserved complement into 3D patches, each tracked by a custom mixing indicator that adaptively freezes converged patches. On RealEstate10K and DAVIS, \textbf{$h$-control} attains the best FVD against all seven training-free and training-based competitors, outperforming every training-free baseline on every reported metric.

preprint2022arXiv

Existence of $L^q$-dimension and entropy dimension of self-conformal measures on Riemannian manifolds

Peres and Solomyak proved that on $\mathbb R^n$, the limits defining the $L^q$-dimension for any $q\in(0,\infty)\setminus\{1\}$, and the entropy dimension of a self-conformal measure exist, without assuming any separation condition. By introducing the notions of heavy maximal packings and partitions, we prove that on a doubling metric space the $L^q$-dimension, $q\in(0,\infty)\setminus\{1\}$, is equivalent to the generalized dimension. We also generalize the result on the existence of the $L^q$-dimension to self-conformal measures on complete Riemannian manifolds with the doubling property. In particular, these results hold for complete Riemannian manifolds with nonnegative Ricci curvature. Moreover, by assuming that the measure is doubling, we extend the result on the existence of the entropy dimension to self-conformal measures on complete Riemannian manifolds.

preprint2022arXiv

High-resolution Face Swapping via Latent Semantics Disentanglement

We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model. Although previous research can leverage generative priors to produce high-resolution results, their quality can suffer from the entangled semantics of the latent space. We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator, deriving structure attributes from the shallow layers and appearance attributes from the deeper ones. Identity and pose information within the structure attributes are further separated by introducing a landmark-driven structure transfer latent direction. The disentangled latent code produces rich generative features that incorporate feature blending to produce a plausible swapping result. We further extend our method to video face swapping by enforcing two spatio-temporal constraints on the latent space and the image space. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art image/video face swapping methods in terms of hallucination quality and consistency. Code can be found at: https://github.com/cnnlstm/FSLSD_HiRes.

preprint2022arXiv

Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. In this paper, 1) several new audio features, facial expression features and paragraph-level text embeddings are extracted for accuracy improvement. 2) we substantially improve the accuracy and reliability of multimodal sentiment prediction by mining and blending the multimodal features. 3) effective data augmentation strategies are applied in model training to alleviate the problem of sample imbalance and prevent the model from learning biased subject characters. For the MuSe-Humor sub-challenge, our model obtains the AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson's Correlations Coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151.

preprint2022arXiv

Inexact accelerated proximal gradient method with line search and reduced complexity for affine-constrained and bilinear saddle-point structured convex problems

The goal of this paper is to reduce the total complexity of gradient-based methods for two classes of problems: affine-constrained composite convex optimization and bilinear saddle-point structured non-smooth convex optimization. Our technique is based on a double-loop inexact accelerated proximal gradient (APG) method for minimizing the summation of a non-smooth but proximable convex function and two smooth convex functions with different smoothness constants and computational costs. Compared to the standard APG method, the inexact APG method can reduce the total computation cost if one smooth component has higher computational cost but a smaller smoothness constant than the other. With this property, the inexact APG method can be applied to approximately solve the subproblems of a proximal augmented Lagrangian method for affine-constrained composite convex optimization and the smooth approximation for bilinear saddle-point structured non-smooth convex optimization, where the smooth function with a smaller smoothness constant has significantly higher computational cost. Thus it can reduce total complexity for finding an approximately optimal/stationary solution. This technique is similar to the gradient sliding technique in the literature. The difference is that our inexact APG method can efficiently stop the inner loop by using a computable condition based on a measure of stationarity violation, while the gradient sliding methods need to pre-specify the number of iterations for the inner loop. Numerical experiments demonstrate significantly higher efficiency of our methods over an optimal primal-dual first-order method and the gradient sliding methods.

preprint2022arXiv

Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Different from existing optimal methods, PStorm can achieve the ${O}(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.

preprint2022arXiv

Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On the other hand, asynchronous (async) parallel computing has exhibited significantly higher speed-up over its synchronous (sync) counterpart. Async-parallel non-adaptive SGMs have been well studied in the literature from the perspectives of both theory and practical performance. Adaptive SGMs can also be implemented without much difficulty in an async-parallel way. However, to the best of our knowledge, no theoretical result of async-parallel adaptive SGMs has been established. The difficulty for analyzing adaptive SGMs with async updates originates from the second moment term. In this paper, we propose an async-parallel adaptive SGM based on AMSGrad. We show that the proposed method inherits the convergence guarantee of AMSGrad for both convex and non-convex problems, if the staleness (also called delay) caused by asynchrony is bounded. Our convergence rate results indicate a nearly linear parallelization speed-up if $τ=o(K^{\frac{1}{4}})$, where $τ$ is the staleness and $K$ is the number of iterations. The proposed method is tested on both convex and non-convex machine learning problems, and the numerical results demonstrate its clear advantages over the sync counterpart and the async-parallel nonadaptive SGM.

preprint2021arXiv

Adaptive Primal-Dual Stochastic Gradient Method for Expectation-constrained Convex Stochastic Programs

Stochastic gradient methods (SGMs) have been widely used for solving stochastic optimization problems. A majority of existing works assume no constraints or easy-to-project constraints. In this paper, we consider convex stochastic optimization problems with expectation constraints. For these problems, it is often extremely expensive to perform projection onto the feasible set. Several SGMs in the literature can be applied to solve the expectation-constrained stochastic problems. We propose a novel primal-dual type SGM based on the Lagrangian function. Different from existing methods, our method incorporates an adaptiveness technique to speed up convergence. At each iteration, our method inquires an unbiased stochastic subgradient of the Lagrangian function, and then it renews the primal variables by an adaptive-SGM update and the dual variables by a vanilla-SGM update. We show that the proposed method has a convergence rate of $O(1/\sqrt{k})$ in terms of the objective error and the constraint violation. Although the convergence rate is the same as those of existing SGMs, we observe its significantly faster convergence than an existing non-adaptive primal-dual SGM and a primal SGM on solving the Neyman-Pearson classification and quadratically constrained quadratic programs. Furthermore, we modify the proposed method to solve convex-concave stochastic minimax problems, for which we perform adaptive-SGM updates to both primal and dual variables. A convergence rate of $O(1/\sqrt{k})$ is also established to the modified method for solving minimax problems in terms of primal-dual gap.

preprint2021arXiv

Augmented Lagrangian based first-order methods for convex-constrained programs with weakly-convex objective

First-order methods (FOMs) have been widely used for solving large-scale problems. A majority of existing works focus on problems without constraint or with simple constraints. Several recent works have studied FOMs for problems with complicated functional constraints. In this paper, we design a novel augmented Lagrangian (AL) based FOM for solving problems with non-convex objective and convex constraint functions. The new method follows the framework of the proximal point (PP) method. On approximately solving PP subproblems, it mixes the usage of the inexact AL method (iALM) and the quadratic penalty method, while the latter is always fed with estimated multipliers by the iALM. We show a complexity result of $O(\varepsilon^{-\frac{5}{2}}|\log\varepsilon|)$ for the proposed method to achieve an $\varepsilon$-KKT point. This is the best known result. Theoretically, the hybrid method has lower iteration-complexity requirement than its counterpart that only uses iALM to solve PP subproblems, and numerically, it can perform significantly better than a pure-penalty-based method. Numerical experiments are conducted on nonconvex linearly constrained quadratic programs and nonconvex QCQP. The numerical results demonstrate the efficiency of the proposed methods over existing ones.

preprint2021arXiv

Deep Texture-Aware Features for Camouflaged Object Detection

Camouflaged object detection is a challenging task that aims to identify objects having similar texture to the surroundings. This paper presents to amplify the subtle texture difference between camouflaged objects and the background for camouflaged object detection by formulating multiple texture-aware refinement modules to learn the texture-aware features in a deep convolutional neural network. The texture-aware refinement module computes the covariance matrices of feature responses to extract the texture information, designs an affinity loss to learn a set of parameter maps that help to separate the texture between camouflaged objects and the background, and adopts a boundary-consistency loss to explore the object detail structures.We evaluate our network on the benchmark dataset for camouflaged object detection both qualitatively and quantitatively. Experimental results show that our approach outperforms various state-of-the-art methods by a large margin.

preprint2017arXiv

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit formula of stepsize that guarantees convergence is given depending on delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validness of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.

preprint2016arXiv

Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming

Motivated by big data applications, first-order methods have been extremely popular in recent years. However, naive gradient methods generally converge slowly. Hence, much efforts have been made to accelerate various first-order methods. This paper proposes two accelerated methods towards solving structured linearly constrained convex programming, for which we assume composite convex objective. The first method is the accelerated linearized augmented Lagrangian method (LALM). At each update to the primal variable, it allows linearization to the differentiable function and also the augmented term, and thus it enables easy subproblems. Assuming merely weak convexity, we show that LALM owns $O(1/t)$ convergence if parameters are kept fixed during all the iterations and can be accelerated to $O(1/t^2)$ if the parameters are adapted, where $t$ is the number of total iterations. The second method is the accelerated linearized alternating direction method of multipliers (LADMM). In addition to the composite convexity, it further assumes two-block structure on the objective. Different from classic ADMM, our method allows linearization to the objective and also augmented term to make the update simple. Assuming strong convexity on one block variable, we show that LADMM also enjoys $O(1/t^2)$ convergence with adaptive parameters. This result is a significant improvement over that in [Goldstein et. al, SIIMS'14], which requires strong convexity on both block variables and no linearization to the objective or augmented term. Numerical experiments are performed on quadratic programming, image denoising, and support vector machine. The proposed accelerated methods are compared to nonaccelerated ones and also existing accelerated methods. The results demonstrate the validness of acceleration and superior performance of the proposed methods over existing ones.

preprint2016arXiv

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

Finding a fixed point to a nonexpansive operator, i.e., $x^*=Tx^*$, abstracts many problems in numerical linear algebra, optimization, and other areas of scientific computing. To solve fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines, processors, or cores) update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to parallel computing since it reduces synchronization wait, relaxes communication bottleneck, and thus speeds up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate $x_i$ based on possibly out-of-date information on $x$. The agents share $x$ through either global memory or communication. If writing $x_i$ is atomic, the agents can read and write $x$ without memory locks. Theoretically, we show that if the nonexpansive operator $T$ has a fixed point, then with probability one, ARock generates a sequence that converges to a fixed points of $T$. Our conditions on $T$ and step sizes are weaker than comparable work. Linear convergence is also obtained. We propose special cases of ARock for linear systems, convex optimization, machine learning, as well as distributed and decentralized consensus problems. Numerical experiments of solving sparse logistic regression problems are presented.

preprint2016arXiv

Coordinate Friendly Structures, Algorithms and Applications

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.

preprint2016arXiv

Fast algorithms for Higher-order Singular Value Decomposition from incomplete data

Higher-order singular value decomposition (HOSVD) is an efficient way for data reduction and also eliciting intrinsic structure of multi-dimensional array data. It has been used in many applications, and some of them involve incomplete data. To obtain HOSVD of the data with missing values, one can first impute the missing entries through a certain tensor completion method and then perform HOSVD to the reconstructed data. However, the two-step procedure can be inefficient and does not make reliable decomposition. In this paper, we formulate an incomplete HOSVD problem and combine the two steps into solving a single optimization problem, which simultaneously achieves imputation of missing values and also tensor decomposition. We also present two algorithms for solving the problem based on block coordinate update. Global convergence of both algorithms is shown under mild assumptions. The convergence of the second algorithm implies that of the popular higher-order orthogonality iteration (HOOI) method, and thus we, for the first time, give global convergence of HOOI. In addition, we compare the proposed methods to state-of-the-art ones for solving incomplete HOSVD and also low-rank tensor completion problems and demonstrate the superior performance of our methods over other compared ones. Furthermore, we apply them to face recognition and MRI image reconstruction to show their practical performance.

preprint2015arXiv

A globally convergent algorithm for nonconvex optimization based on block coordinate update

Nonconvex optimization problems arise in many areas of computational science and engineering and are (approximately) solved by a variety of algorithms. Existing algorithms usually only have local convergence or subsequence convergence of their iterates. We propose an algorithm for a generic nonconvex optimization formulation, establish the convergence of its whole iterate sequence to a critical point along with a rate of convergence, and numerically demonstrate its efficiency. Specially, we consider the problem of minimizing a nonconvex objective function. Its variables can be treated as one block or be partitioned into multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function or each constraint applies to one block of variables. The differentiable components of the objective function, however, can apply to one or multiple blocks of variables together. Our algorithm updates one block of variables at time by minimizing a certain prox-linear surrogate. The order of update can be either deterministic or randomly shuffled in each round. We obtain the convergence of the whole iterate sequence under fairly loose conditions including, in particular, the Kurdyka-Łojasiewicz (KL) condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. We apply our convergence result to the coordinate descent method for non-convex regularized linear regression and also a modified rank-one residue iteration method for nonnegative matrix factorization. We show that both the methods have global convergence. Numerically, we test our algorithm on nonnegative matrix and tensor factorization problems, with random shuffling enable to avoid local solutions.

preprint2015arXiv

ADMM for the SDP relaxation of the QAP

The semidefinite programming (SDP) relaxation has proven to be extremely strong for many hard discrete optimization problems. This is in particular true for the quadratic assignment problem (QAP), arguably one of the hardest NP-hard discrete optimization problems. There are several difficulties that arise in efficiently solving the SDP relaxation, e.g.,~increased dimension; inefficiency of the current primal-dual interior point solvers in terms of both time and accuracy; and difficulty and high expense in adding cutting plane constraints. We propose using the alternating direction method of multipliers (ADMM) to solve the SDP relaxation. This first order approach allows for inexpensive iterations, a method of cheaply obtaining low rank solutions, as well a trivial way of adding cutting plane inequalities. When compared to current approaches and current best available bounds we obtain remarkable robustness, efficiency and improved bounds.

preprint2015arXiv

Alternating direction method of multipliers for regularized multiclass support vector machines

The support vector machine (SVM) was originally designed for binary classifications. A lot of effort has been put to generalize the binary SVM to multiclass SVM (MSVM) which are more complex problems. Initially, MSVMs were solved by considering their dual formulations which are quadratic programs and can be solved by standard second-order methods. However, the duals of MSVMs with regularizers are usually more difficult to formulate and computationally very expensive to solve. This paper focuses on several regularized MSVMs and extends the alternating direction method of multiplier (ADMM) to these MSVMs. Using a splitting technique, all considered MSVMs are written as two-block convex programs, for which the ADMM has global convergence guarantees. Numerical experiments on synthetic and real data demonstrate the high efficiency and accuracy of our algorithms.

preprint2015arXiv

Block stochastic gradient iteration for convex and nonconvex optimization

The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are easier to update individually than together, BCD has a lower per-iteration cost. This paper introduces a method that combines the features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifically, a block stochastic gradient (BSG) method is proposed for solving both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions from the sum term in the objective, and then, using those samples, it updates all the blocks of variables in either a deterministic or a randomly shuffled order. Its convergence for both convex and nonconvex cases are established in different senses. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex.

preprint2015arXiv

Global and Local Structure Preserving Sparse Subspace Learning: An Iterative Approach to Unsupervised Feature Selection

As we aim at alleviating the curse of high-dimensionality, subspace learning is becoming more popular. Existing approaches use either information about global or local structure of the data, and few studies simultaneously focus on global and local structures as the both of them contain important information. In this paper, we propose a global and local structure preserving sparse subspace learning (GLoSS) model for unsupervised feature selection. The model can simultaneously realize feature selection and subspace learning. In addition, we develop a greedy algorithm to establish a generic combinatorial model, and an iterative strategy based on an accelerated block coordinate descent is used to solve the GLoSS problem. We also provide whole iterate sequence convergence analysis of the proposed iterative algorithm. Extensive experiments are conducted on real-world datasets to show the superiority of the proposed approach over several state-of-the-art unsupervised feature selection approaches.

preprint2015arXiv

Parallel matrix factorization for low-rank tensor completion

Higher-order low-rank tensors naturally arise in many applications including hyperspectral data recovery, video inpainting, seismic data recon- struction, and so on. We propose a new model to recover a low-rank tensor by simultaneously performing low-rank matrix factorizations to the all-mode ma- tricizations of the underlying tensor. An alternating minimization algorithm is applied to solve the model, along with two adaptive rank-adjusting strategies when the exact rank is not known. Phase transition plots reveal that our algorithm can recover a variety of synthetic low-rank tensors from significantly fewer samples than the compared methods, which include a matrix completion method applied to tensor recovery and two state-of-the-art tensor completion methods. Further tests on real- world data show similar advantages. Although our model is non-convex, our algorithm performs consistently throughout the tests and give better results than the compared methods, some of which are based on convex models. In addition, the global convergence of our algorithm can be established in the sense that the gradient of Lagrangian function converges to zero.

preprint2015arXiv

Proximal gradient method for huberized support vector machine

The Support Vector Machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve in particular for regularized SVMs, such as with $\ell_1$-regularization. This paper considers the Huberized SVM (HSVM), which uses a differentiable approximation of the hinge loss function. We first explore the use of the Proximal Gradient (PG) method to solving binary-class HSVM (B-HSVM) and then generalize it to multi-class HSVM (M-HSVM). Under strong convexity assumptions, we show that our algorithm converges linearly. In addition, we give a finite convergence result about the support of the solution, based on which we further accelerate the algorithm by a two-stage method. We present extensive numerical experiments on both synthetic and real datasets which demonstrate the superiority of our methods over some state-of-the-art methods for both binary- and multi-class SVMs.

preprint2014arXiv

A fast patch-dictionary method for whole image recovery

Various algorithms have been proposed for dictionary learning. Among those for image processing, many use image patches to form dictionaries. This paper focuses on whole-image recovery from corrupted linear measurements. We address the open issue of representing an image by overlapping patches: the overlapping leads to an excessive number of dictionary coefficients to determine. With very few exceptions, this issue has limited the applications of image-patch methods to the local kind of tasks such as denoising, inpainting, cartoon-texture decomposition, super-resolution, and image deblurring, for which one can process a few patches at a time. Our focus is global imaging tasks such as compressive sensing and medical image recovery, where the whole image is encoded together, making it either impossible or very ineffective to update a few patches at a time. Our strategy is to divide the sparse recovery into multiple subproblems, each of which handles a subset of non-overlapping patches, and then the results of the subproblems are averaged to yield the final recovery. This simple strategy is surprisingly effective in terms of both quality and speed. In addition, we accelerate computation of the learned dictionary by applying a recent block proximal-gradient method, which not only has a lower per-iteration complexity but also takes fewer iterations to converge, compared to the current state-of-the-art. We also establish that our algorithm globally converges to a stationary point. Numerical results on synthetic data demonstrate that our algorithm can recover a more faithful dictionary than two state-of-the-art methods. Combining our whole-image recovery and dictionary-learning methods, we numerically simulate image inpainting, compressive sensing recovery, and deblurring. Our recovery is more faithful than those of a total variation method and a method based on overlapping patches.

preprint2014arXiv

Alternating proximal gradient method for sparse nonnegative Tucker decomposition

Multi-way data arises in many applications such as electroencephalography (EEG) classification, face recognition, text mining and hyperspectral data analysis. Tensor decomposition has been commonly used to find the hidden factors and elicit the intrinsic structures of the multi-way data. This paper considers sparse nonnegative Tucker decomposition (NTD), which is to decompose a given tensor into the product of a core tensor and several factor matrices with sparsity and nonnegativity constraints. An alternating proximal gradient method (APG) is applied to solve the problem. The algorithm is then modified to sparse NTD with missing values. Per-iteration cost of the algorithm is estimated scalable about the data size, and global convergence is established under fairly loose conditions. Numerical experiments on both synthetic and real world data demonstrate its superiority over a few state-of-the-art methods for (sparse) NTD from partial and/or full observations. The MATLAB code along with demos are accessible from the author's homepage.

preprint2014arXiv

Sparse Bilinear Logistic Regression

In this paper, we introduce the concept of sparse bilinear logistic regression for decision problems involving explanatory variables that are two-dimensional matrices. Such problems are common in computer vision, brain-computer interfaces, style/content factorization, and parallel factor analysis. The underlying optimization problem is bi-convex; we study its solution and develop an efficient algorithm based on block coordinate descent. We provide a theoretical guarantee for global convergence and estimate the asymptotical convergence rate using the Kurdyka-Łojasiewicz inequality. A range of experiments with simulated and real data demonstrate that sparse bilinear logistic regression outperforms current techniques in several important applications.

preprint2013arXiv

Alternating proximal gradient method for nonnegative matrix factorization

Nonnegative matrix factorization has been widely applied in face recognition, text mining, as well as spectral analysis. This paper proposes an alternating proximal gradient method for solving this problem. With a uniformly positive lower bound assumption on the iterates, any limit point can be proved to satisfy the first-order optimality conditions. A Nesterov-type extrapolation technique is then applied to accelerate the algorithm. Though this technique is at first used for convex program, it turns out to work very well for the non-convex nonnegative matrix factorization problem. Extensive numerical experiments illustrate the efficiency of the alternating proximal gradient method and the accleration technique. Especially for real data tests, the accelerated method reveals high superiority to state-of-the-art algorithms in speed with comparable solution qualities.

preprint2011arXiv

A variant of multitask n-vehicle exploration problem: maximizing every processor's average profit

We discuss a variant of multitask n-vehicle exploration problem. Instead of requiring an optimal permutation of vehicles in every group, the new problem asks all vehicles in a group to arrive at a same destination. It can also be viewed as to maximize every processor's average profit, given n tasks, and each task's consume-time and profit. Meanwhile, we propose a new kind of partition problem in fractional form, and analyze its computational complexity. Moreover, by regarding fractional partition as a special case, we prove that the maximizing average profit problem is NP-hard when the number of processors is fixed and it is strongly NP-hard in general. At last, a pseudo-polynomial time algorithm for the maximizing average profit problem and the fractional partition problem is presented, thanks to the idea of the pseudo-polynomial time algorithm for the classical partition problem.

preprint2011arXiv

An Alternating Direction Algorithm for Matrix Completion with Nonnegative Factors

This paper introduces an algorithm for the nonnegative matrix factorization-and-completion problem, which aims to find nonnegative low-rank matrices X and Y so that the product XY approximates a nonnegative data matrix M whose elements are partially known (to a certain accuracy). This problem aggregates two existing problems: (i) nonnegative matrix factorization where all entries of M are given, and (ii) low-rank matrix completion where nonnegativity is not required. By taking the advantages of both nonnegativity and low-rankness, one can generally obtain superior results than those of just using one of the two properties. We propose to solve the non-convex constrained least-squares problem using an algorithm based on the classic alternating direction augmented Lagrangian method. Preliminary convergence properties of the algorithm and numerical simulation results are presented. Compared to a recent algorithm for nonnegative matrix factorization, the proposed algorithm produces factorizations of similar quality using only about half of the matrix entries. On tasks of recovering incomplete grayscale and hyperspectral images, the proposed algorithm yields overall better qualities than those produced by two recent matrix-completion algorithms that do not exploit nonnegativity.

Yangyang Xu

What is connected

Connect this record

See the researcher in context

Building this map preview

28 published item(s)

$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement

Existence of $L^q$-dimension and entropy dimension of self-conformal measures on Riemannian manifolds

High-resolution Face Swapping via Latent Semantics Disentanglement

Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

Inexact accelerated proximal gradient method with line search and reduced complexity for affine-constrained and bilinear saddle-point structured convex problems

Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

Parallel and distributed asynchronous adaptive stochastic gradient methods

Adaptive Primal-Dual Stochastic Gradient Method for Expectation-constrained Convex Stochastic Programs

Augmented Lagrangian based first-order methods for convex-constrained programs with weakly-convex objective

Deep Texture-Aware Features for Camouflaged Object Detection

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

Coordinate Friendly Structures, Algorithms and Applications

Fast algorithms for Higher-order Singular Value Decomposition from incomplete data

A globally convergent algorithm for nonconvex optimization based on block coordinate update

ADMM for the SDP relaxation of the QAP

Alternating direction method of multipliers for regularized multiclass support vector machines

Block stochastic gradient iteration for convex and nonconvex optimization

Global and Local Structure Preserving Sparse Subspace Learning: An Iterative Approach to Unsupervised Feature Selection

Parallel matrix factorization for low-rank tensor completion

Proximal gradient method for huberized support vector machine

A fast patch-dictionary method for whole image recovery

Alternating proximal gradient method for sparse nonnegative Tucker decomposition

Sparse Bilinear Logistic Regression

Alternating proximal gradient method for nonnegative matrix factorization

A variant of multitask n-vehicle exploration problem: maximizing every processor's average profit

An Alternating Direction Algorithm for Matrix Completion with Nonnegative Factors