Source author record

Zongben Xu

Zongben Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computer Vision, math.OC, Information Theory, math.IT, Neural and Evolutionary Computing, Numerical Analysis, Artificial Intelligence, Data Structures and Algorithms, Databases, eess.IV, eess.SP, math.NA

Catalog footprint

What is connected

42 works

13 topics

4 close collaborators

preprint 2026 arXiv

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

Image restoration is an inherently ill-posed inverse problem. Equivariant networks that embed geometric symmetry priors can mitigate this ill-posedness and improve performance. However, current understanding of the relationship between network equivariance and data symmetry remains largely heuristic. Particularly for real-world data with imperfect symmetry, existing research lacks a systematic theoretical framework to quantify symmetry, select transformation groups, or evaluate model-data alignment. To bridge this gap, we conduct an analysis from an optimization perspective and formalize the intrinsic relationship among data symmetry priors, model equivariance, and generalization capability. Specifically, we propose for the first time a quantifiable definition of non-strict symmetry at the dataset level (rather than the sample level) and use it as a constraint to formulate the restoration inverse problem. We then show that equivariance for restoration models can be naturally derived from this inverse problem once it incorporates the proposed symmetry constraints, and that the equivariance error of the optimal restoration operator is strictly bounded by the data symmetry error and the discretization mesh size. Furthermore, by analyzing the network's empirical risk, we demonstrate that aligning equivariance with data symmetry optimizes the bias-variance trade-off, minimizing the total expected risk. Guided by these insights, we propose a Sample Adaptive Equivariant Network that uses a hypernetwork and transformation-learnable equivariant convolutions to dynamically align with each sample's inherent symmetry. Extensive experiments on super-resolution, denoising, and deraining validate our theoretical findings and show significant superiority over standard baselines and traditional equivariant models. Our code and supplementary material are available at https://github.com/tanfy929/SA-Conv.
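The equivariance error mentioned in this abstract can be illustrated with a small sketch. It is not the paper's definition or code: it assumes the discrete group of 90-degree rotations, periodic image boundaries, and two hypothetical operators (`box_blur`, `horizontal_diff`) chosen only to contrast an equivariant operator with a non-equivariant one.

```python
import numpy as np

def box_blur(img):
    """Average each pixel with its 4 neighbours (periodic boundary).
    The stencil is symmetric under 90-degree rotation."""
    return (img
            + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)) / 5.0

def horizontal_diff(img):
    """Forward difference along columns; not symmetric under rotation."""
    return np.roll(img, -1, axis=1) - img

def equivariance_error(op, img):
    """Mean over the four 90-degree rotations g of
    ||op(g x) - g op(x)|| / ||op(x)|| (assumed, illustrative metric)."""
    base = np.linalg.norm(op(img))
    errs = []
    for k in range(4):
        lhs = op(np.rot90(img, k))   # transform first, then apply the operator
        rhs = np.rot90(op(img), k)   # apply the operator first, then transform
        errs.append(np.linalg.norm(lhs - rhs) / base)
    return float(np.mean(errs))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # square image so rot90 keeps the shape
err_blur = equivariance_error(box_blur, x)
err_diff = equivariance_error(horizontal_diff, x)
```

Under these assumptions `err_blur` is zero up to floating-point rounding, while `err_diff` is clearly positive, which is the kind of gap an equivariance-error measure is meant to expose.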

preprint 2022 arXiv

LDP-IDS: Local Differential Privacy for Infinite Data Streams

Streaming data collection is essential to real-time data analytics in various IoT and mobile device-based systems, which, however, may expose end users' privacy. Local differential privacy (LDP) is a promising solution to privacy-preserving data collection and analysis. However, the few existing LDP studies over streams either apply to finite streams only or suffer from insufficient protection. This paper investigates this problem by proposing LDP-IDS, a novel $w$-event LDP paradigm to provide a practical privacy guarantee for infinite streams at the user end, and adapting the popular budget division framework in centralized differential privacy (CDP). By constructing a unified error analysis for LDP, we first develop two adaptive budget division-based LDP methods for LDP-IDS that can enhance data utility by leveraging the non-deterministic sparsity in streams. Beyond that, we further propose a novel population division framework that can not only avoid the high sensitivity of LDP noise to budget division but also require significantly less communication. Based on the framework, we also present two adaptive population division methods for LDP-IDS with theoretical analysis. We conduct extensive experiments on synthetic and real-world datasets to evaluate the effectiveness and efficiency of our proposed frameworks and methods. Experimental results demonstrate that, despite the effectiveness of the adaptive budget division methods, the proposed population division framework and methods can further achieve much higher effectiveness and efficiency.
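As a rough illustration of budget division under $w$-event LDP, the sketch below spends a uniform share `total_epsilon / w` at every timestamp, so any window of `w` consecutive timestamps consumes at most `total_epsilon` by sequential composition. It is a baseline-style sketch only, not the adaptive methods of the paper: binary randomized response is assumed as the LDP mechanism, and all function names are hypothetical.

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1.
    Uses E[observed] = f * (2p - 1) + (1 - p), solved for f."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

def collect_stream(stream, total_epsilon, w, seed=0):
    """stream[t][u] is user u's bit at timestamp t.
    Each timestamp uses the uniform share total_epsilon / w."""
    rng = random.Random(seed)
    eps_t = total_epsilon / w
    estimates = []
    for bits in stream:
        reports = [randomized_response(b, eps_t, rng) for b in bits]
        estimates.append(estimate_frequency(reports, eps_t))
    return estimates
```

Because each report only gets `total_epsilon / w`, the per-timestamp noise grows quickly with `w`; this is the sensitivity to budget division that the abstract cites as motivation for dividing the population instead.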

preprint 2022 arXiv

Robust spectral compressive sensing via vanilla gradient descent

This paper investigates the recovery of a spectrally sparse signal from its partially revealed noisy entries within the framework of spectral compressive sensing. Nonconvex optimization approaches have recently been proposed based on low-rank Hankel matrix completion and projected gradient descent (PGD). The PGD however involves unknown tuning parameters and its theoretical analysis is available only in the absence of noise. In this paper, we propose a hyperparameter-free, vanilla gradient descent (VGD) algorithm and prove that the VGD enables robust recovery of an $N$-dimensional $K$-spectrally-sparse signal from order $K^2 \log^2 N$ noisy samples under coherence and other mild conditions. The above sample complexity increases by a factor of $\log N$ as compared with PGD without noise. Numerical simulations are provided that corroborate our analysis and show advantageous performance of VGD.

preprint 2021 arXiv

Graph Neural Network Encoding for Community Detection in Attribute Networks

In this paper, we first propose a graph neural network encoding method for multiobjective evolutionary algorithm to handle the community detection problem in complex attribute networks. In the graph neural network encoding method, each edge in an attribute network is associated with a continuous variable. Through non-linear transformation, a continuous valued vector (i.e. a concatenation of the continuous variables associated with the edges) is transferred to a discrete valued community grouping solution. Further, two objective functions for single- and multi-attribute network are proposed to evaluate the attribute homogeneity of the nodes in communities, respectively. Based on the new encoding method and the two objectives, a multiobjective evolutionary algorithm (MOEA) based upon NSGA-II, termed as continuous encoding MOEA, is developed for the transformed community detection problem with continuous decision variables. Experimental results on single- and multi-attribute networks with different types show that the developed algorithm performs significantly better than some well-known evolutionary and non-evolutionary based algorithms. The fitness landscape analysis verifies that the transformed community detection problems have smoother landscapes than those of the original problems, which justifies the effectiveness of the proposed graph neural network encoding method.

preprint 2021 arXiv

Learning adaptive differential evolution algorithm from optimization experiences by policy gradient

Differential evolution is one of the most prestigious population-based stochastic optimization algorithms for black-box problems. The performance of a differential evolution algorithm depends highly on its mutation and crossover strategy and associated control parameters. However, the determination process for the most suitable parameter setting is troublesome and time-consuming. Adaptive control parameter methods that can adapt to the problem landscape and optimization environment are preferable to fixed parameter settings. This paper proposes a novel adaptive parameter control approach based on learning from the optimization experiences over a set of problems. In the approach, the parameter control is modeled as a finite-horizon Markov decision process. A reinforcement learning algorithm, named policy gradient, is applied to learn an agent (i.e. parameter controller) that can provide the control parameters of a proposed differential evolution adaptively during the search procedure. The differential evolution algorithm based on the learned agent is compared against nine well-known evolutionary algorithms on the CEC'13 and CEC'17 test suites. Experimental results show that the proposed algorithm performs competitively against these compared algorithms on the test suites.

preprint 2020 arXiv

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

The recent advancement of deep learning techniques has made great progress on hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks remains challenging for this task. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution multispectral image (MSI). Inspired by coupled spectral unmixing, a two-stream convolutional autoencoder framework is taken as backbone to jointly decompose MS and HS data into a spectrally meaningful basis and corresponding coefficients. CUCaNet is capable of adaptively learning spectral and spatial response functions from HS-MS correspondences by enforcing reasonable consistency assumptions on the networks. Moreover, a cross-attention module is devised to yield more effective spatial-spectral information transfer in networks. Extensive experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models, demonstrating the superiority of the CUCaNet in the HSI-SR application. Furthermore, the codes and datasets will be available at: https://github.com/danfenghong/ECCV2020_CUCaNet.

preprint 2020 arXiv

Learning Adaptive Loss for Robust Learning with Noisy Labels

Robust loss minimization is an important strategy for handling the robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to apply generally in practice. Besides, the non-convexity brought by the loss as well as the complicated network architecture makes training easily trapped in an unexpected solution with poor generalization capability. To address the above issues, we propose a meta-learning method capable of adaptively learning the hyperparameters in robust loss functions. Specifically, through mutual amelioration between robust loss hyperparameters and network parameters in our method, both of them can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of SOTA robust loss functions are integrated into our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its accuracy and generalization performance, as compared with the conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.

preprint 2020 arXiv

Learning to be Global Optimizer

The advancement of artificial intelligence has cast a new light on the development of optimization algorithms. This paper proposes to learn a two-phase (including a minimization phase and an escaping phase) global optimization algorithm for smooth non-convex functions. For the minimization phase, a model-driven deep learning method is developed to learn the update rule of descent direction, which is formalized as a nonlinear combination of historical information, for convex functions. We prove that the resultant algorithm with the proposed adaptive direction guarantees convergence for convex functions. Empirical study shows that the learned algorithm significantly outperforms some well-known classical optimization algorithms, such as gradient descent, conjugate descent and BFGS, and performs well on ill-posed functions. The escaping phase from local optima is modeled as a Markov decision process with a fixed escaping policy. We further propose to learn an optimal escaping policy by reinforcement learning. The effectiveness of the escaping policies is verified by optimizing synthesized functions and training a deep neural network for CIFAR image classification. The learned two-phase global optimization algorithm demonstrates a promising global search capability on some benchmark functions and machine learning tasks.

preprint 2020 arXiv

Learning to Search for MIMO Detection

This paper proposes a novel learning-to-learn method, called learning to learn iterative search algorithm (LISA), for signal detection in a multi-input multi-output (MIMO) system. The idea is to regard the signal detection problem as a decision-making problem over a tree. The goal is to learn the optimal decision policy. In LISA, deep neural networks are used as the parameterized policy function. Through training, optimal parameters of the neural networks are learned and thus the optimal policy can be approximated. Different neural network based architectures are used for fixed and varying channel models, respectively. LISA provides soft decisions and does not require any information about the additive white Gaussian noise. Simulation results show that LISA 1) obtains near maximum likelihood detection performance in both fixed and varying channel models under QPSK modulation; 2) achieves significantly better bit error rate (BER) performance than classical detectors and recently proposed deep/machine learning based detectors at various modulations and signal-to-noise ratios (SNRs) under both i.i.d. and correlated Rayleigh fading channels in the simulation experiments; 3) is robust to MIMO detection problems with imperfect channel state information; and 4) generalizes very well against channel correlation and SNRs.

preprint 2020 arXiv

Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

To discover intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods attempt to obtain such transition knowledge either by pre-assuming strongly confident anchor points with 1-probability belonging to a specific class, which is generally infeasible in practice, or by directly and jointly estimating the transition matrix and learning the classifier from the noisy samples, which always leads to inaccurate estimation misguided by wrong annotation information, especially in large noise cases. To alleviate these issues, this study proposes a new meta-transition-learning strategy for the task. Specifically, through the sound guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated to avoid being trapped by noisy training samples, without the need for any anchor point assumptions. Besides, we prove that our method has a statistical consistency guarantee on correctly estimating the desired transition matrix. Extensive synthetic and real experiments validate that our method can more accurately extract the transition matrix, which naturally leads to more robust performance than prior art. Its essential relationship with label distribution learning is also discussed, which explains its fine performance even under no-noise scenarios.

preprint 2020 arXiv

On Hyper-parameter Tuning for Stochastic Optimization Algorithms

This paper proposes the first-ever algorithmic framework for tuning hyper-parameters of stochastic optimization algorithms based on reinforcement learning. Hyper-parameters have a significant influence on the performance of stochastic optimization algorithms, such as evolutionary algorithms (EAs) and meta-heuristics. Yet, it is very time-consuming to determine optimal hyper-parameters due to the stochastic nature of these algorithms. We propose to model the tuning procedure as a Markov decision process, and resort to the policy gradient algorithm to tune the hyper-parameters. Experiments on tuning stochastic algorithms with different kinds of hyper-parameters (continuous and discrete) for different optimization problems (continuous and discrete) show that the proposed hyper-parameter tuning algorithms require much less running time of the stochastic algorithms than the Bayesian optimization method. The proposed framework can be used as a standard tool for hyper-parameter tuning in stochastic algorithms.

preprint 2020 arXiv

Polarimetric SAR Image Semantic Segmentation with 3D Discrete Wavelet Transform and Markov Random Field

Polarimetric synthetic aperture radar (PolSAR) image segmentation is currently of great importance in image processing for remote sensing applications. However, it is a challenging task for two main reasons. Firstly, the label information is difficult to acquire due to high annotation costs. Secondly, the speckle effect embedded in the PolSAR imaging process remarkably degrades the segmentation performance. To address these two issues, we present a contextual PolSAR image semantic segmentation method in this paper. With a newly defined channelwise consistent feature set as input, the three-dimensional discrete wavelet transform (3D-DWT) technique is employed to extract discriminative multi-scale features that are robust to speckle noise. Then a Markov random field (MRF) is further applied to enforce label smoothness spatially during segmentation. By simultaneously utilizing 3D-DWT features and MRF priors for the first time, contextual information is fully integrated during the segmentation to ensure accurate and smooth segmentation. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on three real benchmark PolSAR image data sets. Experimental results indicate that the proposed method achieves promising segmentation accuracy and preferable spatial consistency using a minimal number of labeled pixels.

preprint 2020 arXiv

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Current machine learning has made great progress in computer vision and many other fields owing to large amounts of high-quality training samples, while it does not work very well on genomic data analysis, since such data are notoriously small. In our work, we focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients that can guide treatment decisions for a specific individual through training on small data. In fact, doctors and clinicians always address this problem by studying several interrelated clinical variables simultaneously. We attempt to simulate such a clinical perspective, and introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks and transfer it to help address new tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta-learning machine for few-shot image classification. Observing that gene expression data have especially high dimensionality and high noise compared with image data, we propose a new extension of it by appending two modules to address these issues. Concretely, we append a feature selection layer to automatically filter out the disease-unrelated genes and incorporate a sample reweighting strategy to adaptively remove noisy data, while the extended model remains capable of learning from a limited number of training examples and generalizing well. Simulations and real gene expression data experiments substantiate the superiority of the proposed method for predicting the subtypes of disease and identifying potential disease-related genes.

preprint 2016 arXiv

Categorization Axioms for Clustering Results

Cluster analysis has attracted more and more attention in the field of machine learning and data mining. Numerous clustering algorithms have been proposed and are being developed due to diverse theories and various requirements of emerging applications. Therefore, it is well worth establishing a unified axiomatic framework for data clustering. In the literature, this is an open problem and has proved very challenging. In this paper, clustering results are axiomatized by assuming that a proper clustering result should satisfy categorization axioms. The proposed axioms not only introduce classification of clustering results and inequalities of clustering results, but also are consistent with prototype theory and exemplar theory of categorization models in cognitive science. Moreover, the proposed axioms lead to three principles of designing clustering algorithms and cluster validity indices, which many popular clustering algorithms and cluster validity indices follow.

preprint 2016 arXiv

Greedy Criterion in Orthogonal Greedy Learning

Orthogonal greedy learning (OGL) is a stepwise learning scheme that starts with selecting a new atom from a specified dictionary via the steepest gradient descent (SGD) and then builds the estimator through orthogonal projection. In this paper, we find that SGD is not the unique greedy criterion and introduce a new greedy criterion, called "$δ$-greedy threshold" for learning. Based on the new greedy criterion, we derive an adaptive termination rule for OGL. Our theoretical study shows that the new learning scheme can achieve the existing (almost) optimal learning rate of OGL. Plenty of numerical experiments are provided to support that the new scheme can achieve almost optimal generalization performance, while requiring less computation than OGL.
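The two steps this abstract describes for OGL (greedy atom selection, then orthogonal projection) can be sketched as follows. This is a minimal version with the standard correlation-based selection rule and a fixed number of steps; it does not implement the paper's $δ$-greedy threshold or its adaptive termination rule, and the function name and signature are assumptions.

```python
import numpy as np

def orthogonal_greedy(D, y, n_steps):
    """D: (n_samples, n_atoms) dictionary with nonzero columns.
    y: (n_samples,) target vector.
    Returns selected atom indices, their coefficients, and the residual."""
    norms = np.linalg.norm(D, axis=0)
    selected = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(n_steps):
        # Greedy step: pick the atom most correlated with the residual.
        scores = np.abs(D.T @ residual) / norms
        if selected:
            scores[selected] = -np.inf  # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        # Projection step: least-squares fit on the span of selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        residual = y - D[:, selected] @ coef
    return selected, coef, residual

# Tiny usage example with an orthonormal dictionary.
D_demo = np.eye(4)
y_demo = np.array([1.0, 0.0, 3.0, 0.0])
sel_demo, coef_demo, res_demo = orthogonal_greedy(D_demo, y_demo, 2)
```

A threshold-based criterion would replace the `argmax` line with a rule that accepts any atom whose score clears a bar, which is where the computational saving claimed in the abstract would come from.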

preprint 2016 arXiv

Low-rank Matrix Factorization under General Mixture Noise Distributions

Many computer vision problems can be posed as learning a low-dimensional subspace from high-dimensional data. Low-rank matrix factorization (LRMF) represents a commonly utilized subspace learning strategy. Most of the current LRMF techniques are constructed on optimization problems using L1-norm and L2-norm losses, which mainly deal with Laplacian and Gaussian noise, respectively. To make LRMF capable of adapting to more complex noise, this paper proposes a new LRMF model by assuming noise as a Mixture of Exponential Power (MoEP) distributions and proposes a penalized MoEP (PMoEP) model by combining the penalized likelihood method with MoEP distributions. Such a setting makes the learned LRMF model capable of automatically fitting the real noise through MoEP distributions. Each component in this mixture is adapted from a series of preliminary super- or sub-Gaussian candidates. Moreover, to facilitate the local continuity of noise components, we embed a Markov random field into the PMoEP model and further propose the advanced PMoEP-MRF model. An Expectation Maximization (EM) algorithm and a variational EM (VEM) algorithm are also designed to infer the parameters involved in the proposed PMoEP and PMoEP-MRF models, respectively. The superiority of our methods is demonstrated by extensive experiments on synthetic data, face modeling, hyperspectral image restoration and background subtraction.

preprint 2016 arXiv

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) is becoming an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames into backgrounds with spatial-temporal correlations and foregrounds with spatio-temporal continuity in a tensor framework. In this approach, we use 3D total variation (TV) to enhance the spatio-temporal continuity of foregrounds, and Tucker decomposition to model the spatio-temporal correlations of video background. Based on this idea, we design a basic tensor RPCA model over the video frames, dubbed as the holistic TenRPCA model (H-TenRPCA). To characterize the correlations among the groups of similar 3D patches of video background, we further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint tensor Tucker decompositions of 3D patch groups for modeling the video background. Efficient algorithms using alternating direction method of multipliers (ADMM) are developed to solve the proposed models. Extensive experiments on simulated and real-world videos demonstrate the superiority of the proposed approaches over the existing state-of-the-art approaches.

preprint 2015 arXiv

An Efficient Optimization Approach for a Cardinality-Constrained Index Tracking Problem

In the practical business environment, portfolio managers often face business-driven requirements that limit the number of constituents in their tracking portfolio. A natural index tracking model is thus to minimize a tracking error measure while enforcing an upper bound on the number of assets in the portfolio. In this paper we consider such a cardinality-constrained index tracking model. In particular, we propose an efficient nonmonotone projected gradient (NPG) method for solving this problem. At each iteration, this method usually solves several projected gradient subproblems. We show that each subproblem has a closed-form solution, which can be computed in linear time. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the NPG method is a local minimizer of the cardinality-constrained index tracking problem. We also conduct empirical tests to compare our method with the hybrid evolutionary algorithm and the hybrid half thresholding algorithm \cite{L1/2} for index tracking. The computational results demonstrate that our approach generally produces sparse portfolios with smaller out-of-sample tracking error and higher consistency between in-sample and out-of-sample tracking errors. Moreover, our method outperforms the other two approaches in terms of speed.

preprint 2015 arXiv

Convergence of multi-block Bregman ADMM for nonconvex composite problems

The alternating direction method of multipliers (ADMM) has been one of the most powerful and successful methods for solving various composite problems. The convergence of the conventional ADMM (i.e., 2-block) for convex objective functions has been justified for a long time, while its convergence for nonconvex objective functions has been established only very recently. The multi-block ADMM, a natural extension of ADMM, is a widely used scheme and has also been found very useful in solving various nonconvex optimization problems. It is thus desirable to establish a convergence theory of the multi-block ADMM under nonconvex frameworks. In this paper we present a Bregman modification of 3-block ADMM and establish its convergence for a large family of nonconvex functions. We further extend the convergence results to the $N$-block case ($N \geq 3$), which underlines the feasibility of multi-block ADMM applications in nonconvex settings. Finally, we present a simulation study and a real-world application to support the correctness of the obtained theoretical assertions.

preprint 2015 arXiv

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

In this paper, we present a novel approach to automatic 3D Facial Expression Recognition (FER) based on deep representation of facial 3D geometric and 2D photometric attributes. A 3D face is first represented by its geometric and photometric attributes, including the geometry map, normal maps, normalized curvature map and texture map. These maps are then fed into a pre-trained deep convolutional neural network to generate the deep representation. Then the facial expression prediction is simply achieved by training linear SVMs over the deep representation for different maps and fusing these SVM scores. The visualizations show that the deep representation provides a complete and highly discriminative coding scheme for 3D faces. Comprehensive experiments on the BU-3DFE database demonstrate that the proposed deep representation can outperform the widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG, Gabor) and the state-of-the-art approaches under the same experimental protocols.

preprint 2015 arXiv

Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal

In this paper, we address the problem of estimating and removing non-uniform motion blur from a single blurry image. We propose a deep learning approach to predicting the probabilistic distribution of motion blur at the patch level using a convolutional neural network (CNN). We further extend the candidate set of motion kernels predicted by the CNN using carefully designed image rotations. A Markov random field model is then used to infer a dense non-uniform motion blur field enforcing motion smoothness. Finally, motion blur is removed by a non-uniform deblurring model using patch-level image prior. Experimental evaluations show that our approach can effectively estimate and remove complex non-uniform motion blur that is not handled well by previous approaches.

preprint 2015 arXiv

Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing

This paper studies the convergence of the adaptively iterative thresholding (AIT) algorithm for compressed sensing. We first introduce a generalized restricted isometry property (gRIP). Then we prove that the AIT algorithm converges to the original sparse solution at a linear rate under a certain gRIP condition in the noise-free case. In the noisy case, its convergence rate is also linear until a certain error bound is attained. Moreover, as by-products, we also provide some sufficient conditions for the convergence of the AIT algorithm based on two well-known properties, i.e., the coherence property and the restricted isometry property (RIP), respectively. It should be pointed out that these two properties are special cases of gRIP. The solid improvements in the theoretical results are demonstrated by comparison with the known results. Finally, we provide a series of simulations to verify the correctness of the theoretical assertions as well as the effectiveness of the AIT algorithm.
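For orientation, one common form of iterative thresholding for compressed sensing alternates a gradient step on the least-squares misfit with a thresholding step whose cutoff adapts to the current iterate. The sketch below assumes hard thresholding that keeps the `k` largest-magnitude entries, a known sparsity level `k`, and a user-chosen step size; these are illustrative assumptions and not necessarily the exact AIT variant analyzed in the paper.

```python
import numpy as np

def hard_threshold_top_k(z, k):
    """Zero all but the k largest-magnitude entries of z.
    The effective cutoff adapts to z on every call."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-k:]
    out[keep] = z[keep]
    return out

def iterative_thresholding(A, y, k, step=1.0, n_iter=200):
    """Approximate a k-sparse x with A @ x close to y.
    Each iteration: gradient step on 0.5 * ||y - A x||^2, then thresholding."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold_top_k(x + step * (A.T @ (y - A @ x)), k)
    return x
```

Whether the iterates converge depends on properties of `A` and on `step`; conditions of that kind (gRIP, coherence, RIP) are what the abstract refers to, and the sketch makes no guarantee for arbitrary inputs.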

preprint 2015 arXiv

Shrinkage degree in $L_2$-re-scale boosting for regression

Re-scale boosting (RBoosting) is a variant of boosting which can essentially improve the generalization performance of boosting learning. The key feature of RBoosting lies in introducing a shrinkage degree to re-scale the ensemble estimate in each gradient-descent step. Thus, the shrinkage degree determines the performance of RBoosting. The aim of this paper is to develop a concrete analysis concerning how to determine the shrinkage degree in $L_2$-RBoosting. We propose two feasible ways to select the shrinkage degree. The first one is to parameterize the shrinkage degree and the other one is to develop a data-driven approach to it. After rigorously analyzing the importance of the shrinkage degree in $L_2$-RBoosting learning, we compare the pros and cons of the proposed methods. We find that although these approaches can reach the same learning rates, the structure of the final estimate of the parameterized approach is better, which sometimes yields a better generalization capability when the number of samples is finite. With this, we recommend parameterizing the shrinkage degree of $L_2$-RBoosting. To this end, we present an adaptive parameter-selection strategy for the shrinkage degree and verify its feasibility through both theoretical analysis and numerical verification. The obtained results enhance the understanding of RBoosting and further give guidance on how to use $L_2$-RBoosting for regression tasks.

preprint 2015 arXiv

Sparse Index Tracking Based On $L_{1/2}$ Model And Algorithm

Recently, $L_1$ regularization has attracted extensive attention and has been successfully applied in mean-variance portfolio selection to improve out-of-sample properties and decrease transaction costs. However, the $L_1$ regularization approach is ineffective in promoting sparsity and in selecting the regularization parameter for index tracking with the budget and no-short-selling constraints, since the 1-norm of the asset weights then has a constant value of one. Our recent research on $L_{1/2}$ regularization has found that the half thresholding algorithm with an optimal regularization parameter setting strategy is a fast solver of $L_{1/2}$ regularization and can provide sparser solutions. In this paper we apply the $L_{1/2}$ regularization method to stock index tracking and establish a new sparse index tracking model. A hybrid half thresholding algorithm is proposed for solving the model. Empirical tests of the model and algorithm are carried out on the eight data sets from OR-library. The optimal tracking portfolio obtained from the new model and algorithm has lower out-of-sample prediction error and consistency both in-sample and out-of-sample. Moreover, since the regularization parameters are selected automatically for a fixed number of assets in the optimal portfolio, our algorithm is a fast solver, especially for large-scale problems.
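
The remark that the 1-norm is constant under the budget and no-short-selling constraints can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two weight vectors and the helper names `l1_norm` and `l_half_penalty` are made up for this example and do not come from the paper.

```python
# Illustrative only: the portfolios below are invented, not taken from the paper.

def l1_norm(weights):
    """Sum of absolute weights (the L1 penalty)."""
    return sum(abs(w) for w in weights)

def l_half_penalty(weights):
    """Sum of |w_i|^(1/2), the quantity penalized by L_{1/2} regularization."""
    return sum(abs(w) ** 0.5 for w in weights)

# Both portfolios satisfy the budget constraint (weights sum to one)
# and the no-short-selling constraint (weights are nonnegative).
dense_portfolio = [0.25, 0.25, 0.25, 0.25]
sparse_portfolio = [1.0, 0.0, 0.0, 0.0]

for name, w in [("dense", dense_portfolio), ("sparse", sparse_portfolio)]:
    print(name, l1_norm(w), l_half_penalty(w))
```

Under these two constraints the L1 value is one for every feasible portfolio, so it cannot distinguish the dense portfolio from the sparse one, whereas the $L_{1/2}$ penalty is smaller for the sparse portfolio.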

preprint2015arXiv

Sparse Regularization: Convergence Of Iterative Jumping Thresholding Algorithm

In recent studies on sparse modeling, non-convex penalties have received considerable attention due to their superiority over their convex counterparts in inducing sparsity. Compared with convex optimization approaches, however, the convergence analysis of non-convex approaches is more challenging. In this paper, we study the convergence of a non-convex iterative thresholding algorithm for solving sparse recovery problems with a certain class of non-convex penalties, whose corresponding thresholding functions have jump discontinuities. We therefore call the algorithm the iterative jumping thresholding (IJT) algorithm. The finite support and sign convergence of the IJT algorithm is first verified by taking advantage of this jump discontinuity. Together with the assumption of the introduced restricted Kurdyka-Łojasiewicz (rKL) property, the strong convergence of the IJT algorithm can then be proved. Furthermore, we show that the IJT algorithm converges to a local minimizer at an asymptotically linear rate under some additional conditions. Moreover, we derive an a posteriori computable error estimate, which can be used to design practical termination rules for the algorithm. It should be pointed out that the $l_q$ quasi-norm ($0<q<1$) is an important subclass of the class of non-convex penalties studied in this paper. In particular, when applied to $l_q$ regularization, the IJT algorithm can converge to a local minimizer at an asymptotically linear rate under certain concentration conditions. We also provide a set of simulations to support the correctness of the theoretical assertions and compare the time efficiency of the IJT algorithm for $l_{q}$ regularization ($q=1/2, 2/3$) with other known typical algorithms such as the iterative reweighted least squares (IRLS) algorithm and the iterative reweighted $l_{1}$ minimization (IRL1) algorithm.

preprint2015arXiv

Video Primal Sketch: A Unified Middle-Level Representation for Video

This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) a sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc., and ii) a FRAME/MRF model reproducing feature statistics extracted from the input video to implicitly represent textured motion, such as water and fire. The feature statistics include histograms of spatio-temporal filters and velocity distributions. This paper makes three contributions to the literature: i) learning a dictionary of video primitives using parametric generative models; ii) proposing the Spatio-Temporal FRAME (ST-FRAME) and Motion-Appearance FRAME (MA-FRAME) models for modeling and synthesizing textured motion; and iii) developing a parsimonious hybrid model for generic video representation. Given an input video, VPS selects the proper models automatically for different motion patterns and is compatible with high-level action representations. In the experiments, we synthesize a number of textured motions; reconstruct real videos using the VPS; report a series of human perception experiments to verify the quality of the reconstructed videos; demonstrate how the VPS changes over the scale transition in videos; and present the close connection between VPS and high-level action models.

preprint2014arXiv

$L_{1/2}$ Regularization: Convergence of Iterative Half Thresholding Algorithm

In recent studies on sparse modeling, nonconvex regularization approaches (particularly $L_{q}$ regularization with $q\in(0,1)$) have been shown to offer substantial benefits in inducing sparsity and in efficiency. Compared with convex regularization approaches (say, $L_{1}$ regularization), however, the convergence of the corresponding algorithms is more difficult to establish. In this paper, we address this difficult issue for a specific but typical nonconvex regularization scheme, $L_{1/2}$ regularization, which has been successfully used in many applications. More specifically, we study the convergence of the iterative half thresholding algorithm (the half algorithm for short), one of the most efficient and important algorithms for solving the $L_{1/2}$ regularization problem. As the main result, we show that under certain conditions, the half algorithm converges to a local minimizer of the $L_{1/2}$ regularization problem, with an eventually linear convergence rate. The established result provides a theoretical guarantee for a wide range of applications of the half algorithm. We also provide a set of simulations to support the correctness of the theoretical assertions and compare the time efficiency of the half algorithm with other known typical algorithms for $L_{1/2}$ regularization such as the iteratively reweighted least squares (IRLS) algorithm and the iteratively reweighted $l_{1}$ minimization (IRL1) algorithm.

preprint2014arXiv

A Cyclic Coordinate Descent Algorithm for lq Regularization

In recent studies on sparse modeling, $l_q$ ($0<q<1$) regularization has received considerable attention due to its superiority over $l_1$ regularization in inducing sparsity and reducing bias. In this paper, we propose a cyclic coordinate descent (CCD) algorithm for $l_q$ regularization. Our main result states that the CCD algorithm converges globally to a stationary point as long as the stepsize is less than a positive constant. Furthermore, we demonstrate that the CCD algorithm converges to a local minimizer under certain additional conditions. Our numerical experiments demonstrate the efficiency of the CCD algorithm.

preprint2014arXiv

Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems

The alternating direction method with multipliers (ADMM) has been one of the most powerful and successful methods for solving various convex or nonconvex composite problems that arise in the fields of image and signal processing and machine learning. In convex settings, numerous convergence results have been established for ADMM as well as its variants. However, due to the absence of convexity, the convergence analysis of nonconvex ADMM is generally very difficult. In this paper we study the Bregman modification of ADMM (BADMM), which includes the conventional ADMM as a special case and often leads to an improvement of the performance of the algorithm. Under certain assumptions, we prove that the iterative sequence generated by BADMM converges to a stationary point of the associated augmented Lagrangian function. The obtained results underline the feasibility of ADMM in applications under nonconvex settings.

preprint2014arXiv

Greedy metrics in orthogonal greedy learning

Orthogonal greedy learning (OGL) is a stepwise learning scheme that, in each greedy step, adds a new atom from a dictionary via steepest gradient descent and builds the estimator by orthogonally projecting the target function onto the space spanned by the selected atoms. Here, "greed" means choosing a new atom according to the steepest gradient descent principle. OGL then avoids overfitting/underfitting by selecting an appropriate iteration number. In this paper, we point out that overfitting/underfitting can also be avoided by redefining "greed" in OGL. To this end, we introduce a new greedy metric, called $δ$-greedy thresholds, to refine "greed" and theoretically verify its feasibility. Furthermore, we reveal that such a greedy metric can bring an adaptive termination rule while maintaining the prominent learning performance of OGL. Our results show that steepest gradient descent is not the unique greedy metric of OGL and that other, more suitable metrics may lessen the hassle of model selection in OGL.

preprint2014arXiv

Is Extreme Learning Machine Feasible? A Theoretical Assessment (Part II)

An extreme learning machine (ELM) can be regarded as a two-stage feed-forward neural network (FNN) learning system that randomly assigns the connections with and within hidden neurons in the first stage and tunes the connections with output neurons in the second stage. Therefore, ELM training is essentially a linear learning problem, which significantly reduces the computational burden. Numerous applications show that such a reduction in computational burden does not degrade the generalization capability. Whether this is true in theory has, however, remained open. The aim of our work is to study the theoretical feasibility of ELM by analyzing the pros and cons of ELM. In the previous part on this topic, we pointed out that via appropriate selection of the activation function, ELM does not degrade the generalization capability in the expectation sense. In this paper, we pursue the study in a different direction and show that the randomness of ELM also leads to certain negative consequences. On one hand, we find that the randomness causes an additional uncertainty problem for ELM, both in approximation and learning. On the other hand, we theoretically justify that there also exists an activation function such that the corresponding ELM degrades the generalization capability. In particular, we prove that the generalization capability of ELM with Gaussian kernel is essentially worse than that of FNN with Gaussian kernel. To facilitate the use of ELM, we also provide a remedy for such a degradation. We find that the well-developed coefficient regularization technique can essentially improve the generalization capability. The obtained results reveal the essential characteristic of ELM and give theoretical guidance concerning how to use ELM.

preprint2014arXiv

Learning and approximation capability of orthogonal super greedy algorithm

We consider the approximation capability of orthogonal super greedy algorithms (OSGA) and its applications in supervised learning. OSGA selects more than one atom in each iteration step, which greatly reduces the computational burden when compared with the conventional orthogonal greedy algorithm (OGA). We prove that even for function classes that are not the convex hull of the dictionary, OSGA does not degrade the approximation capability of OGA provided the dictionary is incoherent. Based on this, we deduce a tight generalization error bound for OSGA learning. Our results show that in the realm of supervised learning, OSGA makes it possible to further reduce the computational burden of OGA while maintaining its prominent generalization capability.

preprint2014arXiv

Learning rates of $l^q$ coefficient regularization learning with Gaussian kernel

Regularization is a well-recognized, powerful strategy to improve the performance of a learning machine, and $l^q$ regularization schemes with $0<q<\infty$ are central in use. It is known that different values of $q$ lead to different properties of the deduced estimators; say, $l^2$ regularization leads to smooth estimators while $l^1$ regularization leads to sparse estimators. Then, how do the generalization capabilities of $l^q$ regularization learning vary with $q$? In this paper, we study this problem in the framework of statistical learning theory and show that implementing $l^q$ coefficient regularization schemes in the sample-dependent hypothesis space associated with the Gaussian kernel can attain the same almost optimal learning rates for all $0<q<\infty$. That is, the upper and lower bounds of learning rates for $l^q$ regularization learning are asymptotically identical for all $0<q<\infty$. Our finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria such as smoothness, computational complexity, sparsity, etc.

preprint2014arXiv

On parallel multisplitting block iterative methods for linear systems arising in the numerical solution of Euler equations

The paper studies the convergence of some parallel multisplitting block iterative methods for the solution of linear systems arising in the numerical solution of Euler equations. Some sufficient conditions for convergence are proposed. As special cases, the convergence of the parallel block generalized AOR (BGAOR), the parallel block AOR (BAOR), the parallel block generalized SOR (BGSOR), the parallel block SOR (BSOR), the extrapolated parallel BAOR and the extrapolated parallel BSOR methods is presented. Furthermore, the convergence of the parallel block iterative methods for linear systems with special block tridiagonal matrices arising in the numerical solution of Euler equations is discussed. Finally, some examples are given to demonstrate the convergence results obtained in this paper.
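
For readers unfamiliar with the SOR family named above, the following is a minimal sketch of the classical pointwise SOR sweep, which is a much simpler relative of the parallel block multisplitting methods the paper analyzes. The function name `sor`, the relaxation value, and the 2-by-2 test system are assumptions chosen for illustration, not material from the paper.

```python
# Pointwise SOR for A x = b; a simplified stand-in, not the paper's
# parallel block multisplitting scheme.

def sor(A, b, omega=1.1, iters=100):
    """Run `iters` SOR sweeps starting from the zero vector."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Off-diagonal contribution, using already-updated entries of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Relax between the old value and the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x

# Hypothetical symmetric positive definite system; exact solution is (1/11, 7/11).
A_demo = [[4.0, 1.0], [1.0, 3.0]]
b_demo = [1.0, 2.0]
print(sor(A_demo, b_demo))
```

Setting `omega=1.0` reduces the sweep to Gauss-Seidel. For a symmetric positive definite matrix such as `A_demo`, the iteration converges for any relaxation parameter strictly between 0 and 2.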

preprint2014arXiv

On the Optimal Solution of Weighted Nuclear Norm Minimization

In recent years, the nuclear norm minimization (NNM) problem has been attracting much attention in computer vision and machine learning. The NNM problem is convex and can therefore be solved efficiently. The standard nuclear norm regularizes all singular values equally, which, however, is not flexible enough to fit real scenarios. Weighted nuclear norm minimization (WNNM) is a natural extension and generalization of NNM. By assigning suitably different weights to different singular values, WNNM can lead to state-of-the-art results in applications such as image denoising. Nevertheless, the globally optimal solution of the WNNM problem has so far not been completely characterized, owing to its non-convexity in general cases. In this article, we study the theoretical properties of WNNM and prove that WNNM can be equivalently transformed into a quadratic programming problem with linear constraints. This implies that WNNM is equivalent to a convex problem and its global optimum can be readily achieved by off-the-shelf convex optimization solvers. We further show that when the weights are non-descending, the globally optimal solution of WNNM can be obtained in closed form.
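
The closed-form case can be written out as a short worked formula. The abstract does not state the objective, so the version below is a sketch under an assumed scaling (a factor of one half on the data term); with a different scaling the shrinkage amount changes by a constant factor. The symbols $Y$, $U$, $\Sigma$, $V$ and $\hat{\sigma}_i$ are notation introduced here for illustration.

```latex
% Assumed objective; Y = U \Sigma V^T is a singular value decomposition of the data,
% with singular values \sigma_1(Y) \ge \sigma_2(Y) \ge \dots \ge \sigma_n(Y).
\min_{X}\; \tfrac{1}{2}\,\|Y - X\|_F^2 + \sum_{i=1}^{n} w_i\,\sigma_i(X),
\qquad 0 \le w_1 \le w_2 \le \dots \le w_n .

% Weighted singular value shrinkage: each singular value is reduced by its own weight.
\hat{X} = U\,\mathrm{diag}(\hat{\sigma}_1,\dots,\hat{\sigma}_n)\,V^{T},
\qquad \hat{\sigma}_i = \max\bigl(\sigma_i(Y) - w_i,\; 0\bigr).
```

Each $\hat{\sigma}_i$ is the minimizer of the scalar problem $\tfrac{1}{2}(\sigma_i(Y) - s)^2 + w_i s$ over $s \ge 0$. Because the $\sigma_i(Y)$ are non-increasing and the weights are non-descending, the shrunken values remain sorted, which is why the non-descending weight order matters for this form.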

preprint2013arXiv

Compressed Sensing SAR Imaging with Multilook Processing

Multilook processing is a widely used speckle reduction approach in synthetic aperture radar (SAR) imaging. Conventionally, it is achieved by incoherently summing several independent low-resolution images formed from overlapping subbands of the SAR signal. However, in the context of compressive sensing (CS) SAR imaging, where the samples are collected at a sub-Nyquist rate, the data spectrum is highly aliased, which hinders the direct application of existing multilook techniques. In this letter, we propose a new CS-SAR imaging method that realizes multilook processing simultaneously during image reconstruction. The main idea is to replace the SAR observation matrix by the inverse of the multilook procedures, which is then combined with a random sampling matrix to yield a multilook CS-SAR observation model. Then a joint sparse regularization model, which accounts for the pixel dependency of subimages, is derived to form multilook images. The suggested SAR imaging method can not only reconstruct sparse scenes efficiently below the Nyquist rate, but can also achieve a comparable reduction of speckle during reconstruction. Simulation results are finally provided to demonstrate the effectiveness of the proposed method.

preprint2013arXiv

Dictionary learning under global sparsity constraint

A new method is proposed in this paper to learn an overcomplete dictionary from training data samples. Differing from current methods that enforce a similar sparsity constraint on each of the input samples, the proposed method imposes a global sparsity constraint on the entire data set. This enables the proposed method to suitably assign the atoms of the dictionary to represent various samples and to adapt to the complicated structures underlying the entire data set. By virtue of sparse coding and sparse PCA techniques, a simple algorithm is designed for the implementation of the method. The efficiency and the convergence of the proposed algorithm are also theoretically analyzed. Experimental results on a series of signal and image data sets show that our method performs better than current dictionary learning methods in recovering the original dictionary, reconstructing the input data, and revealing salient data structure.

preprint2013arXiv

Fast Compressed Sensing SAR Imaging based on Approximated Observation

In recent years, compressed sensing (CS) has been applied in the field of synthetic aperture radar (SAR) imaging and shows great potential. The existing models are, however, based on the sensing matrix obtained from the exact observation functions. As a result, the corresponding reconstruction algorithms are much more time-consuming than traditional matched filter (MF) based focusing methods, especially in high-resolution and wide-swath systems. In this paper, we formulate a new CS-SAR imaging model based on the approximated SAR observation deduced from the inverse of the focusing procedures. We incorporate CS and MF within a sparse regularization framework that is then solved by a fast iterative thresholding algorithm. The proposed model forms a new CS-SAR imaging method that can be applied to high-quality and high-resolution imaging under sub-Nyquist rate sampling, while substantially reducing the computational cost in both time and memory. Simulations and real SAR data applications show that the proposed method can perform SAR imaging effectively and efficiently below the Nyquist rate, especially for large-scale applications.

preprint2013arXiv

Sparse Solution of Underdetermined Linear Equations via Adaptively Iterative Thresholding

Finding the sparsest solution of an underdetermined system of linear equations $y=Ax$ has attracted considerable attention in recent years. Among a large number of algorithms, iterative thresholding algorithms are recognized as one of the most efficient and important classes of algorithms. This is mainly due to their low computational complexity, especially for large-scale applications. The aim of this paper is to provide guarantees on the global convergence of a wide class of iterative thresholding algorithms. Since the thresholds of the considered algorithms are set adaptively at each iteration, we call them adaptively iterative thresholding (AIT) algorithms. As the main result, we show that as long as $A$ satisfies a certain coherence property, AIT algorithms can find the correct support set within finitely many iterations, and then converge to the original sparse solution exponentially fast once the correct support set has been identified. Meanwhile, we also demonstrate that AIT algorithms are robust to the algorithmic parameters. In addition, it should be pointed out that most of the existing iterative thresholding algorithms, such as the hard, soft, half and smoothly clipped absolute deviation (SCAD) algorithms, are included in the class of AIT algorithms studied in this paper.
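
A rough sketch of one member of this class, iterative hard thresholding with a threshold recomputed at every iteration, is given below. The abstract does not spell out the thresholding rule, so the choice here (use the (k+1)-th largest magnitude as the threshold, for an assumed sparsity level `k`) is an assumption, as are the function names, the unit step size, and the tiny 2-by-3 example system.

```python
# Sketch of an adaptively thresholded iteration for y = A x (hard thresholding).
# The threshold rule and all names are illustrative assumptions.

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def hard_threshold(z, tau):
    """Keep entries whose magnitude exceeds tau; zero out the rest."""
    return [zi if abs(zi) > tau else 0.0 for zi in z]

def ait_hard(A, y, k, step=1.0, iters=50):
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]
        # Gradient step on the least-squares misfit.
        z = [xi + step * gi for xi, gi in zip(x, rmatvec(A, residual))]
        # Adaptive threshold: the (k+1)-th largest magnitude of z.
        magnitudes = sorted((abs(zi) for zi in z), reverse=True)
        tau = magnitudes[k] if k < n else 0.0
        x = hard_threshold(z, tau)
    return x

# Hypothetical underdetermined system with unit-norm columns and a 1-sparse solution.
A_demo = [[1.0, 0.0, 2 ** -0.5], [0.0, 1.0, 2 ** -0.5]]
y_demo = [2.0, 0.0]  # generated by x = (2, 0, 0)
print(ait_hard(A_demo, y_demo, k=1))
```

On this toy system the first iteration already selects the correct support, after which the residual is zero and the iterate no longer changes. Swapping `hard_threshold` for a soft or half thresholding operator would give other members of the class mentioned in the abstract.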

preprint2012arXiv

A recursive divide-and-conquer approach for sparse principal component analysis

In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems in an analytical way, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm only involves simple computations and is thus easy to implement. The proposed method can also be very easily extended to other sparse PCA problems with certain constraints, such as the nonnegative sparse PCA problem. Furthermore, we have shown that the proposed algorithm converges to a stationary point of the problem, and its computational complexity is approximately linear in both data size and dimensionality. The effectiveness of the proposed method is substantiated by extensive experiments implemented on a series of synthetic and real data in both reconstruction-error-minimization and data-variance-maximization viewpoints.

preprint2012arXiv

Divide-and-Conquer Method for L1 Norm Matrix Factorization in the Presence of Outliers and Missing Data

Low-rank matrix factorization posed as an L1 norm minimization problem has recently attracted much attention due to its intrinsic robustness to the presence of outliers and missing data. In this paper, we propose a new method, called the divide-and-conquer method, for solving this problem. The main idea is to break the original problem into a series of smallest possible sub-problems, each involving only a single scalar parameter. Each of these sub-problems is proved to be convex and to have a closed-form solution. By recursively optimizing these small problems in an analytical way, an efficient algorithm for solving the original problem can naturally be constructed, entirely avoiding time-consuming numerical optimization as an inner loop. The computational complexity of the proposed algorithm is approximately linear in both data size and dimensionality, making it possible to handle large-scale L1 norm matrix factorization problems. The algorithm is also theoretically proved to be convergent. A series of experimental results shows that our method always achieves better results than the current state-of-the-art methods for L1 matrix factorization in both computational time and accuracy, especially on large-scale applications such as face recognition and structure from motion.

preprint2009arXiv

On $[[n,n-4,3]]_{q}$ Quantum MDS Codes for odd prime power $q$

For each odd prime power $q$, let $4 \leq n\leq q^{2}+1$. Hermitian self-orthogonal $[n,2,n-1]$ codes over $GF(q^{2})$ with dual distance three are constructed by using finite field theory. Hence, $[[n,n-4,3]]_{q}$ quantum MDS codes for $4 \leq n\leq q^{2}+1$ are obtained.
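
The "MDS" label can be sanity-checked against the quantum Singleton bound, $n - k \geq 2(d - 1)$ for an $[[n,k,d]]_q$ code, with equality for MDS codes. The snippet below is only an arithmetic check of the stated parameters, not a construction of the codes; the helper names are made up for this example.

```python
# Arithmetic check of the parameters [[n, n-4, 3]]_q against the
# quantum Singleton bound n - k >= 2(d - 1). Helper names are illustrative.

def meets_singleton_with_equality(n, k, d):
    """True when an [[n, k, d]] parameter set attains the quantum Singleton bound."""
    return n - k == 2 * (d - 1)

def lengths_covered(q):
    """Code lengths 4 <= n <= q^2 + 1 stated in the abstract."""
    return range(4, q * q + 2)

q = 3  # smallest odd prime power
for n in lengths_covered(q):
    print(n, n - 4, 3, meets_singleton_with_equality(n, n - 4, 3))
```

Since $n - (n - 4) = 4 = 2(3 - 1)$, every length in the stated range attains the bound with equality.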


42 published item(s)

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

LDP-IDS: Local Differential Privacy for Infinite Data Streams

Robust spectral compressive sensing via vanilla gradient descent

Graph Neural Network Encoding for Community Detection in Attribute Networks

Learning adaptive differential evolution algorithm from optimization experiences by policy gradient

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

Learning Adaptive Loss for Robust Learning with Noisy Labels

Learning to be Global Optimizer

Learning to Search for MIMO Detection

Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

On Hyper-parameter Tuning for Stochastic Optimization Algorithms

Polarimetric SAR Image Semantic Segmentation with 3D Discrete Wavelet Transform and Markov Random Field

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Categorization Axioms for Clustering Results

Greedy Criterion in Orthogonal Greedy Learning

Low-rank Matrix Factorization under General Mixture Noise Distributions

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

An Efficient Optimization Approach for a Cardinality-Constrained Index Tracking Problem

Convergence of multi-block Bregman ADMM for nonconvex composite problems

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal

Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing

Shrinkage degree in $L_2$-re-scale boosting for regression

Sparse Index Tracking Based On $L_{1/2}$ Model And Algorithm

Sparse Regularization: Convergence Of Iterative Jumping Thresholding Algorithm

Video Primal Sketch: A Unified Middle-Level Representation for Video

$L_{1/2}$ Regularization: Convergence of Iterative Half Thresholding Algorithm

A Cyclic Coordinate Descent Algorithm for lq Regularization

Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems

Greedy metrics in orthogonal greedy learning

Is Extreme Learning Machine Feasible? A Theoretical Assessment (Part II)

Learning and approximation capability of orthogonal super greedy algorithm

Learning rates of $l^q$ coefficient regularization learning with Gaussian kernel

On parallel multisplitting block iterative methods for linear systems arising in the numerical solution of Euler equations

On the Optimal Solution of Weighted Nuclear Norm Minimization

Compressed Sensing SAR Imaging with Multilook Processing

Dictionary learning under global sparsity constraint

Fast Compressed Sensing SAR Imaging based on Approximated Observation

Sparse Solution of Underdetermined Linear Equations via Adaptively Iterative Thresholding

A recursive divide-and-conquer approach for sparse principal component analysis

Divide-and-Conquer Method for L1 Norm Matrix Factorization in the Presence of Outliers and Missing Data

On $[[n,n-4,3]]_{q}$ Quantum MDS Codes for odd prime power $q$