Source author record

Jinshan Zeng

Jinshan Zeng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.OC, Computer Vision, math.DS, Numerical Analysis, Applications, Artificial Intelligence, Computer Science and Game Theory, Cryptography and Security, Information Theory, math.IT, math.ST, Software Engineering, Statistics Theory

Catalog footprint

What is connected

17 works

14 topics

4 close collaborators


Preprint · 2022 · arXiv

A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game

Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, in contrast to the extensive research on its computational and statistical characteristics, little attention has been paid to the security of such algorithms. Driven by huge profits, a potential adversary has strong motivation and incentives to manipulate the ranking list, yet the intrinsic vulnerability of rank aggregation methods is not well studied in the literature. To fully understand the possible risks, in this paper we focus on a purposeful adversary who seeks to dictate the aggregated result by modifying the pairwise data. From the perspective of dynamical systems, an attack with a target ranking list is a fixed point of the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators, for which a Nash equilibrium is established. Two procedures, against HodgeRank and RankCentrality respectively, are then constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary has complete information. Notably, the proposed methods also allow the adversary to perform the purposeful attack when it holds only incomplete information or imperfect feedback. The effectiveness of the proposed target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experiments show that the proposed methods can achieve the attacker's goal, in the sense that the leading candidate of the perturbed ranking list is the one designated by the adversary.
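To make the victim model concrete, here is a minimal sketch of least-squares HodgeRank, the global-ranking step such an attack targets. It assumes each comparison is a triple (i, j, y) meaning item i beat item j by margin y, that all comparisons have equal weight, and that the comparison graph is connected; the function names `hodgerank` and `solve` are illustrative, and the paper's attack procedure is not reproduced.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def hodgerank(n, comparisons):
    """Least-squares HodgeRank: minimize sum (s_i - s_j - y)^2 subject to sum(s) == 0."""
    L = [[0.0] * n for _ in range(n)]  # graph Laplacian of the comparison graph
    b = [0.0] * n                      # right-hand side of the normal equations
    for i, j, y in comparisons:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        b[i] += y
        b[j] -= y
    # L is singular (constant vectors lie in its kernel). For a connected graph the
    # last normal equation is redundant, so replace it with the constraint sum(s) == 0.
    L[-1] = [1.0] * n
    b[-1] = 0.0
    return solve(L, b)
```

Because adding a constant to every score leaves all pairwise differences unchanged, the scores are only defined up to a shift; the sum-to-zero constraint fixes that shift. In the setting described above, an adversary would perturb the `comparisons` list so that its chosen item ends up with the highest score.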

Preprint · 2022 · arXiv

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Automatic code generation aims to generate program code from a given natural language description. The current mainstream approach uses neural networks to encode the natural language description, outputs an abstract syntax tree (AST) from the decoder, and then converts the AST into program code. While the generated code largely conforms to specific syntax rules, two problems are still ignored. One is the absence of program testing, an essential step in complete code implementation; the other is the exclusive focus on the syntactic compliance of the generated code, which ignores the more important functional requirements of the program. This paper proposes the CodeGen-Test model, which adds program testing steps and incorporates program testing information to iteratively generate code that meets the functional requirements of the program, thereby improving the quality of code generation. The paper also proposes a new evaluation metric, test accuracy (Test-Acc), defined as the proportion of generated code that passes the program tests. Unlike previous evaluation metrics, which assess code generation only in terms of character similarity, Test-Acc evaluates the quality of generated code in terms of program functionality. The CodeGen-Test model is evaluated on the Python dataset "hearthstone legend". The experimental results show that the proposed method effectively improves the quality of generated code: compared with the best existing model, CodeGen-Test improves the BLEU value by 0.2%, the ROUGE-L value by 0.3% and Test-Acc by 6%.
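The Test-Acc idea can be sketched as follows, assuming each generated program is a Python source string paired with a test callable that raises on failure. The name `compute_test_acc` and the toy programs are hypothetical; the paper's actual test harness is not described here, and running generated code with `exec` is acceptable only for a toy example.

```python
def compute_test_acc(candidates, tests):
    """Test-Acc sketch: fraction of generated programs that pass their program tests.

    candidates: generated Python source strings (hypothetical inputs).
    tests: one callable per candidate; it receives the executed namespace and
           raises (e.g. AssertionError) if the program fails.
    """
    if not candidates:
        return 0.0
    passed = 0
    for source, test in zip(candidates, tests):
        namespace = {}
        try:
            exec(source, namespace)  # toy only: untrusted code needs a sandbox in practice
            test(namespace)
        except Exception:
            continue  # syntax errors, runtime errors and failed assertions all count as failures
        passed += 1
    return passed / len(candidates)
```

Unlike BLEU or ROUGE-L, this score gives no credit to a program that looks like the reference but computes the wrong result: `return a - b` and `return a + b` differ by one character yet only one passes a test such as `add(2, 3) == 5`.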

Preprint · 2022 · arXiv

Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces

The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite these benefits, the huge number of parameters makes deep networks cumbersome in everyday applications. Although techniques such as pruning and distillation have been developed, as backward selection methods they require the expensive full training of a dense network, and forward selection methods for learning structural sparsity in deep networks have not yet been systematically explored. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generates a family of models from simple to complex along the dynamics by coupling a pair of parameters, so that over-parameterized deep models and their structural sparsity can be explored simultaneously. This differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks can be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable or even better performance than competitive optimizers in exploring the sparse structure of several widely used backbones on benchmark datasets. Remarkably, with early stopping, our method unveils "winning tickets" in early epochs: effective sparse network structures with test accuracy comparable to fully trained over-parameterized models, which are further transferable to similar alternative tasks. Furthermore, our method can grow networks efficiently with adaptive filter configurations, achieving good performance at much lower computational cost. Code and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.
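The coupled-parameter dynamics can be illustrated on a toy problem. The sketch below runs a split Linearized Bregman iteration on the separable loss $L(W) = \frac{1}{2}\|W - w_{true}\|^2$ with the augmented loss $\bar L(W,\Gamma) = L(W) + \|W-\Gamma\|^2/(2\nu)$: the dense parameter $W$ takes gradient steps, an auxiliary variable $Z$ accumulates the $\Gamma$-gradient, and the sparse parameter $\Gamma = \kappa \cdot \mathrm{shrink}(Z, 1)$ is read off by soft thresholding. This update form is assumed here as the general shape of split LBI; the toy loss, the hyperparameters $\kappa$, $\alpha$, $\nu$ and the function names are illustrative, and this is not the authors' DessiLBI code for deep networks.

```python
import math


def shrink(z, t):
    """Soft thresholding: sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def split_lbi(w_true, kappa=10.0, alpha=0.01, nu=1.0, iters=2000):
    """Toy split LBI on L(W) = 0.5 * ||W - w_true||^2 (hypothetical example)."""
    n = len(w_true)
    W = [0.0] * n  # dense parameter
    Z = [0.0] * n  # auxiliary variable accumulating the Gamma-gradient
    G = [0.0] * n  # sparse parameter Gamma
    path = []      # Gamma after every iteration (the regularization path)
    for _ in range(iters):
        # gradients of L_bar(W, G) = L(W) + ||W - G||^2 / (2 * nu), at the old iterate
        grad_W = [(W[i] - w_true[i]) + (W[i] - G[i]) / nu for i in range(n)]
        grad_G = [(G[i] - W[i]) / nu for i in range(n)]
        W = [W[i] - kappa * alpha * grad_W[i] for i in range(n)]
        Z = [Z[i] - alpha * grad_G[i] for i in range(n)]
        G = [kappa * shrink(Z[i], 1.0) for i in range(n)]
        path.append(list(G))
    return W, G, path
```

Early in the path $\Gamma$ is entirely zero, and coordinates with larger $|w_{true}|$ enter the support first; stopping early and reading off the support of $\Gamma$ is the toy analogue of the simple-to-complex model family described above.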

Preprint · 2021 · arXiv

On Stochastic Variance Reduced Gradient Method for Semidefinite Optimization

Low-rank stochastic semidefinite optimization has attracted growing attention due to its wide range of applications. The nonconvex reformulation based on low-rank factorization significantly improves computational efficiency but brings new challenges to the analysis. The stochastic variance reduced gradient (SVRG) method is regarded as one of the most effective methods. SVRG generally consists of two loops: a reference full gradient is first evaluated in the outer loop and then used to yield a variance-reduced estimate of the current gradient in the inner loop. Two options have been suggested for the output of the inner loop: Option I sets the output to the last iterate, and Option II samples the output at random from all iterates of the inner loop. However, there is a significant gap between the theory and practice of SVRG when it is adapted to stochastic semidefinite programming (SDP): SVRG works better in practice with Option I, while most existing theoretical results focus on Option II. In this paper, we fill this gap by exploiting a new semi-stochastic variant of the original SVRG with Option I adapted to semidefinite optimization. Equipped with this, we establish the global linear submanifold convergence (i.e., exponentially fast convergence to a submanifold of a global minimum under the orthogonal group action) of the proposed SVRG method, given a provable initialization scheme and under certain smoothness and restricted strong convexity assumptions. Our analysis includes the effects of the mini-batch size and the update frequency of the inner loop, as well as two practical step size strategies: fixed and stabilized Barzilai-Borwein step sizes. Numerical results in matrix sensing demonstrate the efficiency of the proposed SVRG method, which outperforms its Option II counterpart as well as other methods.
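The two-loop structure and the two output options can be shown on a scalar least-squares toy problem with component losses $f_i(x) = \frac{1}{2}(a_i x - b_i)^2$. This is plain SVRG, not the paper's semi-stochastic variant for low-rank SDP; the data, step size, number of epochs and inner-loop length are illustrative assumptions.

```python
import random


def svrg(a, b, x0, step, epochs, m, option="I", seed=0):
    """Plain SVRG on f(x) = (1/n) * sum_i 0.5 * (a[i]*x - b[i])**2."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(x, i):
        return a[i] * (a[i] * x - b[i])

    x_ref = x0
    for _ in range(epochs):
        # outer loop: full gradient at the reference point
        mu = sum(grad_i(x_ref, i) for i in range(n)) / n
        x, iterates = x_ref, []
        for _ in range(m):
            i = rng.randrange(n)
            # inner loop: variance-reduced (unbiased) gradient estimate
            v = grad_i(x, i) - grad_i(x_ref, i) + mu
            x -= step * v
            iterates.append(x)
        # Option I: last iterate; Option II: an inner iterate sampled uniformly at random
        x_ref = iterates[-1] if option == "I" else rng.choice(iterates)
    return x_ref
```

Switching `option` to `"II"` changes only how the next reference point is chosen; the gap discussed above concerns which of these two choices the convergence theory covers.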

Preprint · 2021 · arXiv

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

The generation of stylish Chinese fonts is an important problem in many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN)-based models. However, these deep generative models may suffer from mode collapse, which significantly degrades the diversity and quality of the generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. The resulting method, StrokeGAN, is mainly motivated by the observation that the stroke encoding contains a large amount of the mode information of Chinese characters. To reconstruct the one-bit stroke encoding of the generated characters, we introduce a stroke-encoding reconstruction loss imposed on the discriminator. Equipped with this one-bit stroke encoding and stroke-encoding reconstruction loss, StrokeGAN significantly alleviates the mode collapse of CycleGAN, better preserving strokes and improving the diversity of generated characters. The effectiveness of StrokeGAN is demonstrated by a series of generation tasks over nine datasets with different fonts. The numerical results show that StrokeGAN generally outperforms state-of-the-art methods in terms of content and recognition accuracy, as well as stroke error, and also generates more realistic characters.
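One reading of a one-bit stroke encoding is a binary vector with one bit per basic stroke type, set to 1 when the character contains that stroke; a reconstruction loss can then compare the discriminator's per-bit predictions with that vector via binary cross-entropy. The sketch below assumes this reading. The five-stroke vocabulary, the stroke names and the function names are hypothetical and not taken from the paper.

```python
import math

# Hypothetical stroke vocabulary (the paper's actual stroke set is not given here).
STROKES = ["heng", "shu", "pie", "dian", "zhe"]


def stroke_encoding(char_strokes, vocab=STROKES):
    """One bit per stroke type: 1 if the character contains that stroke, else 0."""
    present = set(char_strokes)
    return [1 if s in present else 0 for s in vocab]


def reconstruction_loss(target_bits, predicted_probs, eps=1e-12):
    """Mean binary cross-entropy between the encoding and predicted bit probabilities."""
    total = 0.0
    for t, p in zip(target_bits, predicted_probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target_bits)
```

Because the encoding records only presence, a character with a repeated stroke maps to the same bits as one with a single occurrence; the loss penalizes a generator whose outputs drop or add stroke types relative to the input character.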

Preprint · 2020 · arXiv

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Over-parameterization is ubiquitous in training neural networks today, benefiting both optimization, in seeking global optima, and generalization, in reducing prediction error. However, compact networks are desired in many real-world applications, and directly training small networks may get trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models into compact ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models from simple to complex ones, coupling a pair of parameters to simultaneously train over-parameterized deep models and the structural sparsity of the weights of fully connected and convolutional layers. This differential inclusion scheme has a simple discretization, called Deep structurally splitting Linearized Bregman Iteration (DessiLBI), for which a global convergence analysis in deep learning is established: from any initialization, the algorithmic iterates converge to a critical point of the empirical risk. Experimental evidence shows that DessiLBI achieves comparable or even better performance than competitive optimizers in exploring the structural sparsity of several widely used backbones on benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: effective sparse structures with test accuracy comparable to fully trained over-parameterized models.

Preprint · 2020 · arXiv

Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning Rates and Early Stopping

Boosting is a well-known method for improving the accuracy of weak learners in machine learning. However, a theoretical generalization guarantee for it is missing from the literature. In this paper, we propose an efficient boosting method with theoretical generalization guarantees for binary classification. The proposed boosting method has three key ingredients: a) the fully-corrective greedy (FCG) update in the boosting procedure, b) a differentiable squared hinge (also called truncated quadratic) function as the loss function, and c) an efficient alternating direction method of multipliers (ADMM) algorithm for the associated FCG optimization. The squared hinge loss not only inherits the robustness of the well-known hinge loss for classification with outliers, but also brings benefits for computational implementation and theoretical justification. Under a sparseness assumption, we derive a fast learning rate of order ${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be further improved to ${\cal O}((m/\log m)^{-1/2})$ if an additional noise assumption is imposed, where $m$ is the sample size. Both learning rates are the best among existing generalization results for boosting-type methods in classification. Moreover, an efficient early stopping scheme is provided for the proposed method. A series of toy simulations and real-data experiments are conducted to verify the developed theory and demonstrate the effectiveness of the proposed method.
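The FCG update with the squared hinge loss $\phi(t) = \max(0, 1 - t)^2$ can be sketched on a one-dimensional toy problem with decision stumps as weak learners. Two substitutions are assumptions: the fully-corrective refit uses plain gradient descent instead of the paper's ADMM solver, and the greedy step picks the unused stump whose outputs are most correlated with the loss gradient. Function names, the stump dictionary and the step sizes are illustrative.

```python
def squared_hinge(margin):
    """Squared hinge (truncated quadratic) loss max(0, 1 - margin)**2."""
    return max(0.0, 1.0 - margin) ** 2


def squared_hinge_grad(margin):
    """Derivative w.r.t. the margin; continuous, unlike the hinge loss."""
    return -2.0 * max(0.0, 1.0 - margin)


def fcg_boost(X, y, learners, steps, inner_iters=200, lr=0.1):
    n = len(y)
    H = [[h(x) for x in X] for h in learners]  # cached weak-learner outputs
    selected, coef = [], []

    def predict():
        return [sum(c * H[j][i] for c, j in zip(coef, selected)) for i in range(n)]

    for _ in range(steps):
        f = predict()
        # d(loss_i)/d(f_i) for the squared hinge loss with label y_i
        g = [y[i] * squared_hinge_grad(y[i] * f[i]) for i in range(n)]
        candidates = [j for j in range(len(learners)) if j not in selected]
        if not candidates:
            break
        # greedy step: unused learner most correlated with the gradient
        best = max(candidates, key=lambda j: abs(sum(g[i] * H[j][i] for i in range(n))))
        selected.append(best)
        coef.append(0.0)
        # fully-corrective step: re-fit ALL selected coefficients (gradient descent here)
        for _ in range(inner_iters):
            f = predict()
            g = [y[i] * squared_hinge_grad(y[i] * f[i]) for i in range(n)]
            grads = [sum(g[i] * H[j][i] for i in range(n)) / n for j in selected]
            coef = [c - lr * d for c, d in zip(coef, grads)]
    return selected, coef
```

On data such as `X = [-2, -1, 1, 2]`, `y = [-1, -1, 1, 1]` with stumps at thresholds -1.5, 0 and 1.5, the stump at 0 separates the classes, so it is selected first and the refit drives every margin to 1, where the squared hinge loss and its gradient vanish.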

Preprint · 2016 · arXiv

Constructive neural network learning

In this paper, we aim to develop scalable neural-network-type learning systems. Motivated by the idea of "constructive neural networks" in approximation theory, we focus on "constructing" rather than "training" feed-forward neural networks (FNNs) for learning, and propose a novel FNN learning system called the constructive feed-forward neural network (CFN). Theoretically, we prove that the proposed method not only overcomes the classical saturation problem of FNN approximation, but also attains the optimal learning rate when the regression function is smooth, whereas the state-of-the-art learning rates established for traditional FNNs are only near optimal (up to a logarithmic factor). A series of numerical simulations demonstrates the efficiency and feasibility of CFN through comparison with the well-known regularized least squares (RLS) with a Gaussian kernel and the extreme learning machine (ELM).

Preprint · 2016 · arXiv

Greedy Criterion in Orthogonal Greedy Learning

Orthogonal greedy learning (OGL) is a stepwise learning scheme that, at each step, selects a new atom from a specified dictionary via steepest gradient descent (SGD) and then builds the estimator through orthogonal projection. In this paper, we find that SGD is not the only possible greedy criterion and introduce a new greedy criterion, called the "$δ$-greedy threshold", for learning. Based on this criterion, we derive an adaptive termination rule for OGL. Our theoretical study shows that the new learning scheme achieves the existing (almost) optimal learning rate of OGL. Extensive numerical experiments show that the new scheme achieves almost optimal generalization performance while requiring less computation than OGL.
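The baseline OGL step with the steepest-gradient-descent criterion can be sketched with plain lists: pick the atom most correlated with the current residual, then orthogonally project the target onto the span of the selected atoms, done here incrementally by Gram-Schmidt. The paper's $δ$-greedy threshold is not reproduced; the simple `tol` stopping test and the function names are stand-in assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ogl(dictionary, target, max_steps, tol=1e-10):
    """Orthogonal greedy learning with the steepest-descent (max-correlation) criterion."""
    residual = list(target)
    basis, selected = [], []  # orthonormal basis of span(selected atoms)
    for _ in range(max_steps):
        # greedy criterion: atom with the largest |<residual, g>| / ||g||
        scores = [abs(dot(residual, g)) / math.sqrt(dot(g, g)) for g in dictionary]
        k = max(range(len(dictionary)), key=scores.__getitem__)
        if scores[k] <= tol:
            break
        # Gram-Schmidt: orthogonalize the new atom against the current basis
        q = list(dictionary[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            break  # atom already lies in the span
        q = [qi / norm for qi in q]
        basis.append(q)
        selected.append(k)
        # orthogonal projection: remove the residual's component along q
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual
```

Since each new basis vector is orthogonal to the previous ones, subtracting the residual's component along it keeps the residual equal to the target minus its projection onto the span of all selected atoms.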

Preprint · 2015 · arXiv

A Gauss-Seidel Iterative Thresholding Algorithm for lq Regularized Least Squares Regression

In recent studies on sparse modeling, $l_q$ ($0<q<1$) regularized least squares regression ($l_q$LS) has received considerable attention due to its advantages in inducing sparsity and reducing bias over its convex counterparts. In this paper, we propose a Gauss-Seidel iterative thresholding algorithm (called GAITA) for solving this problem. Unlike classical iterative thresholding algorithms, which use the Jacobi updating rule, GAITA uses the Gauss-Seidel rule to update the coordinate coefficients. Under a mild condition, we show that the support set and sign of any sequence generated by GAITA converge within a finite number of iterations. This convergence property, together with the Kurdyka-Łojasiewicz property of $l_q$LS, naturally yields the strong convergence of GAITA under the same condition, which is generally weaker than the condition for the convergence of classical iterative thresholding algorithms. Furthermore, we show that GAITA converges to a local minimizer under certain additional conditions. A set of numerical experiments shows the effectiveness of GAITA, in particular its much faster convergence compared with classical iterative thresholding algorithms.
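The Gauss-Seidel idea can be sketched on the objective $\frac{1}{2}\|Ax - y\|^2 + \lambda\|x\|_q^q$: each coordinate takes a gradient step using the residual already refreshed by the coordinates updated before it, followed by the scalar $l_q$ thresholding (proximal) operator. That operator has no simple closed form for general $q$, so the sketch computes it numerically by bisection on the stationarity condition. The objective scaling, the numerical thresholding routine and the function names are assumptions; the paper's step-size conditions are not reproduced.

```python
import math


def prox_lq(t, lam, q, tol=1e-12):
    """Scalar l_q thresholding: argmin_x 0.5*(x - t)**2 + lam*|x|**q, 0 < q < 1."""
    a = abs(t)
    # g'(x) = x - a + lam*q*x**(q-1) decreases on (0, x_bar) and increases after it
    x_bar = (lam * q * (1.0 - q)) ** (1.0 / (2.0 - q))
    dg = lambda x: x - a + lam * q * x ** (q - 1.0)
    if x_bar >= a or dg(x_bar) >= 0.0:
        return 0.0  # no local minimizer other than x = 0
    lo, hi = x_bar, a  # dg(lo) < 0 < dg(hi): bisect for the larger root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    g = lambda v: 0.5 * (v - a) ** 2 + lam * v ** q
    return math.copysign(x, t) if g(x) < g(0.0) else 0.0  # compare with x = 0


def gaita(A, y, lam, q, mu, sweeps=100):
    """Gauss-Seidel iterative thresholding for 0.5*||Ax - y||^2 + lam*||x||_q^q."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-v for v in y]  # residual A x - y (x starts at zero)
    for _ in range(sweeps):
        for i in range(n):
            grad_i = sum(A[k][i] * r[k] for k in range(m))
            new = prox_lq(x[i] - mu * grad_i, lam * mu, q)
            delta = new - x[i]
            # Gauss-Seidel: refresh the residual before the next coordinate is updated
            for k in range(m):
                r[k] += A[k][i] * delta
            x[i] = new
    return x
```

A Jacobi variant would compute every gradient entry from the same old residual before updating any coordinate; the only difference here is that `r` is refreshed inside the coordinate loop.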

Preprint · 2015 · arXiv

Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing

This paper studies the convergence of the adaptively iterative thresholding (AIT) algorithm for compressed sensing. We first introduce a generalized restricted isometry property (gRIP). We then prove that, in the noise-free case, the AIT algorithm converges to the original sparse solution at a linear rate under a certain gRIP condition. In the noisy case, its convergence rate is also linear until a certain error bound is attained. Moreover, as by-products, we provide sufficient conditions for the convergence of the AIT algorithm based on two well-known properties, the coherence property and the restricted isometry property (RIP), both of which are special cases of gRIP. The resulting improvements over known theoretical results are demonstrated by direct comparison. Finally, we provide a series of simulations to verify the theoretical assertions and the effectiveness of the AIT algorithm.
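The coherence property mentioned above is stated in terms of the mutual coherence of the measurement matrix: the largest absolute normalized inner product between two distinct columns. A minimal sketch of computing it follows; the function name is illustrative, and the paper's specific coherence condition is not reproduced.

```python
import math
from itertools import combinations


def mutual_coherence(A):
    """max over i != j of |<a_i, a_j>| / (||a_i|| * ||a_j||), over the columns of A."""
    cols = [list(c) for c in zip(*A)]  # A is given as a list of rows
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    return max(
        abs(sum(u * v for u, v in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
        for i, j in combinations(range(len(cols)), 2)
    )
```

Smaller coherence means the columns are closer to orthogonal, which is what coherence-based recovery conditions typically require; for the 2 x 3 matrix with columns (1, 0), (0, 1) and (1, 1), the coherence is $1/\sqrt{2}$.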

Preprint · 2015 · arXiv

Sparse Regularization: Convergence Of Iterative Jumping Thresholding Algorithm

In recent studies on sparse modeling, non-convex penalties have received considerable attention due to their stronger sparsity-inducing ability compared with convex penalties. However, the convergence analysis of non-convex approaches is more challenging than that of convex optimization approaches. In this paper, we study the convergence of a non-convex iterative thresholding algorithm for solving sparse recovery problems with a certain class of non-convex penalties, whose corresponding thresholding functions are discontinuous with jump discontinuities. We therefore call it the iterative jumping thresholding (IJT) algorithm. The finite support and sign convergence of the IJT algorithm is first established by exploiting this jump discontinuity. Under the assumption of the introduced restricted Kurdyka-Łojasiewicz (rKL) property, the strong convergence of the IJT algorithm can then be proved. Furthermore, we show that the IJT algorithm converges to a local minimizer at an asymptotically linear rate under some additional conditions. Moreover, we derive an a posteriori computable error estimate, which can be used to design practical termination rules for the algorithm. Notably, the $l_q$ quasi-norm ($0<q<1$) is an important subclass of the non-convex penalties studied in this paper. In particular, when applied to $l_q$ regularization, the IJT algorithm converges to a local minimizer at an asymptotically linear rate under certain concentration conditions. We also provide a set of simulations to support the theoretical assertions and compare the time efficiency of the IJT algorithm for $l_{q}$ regularization ($q=1/2, 2/3$) with other typical algorithms such as the iterative reweighted least squares (IRLS) algorithm and the iterative reweighted $l_{1}$ minimization (IRL1) algorithm.

Preprint · 2014 · arXiv

$L_{1/2}$ Regularization: Convergence of Iterative Half Thresholding Algorithm

In recent studies on sparse modeling, nonconvex regularization approaches (particularly $L_{q}$ regularization with $q\in(0,1)$) have been shown to offer substantial benefits in sparsity-inducing capability and efficiency. Compared with convex regularization approaches (say, $L_{1}$ regularization), however, the convergence of the corresponding algorithms is more difficult to establish. In this paper, we address this difficulty for a specific but typical nonconvex regularization scheme, $L_{1/2}$ regularization, which has been used successfully in many applications. More specifically, we study the convergence of the iterative half thresholding algorithm (the half algorithm for short), one of the most efficient and important algorithms for solving $L_{1/2}$ regularization. As the main result, we show that under certain conditions, the half algorithm converges to a local minimizer of $L_{1/2}$ regularization with an eventually linear convergence rate. This result provides a theoretical guarantee for a wide range of applications of the half algorithm. We also provide a set of simulations to support the theoretical assertions and compare the time efficiency of the half algorithm with other typical algorithms for $L_{1/2}$ regularization, such as the iteratively reweighted least squares (IRLS) algorithm and the iteratively reweighted $l_{1}$ minimization (IRL1) algorithm.
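The half thresholding operator has a closed form. The sketch below assumes the form commonly stated for the half algorithm: for the objective $\|y - Ax\|^2 + \lambda\|x\|_{1/2}^{1/2}$, iterate $x \leftarrow H_{\lambda\mu}(x + \mu A^T(y - Ax))$ with step $0 < \mu < \|A\|^{-2}$, where componentwise $h_\lambda(t) = \frac{2}{3}t\left(1 + \cos\left(\frac{2\pi}{3} - \frac{2}{3}\varphi_\lambda(t)\right)\right)$ with $\varphi_\lambda(t) = \arccos\left(\frac{\lambda}{8}\left(\frac{|t|}{3}\right)^{-3/2}\right)$ if $|t| > \frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, and $h_\lambda(t) = 0$ otherwise. Function names and parameter values are illustrative, and the paper's convergence conditions are not checked.

```python
import math


def half_threshold(t, lam):
    """Scalar half thresholding: argmin_x (x - t)**2 + lam * sqrt(|x|)."""
    if abs(t) <= (54 ** (1 / 3) / 4) * lam ** (2 / 3):
        return 0.0  # below the threshold the output jumps to zero
    phi = math.acos((lam / 8) * (abs(t) / 3) ** (-1.5))
    return (2 / 3) * t * (1 + math.cos(2 * math.pi / 3 - 2 * phi / 3))


def half_algorithm(A, y, lam, mu, iters=500):
    """Iterate x <- H_{lam*mu}(x + mu * A^T (y - A x)), componentwise."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        x = [half_threshold(x[j] + mu * sum(A[i][j] * r[i] for i in range(m)), lam * mu)
             for j in range(n)]
    return x
```

Above the threshold, the closed form is the larger root of the stationarity condition $x - t + \lambda/(4\sqrt{x}) = 0$ (for $t > 0$), so its output is bounded away from zero; this jump at the threshold is the kind of discontinuity exploited in the related iterative jumping thresholding analysis listed above.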

Preprint · 2014 · arXiv

A Cyclic Coordinate Descent Algorithm for lq Regularization

In recent studies on sparse modeling, $l_q$ ($0<q<1$) regularization has received considerable attention due to its advantages in inducing sparsity and reducing bias over $l_1$ regularization. In this paper, we propose a cyclic coordinate descent (CCD) algorithm for $l_q$ regularization. Our main result states that the CCD algorithm converges globally to a stationary point as long as the step size is less than a positive constant. Furthermore, we show that the CCD algorithm converges to a local minimizer under certain additional conditions. Numerical experiments demonstrate the efficiency of the CCD algorithm.

Preprint · 2014 · arXiv

Greedy metrics in orthogonal greedy learning

Orthogonal greedy learning (OGL) is a stepwise learning scheme that, in each greedy step, adds a new atom from a dictionary via steepest gradient descent and builds the estimator by orthogonally projecting the target function onto the space spanned by the selected atoms. Here, "greed" means choosing a new atom according to the steepest gradient descent principle. OGL then avoids overfitting/underfitting by selecting an appropriate number of iterations. In this paper, we point out that overfitting/underfitting can also be avoided by redefining "greed" in OGL. To this end, we introduce a new greedy metric, called $δ$-greedy thresholds, to refine "greed" and theoretically verify its feasibility. Furthermore, we show that such a greedy metric yields an adaptive termination rule while maintaining the strong learning performance of OGL. Our results show that steepest gradient descent is not the only greedy metric for OGL, and that other, more suitable metrics may ease the burden of model selection in OGL.

Preprint · 2014 · arXiv

Learning rates of $l^q$ coefficient regularization learning with Gaussian kernel

Regularization is widely recognized as a powerful strategy for improving the performance of a learning machine, and $l^q$ regularization schemes with $0<q<\infty$ are in central use. It is known that different values of $q$ lead to estimators with different properties; for example, $l^2$ regularization leads to smooth estimators, while $l^1$ regularization leads to sparse estimators. How, then, does the generalization capability of $l^q$ regularization learning vary with $q$? In this paper, we study this problem in the framework of statistical learning theory and show that implementing $l^q$ coefficient regularization schemes in the sample-dependent hypothesis space associated with the Gaussian kernel attains the same almost optimal learning rates for all $0<q<\infty$. That is, the upper and lower bounds of the learning rates for $l^q$ regularization learning are asymptotically identical for all $0<q<\infty$. Our finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on generalization capability. From this perspective, $q$ can be specified arbitrarily, or chosen solely according to other, non-generalization criteria such as smoothness, computational complexity and sparsity.

Preprint · 2013 · arXiv

Sparse Solution of Underdetermined Linear Equations via Adaptively Iterative Thresholding

Finding the sparsest solution of an underdetermined system of linear equations $y=Ax$ has attracted considerable attention in recent years. Among the many available algorithms, iterative thresholding algorithms are recognized as one of the most efficient and important classes, mainly because of their low computational complexity, especially in large-scale applications. The aim of this paper is to provide guarantees on the global convergence of a wide class of iterative thresholding algorithms. Since the thresholds of the considered algorithms are set adaptively at each iteration, we call them adaptively iterative thresholding (AIT) algorithms. As the main result, we show that as long as $A$ satisfies a certain coherence property, AIT algorithms find the correct support set within a finite number of iterations and then converge to the original sparse solution exponentially fast once the correct support set has been identified. We also show that AIT algorithms are robust to the algorithmic parameters. In addition, most existing iterative thresholding algorithms, such as the hard, soft, half and smoothly clipped absolute deviation (SCAD) algorithms, are included in the class of AIT algorithms studied in this paper.
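A simple member of this class is hard thresholding with an adaptive threshold: after each gradient step, the threshold is reset to the (k+1)-th largest magnitude, so at most k entries survive. The sketch below assumes the sparsity level k is known and uses a fixed step `mu`; it illustrates the adaptive-threshold idea and is not the paper's general AIT scheme or its parameter rules. The function name is illustrative.

```python
def ait_hard(A, y, k, mu, iters=500):
    """Iterative hard thresholding with an adaptively chosen threshold."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # gradient step on 0.5 * ||y - A x||^2
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        z = [x[j] + mu * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # adaptive threshold: the (k+1)-th largest magnitude of z at this iteration
        mags = sorted((abs(v) for v in z), reverse=True)
        tau = mags[k] if k < n else 0.0
        x = [v if abs(v) > tau else 0.0 for v in z]
    return x
```

Replacing the hard-thresholding rule in the last line of the loop with soft or half thresholding at the same adaptive threshold would give variants in the spirit of the other members of the family described above.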
