Source author record

Zhihua Zhang

Zhihua Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

55 works

22 topics

4 close collaborators

preprint, 2026, arXiv

CAST-LUT: Tokenizer-Guided HSV Look-Up Tables for Purple Flare Removal

Purple flare, a diffuse chromatic aberration artifact commonly found around highlight areas, severely degrades the tone transition and color of the image. Existing traditional methods are based on hand-crafted features, which lack flexibility and rely entirely on fixed priors, while the scarcity of paired training data critically hampers deep learning. To address this issue, we propose a novel network built upon decoupled HSV Look-Up Tables (LUTs). The method aims to simplify color correction by adjusting the Hue (H), Saturation (S), and Value (V) components independently. This approach resolves the inherent color coupling problems in traditional methods. Our model adopts a two-stage architecture: first, a Chroma-Aware Spectral Tokenizer (CAST) converts the input image from RGB space to HSV space and independently encodes the Hue (H) and Value (V) channels into a set of semantic tokens describing the purple flare status; second, the HSV-LUT module takes these tokens as input and dynamically generates independent correction curves (1D-LUTs) for the three channels H, S, and V. To effectively train and validate our model, we built the first large-scale purple flare dataset with diverse scenes. We also propose new metrics and a loss function specifically designed for this task. Extensive experiments demonstrate that our model not only significantly outperforms existing methods in visual effects but also achieves state-of-the-art performance on all quantitative metrics.
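The decoupled per-channel correction described in this abstract can be illustrated with a minimal sketch. It is not the authors' network: the LUTs here are fixed, hand-written lists rather than curves generated from tokens, and the names `apply_lut`, `correct_pixel`, `identity` and `desaturate` are hypothetical. It only shows how three independent 1D-LUTs act on H, S and V after an RGB-to-HSV conversion.

```python
import colorsys

def apply_lut(value, lut):
    """Piecewise-linear lookup of a value in [0, 1] using evenly spaced knots."""
    n = len(lut) - 1
    pos = min(max(value, 0.0), 1.0) * n
    i = min(int(pos), n - 1)
    frac = pos - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac

def clamp01(x):
    return min(max(x, 0.0), 1.0)

def correct_pixel(rgb, lut_h, lut_s, lut_v):
    """Apply one independent 1D-LUT per HSV channel to an RGB pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h2 = apply_lut(h, lut_h) % 1.0      # hue is cyclic
    s2 = clamp01(apply_lut(s, lut_s))
    v2 = clamp01(apply_lut(v, lut_v))
    return colorsys.hsv_to_rgb(h2, s2, v2)

# Hypothetical curves: identity keeps a channel, desaturate halves saturation.
identity = [i / 4 for i in range(5)]
desaturate = [0.5 * x for x in identity]

purple = (0.6, 0.2, 0.8)
unchanged = correct_pixel(purple, identity, identity, identity)
muted = correct_pixel(purple, identity, desaturate, identity)
```

Because each channel has its own curve, the saturation of the purple pixel can be reduced without touching its hue or value, which is the decoupling the abstract refers to.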

preprint, 2023, arXiv

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction

In this paper, we study the lower complexity bounds for finite-sum optimization problems, where the objective is the average of $n$ individual component functions. We consider Proximal Incremental First-order (PIFO) algorithms which have access to the gradient and proximal oracles for each component function. To incorporate loopless methods, we also allow PIFO algorithms to obtain the full gradient infrequently. We develop a novel approach to constructing the hard instances, which partitions the tridiagonal matrix of classical examples into $n$ groups. This construction is friendly to the analysis of PIFO algorithms. Based on this construction, we establish the lower complexity bounds for finite-sum minimax optimization problems when the objective is convex-concave or nonconvex-strongly-concave and the class of component functions is $L$-average smooth. Most of these bounds are nearly matched by existing upper bounds up to log factors. We can also derive similar lower bounds for finite-sum minimization problems as previous work under both smoothness and average smoothness assumptions. Our lower bounds imply that proximal oracles for smooth functions are not much more powerful than gradient oracles.

preprint, 2022, arXiv

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods

Optimization is important in machine learning problems, and quasi-Newton methods have a reputation as the most efficient numerical schemes for smooth unconstrained optimization. In this paper, we consider the explicit superlinear convergence rates of quasi-Newton methods and address two open problems mentioned by Rodomanov and Nesterov. First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include the common DFP, BFGS, and SR1 methods. Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on two specific quasi-Newton methods: the SR1 and BFGS methods. We provide improved versions of greedy and random methods with provably better explicit (local) superlinear convergence rates. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objective, as well as the general strongly convex, smooth, and strongly self-concordant functions.

preprint, 2022, arXiv

Explicit Superlinear Convergence Rates of Broyden's Methods in Nonlinear Equations

In this paper, we study the explicit superlinear convergence rates of quasi-Newton methods. We particularly focus on the classical Broyden's method for solving nonlinear equations. We establish its explicit (local) superlinear convergence rate when the initial point is close enough to a solution and the initial Jacobian approximation is also close enough to the exact Jacobian related to the solution. Our results present the explicit superlinear convergence rates of Broyden's "good" and "bad" update schemes. These explicit convergence rates in turn provide some important insights on the performance difference between the "good" and "bad" schemes, which are also validated empirically.
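For readers unfamiliar with the method this abstract analyzes, the following is a minimal sketch of Broyden's "good" update, written in inverse form via the Sherman-Morrison formula. The test system `F`, the starting point and the function name `broyden_good` are illustrative assumptions, not taken from the paper. The initial matrix `H0` is the exact inverse Jacobian at the starting point, in the spirit of the local setting the abstract describes.

```python
import math

def broyden_good(F, x0, H0, tol=1e-10, max_iter=50):
    """Broyden's "good" method; H approximates the inverse Jacobian."""
    n = len(x0)
    x = list(x0)
    H = [row[:] for row in H0]
    fx = F(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        # Quasi-Newton step: s = -H F(x)
        s = [-sum(H[i][j] * fx[j] for j in range(n)) for i in range(n)]
        x = [x[i] + s[i] for i in range(n)]
        f_new = F(x)
        y = [f_new[i] - fx[i] for i in range(n)]
        fx = f_new
        # Rank-one update: H += (s - H y) (s^T H) / (s^T H y)
        Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        sTH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]
        denom = sum(s[i] * Hy[i] for i in range(n))
        if denom == 0.0:
            break  # no usable secant information
        H = [[H[i][j] + (s[i] - Hy[i]) * sTH[j] / denom for j in range(n)]
             for i in range(n)]
    return x

def F(x):
    # Hypothetical test system with a solution at (sqrt(2), sqrt(2)).
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

x0 = [2.0, 1.0]
# Exact inverse of the Jacobian [[4, 2], [1, -1]] at x0.
H0 = [[1 / 6, 1 / 3], [1 / 6, -2 / 3]]
root = broyden_good(F, x0, H0)
```

The "bad" scheme differs only in the rank-one formula used to update `H`; it is omitted here to keep the sketch short.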

preprint, 2022, arXiv

Federated Reinforcement Learning with Environment Heterogeneity

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.

preprint, 2022, arXiv

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks which have a layer with one neuron under quadratic loss. We describe the convergent point of trajectories with arbitrary starting point under gradient flow, including the paths which converge to one of the saddle points or the original point. We also show specific convergence rates of trajectories that converge to the global minimizer by stages. To achieve these results, this paper mainly extends the machinery in Eftekhari's work to provably identify the rank-stable set and the global minimizer convergent set. We also give specific examples to show the necessity of our definitions. Crucially, as far as we know, our results appear to be the first to give a non-local global analysis of linear neural networks from arbitrary initialized points, rather than the lazy training regime which has dominated the literature of neural networks, and restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks without one hidden neuron assumption remains a challenging open problem.

preprint, 2022, arXiv

Near Optimal Stochastic Algorithms for Finite-Sum Unbalanced Convex-Concave Minimax Optimization

This paper considers stochastic first-order algorithms for convex-concave minimax problems of the form $\min_{\bf x}\max_{\bf y}f(\bf x, \bf y)$, where $f$ can be expressed as the average of $n$ individual components which are $L$-average smooth. For the $μ_x$-strongly-convex-$μ_y$-strongly-concave setting, we propose a new method which can find an $\varepsilon$-saddle point of the problem in $\tilde{\mathcal O} \big(\sqrt{n(\sqrt{n}+κ_x)(\sqrt{n}+κ_y)}\log(1/\varepsilon)\big)$ stochastic first-order complexity, where $κ_x\triangleq L/μ_x$ and $κ_y\triangleq L/μ_y$. This upper bound is near optimal with respect to $\varepsilon$, $n$, $κ_x$ and $κ_y$ simultaneously. In addition, the algorithm is easily implemented and works well in practice. Our methods can be extended to solve more general unbalanced convex-concave minimax problems and the corresponding upper complexity bounds are also near optimal.

preprint, 2022, arXiv

On the Landscape of One-hidden-layer Sparse Networks and Beyond

Sparse neural networks have received increasing interest due to their small size compared to dense networks. Nevertheless, most existing works on neural network theory have focused on dense neural networks, and the understanding of sparse networks is very limited. In this paper, we study the loss landscape of one-hidden-layer sparse networks. First, we consider sparse networks with a dense final layer. We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer. Second, we discover that spurious valleys and spurious minima can exist for wide sparse networks with a sparse final layer. This is different from wide dense networks which do not have spurious valleys under mild assumptions.

preprint, 2022, arXiv

Sparse Adversarial Attack in Multi-agent Reinforcement Learning

Cooperative multi-agent reinforcement learning (cMARL) has many real applications, but the policy trained by existing cMARL algorithms is not robust enough when deployed. There are also many methods for adversarial attacks on RL systems, which implies that RL systems can suffer from adversarial attacks, but most of them focus on single-agent RL. In this paper, we propose a \textit{sparse adversarial attack} on cMARL systems. We use (MA)RL with regularization to train the attack policy. Our experiments show that the policy trained by the current cMARL algorithm can obtain poor performance when only one or a few agents in the team (e.g., 1 of 8 or 5 of 25) are attacked at a few timesteps (e.g., 3 of 40 total timesteps).

preprint, 2022, arXiv

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

In a Markov decision process (MDP), unobservable confounders may exist and have impacts on the data generating process, so that the classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in the confounded MDPs with a linear structure. For non-asymptotic analysis, we prove an $\mathcal{O}(n^{-1/2})$-error bound where $n$ is the number of samples. For asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with a typical rate of $n^{1/2}$. To the best of our knowledge, we are the first to show such statistical results of the two-stage estimator for confounded linear MDPs via instrumental variables.

preprint, 2022, arXiv

Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work focusing on non-asymptotic performances of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $χ^2$ balls. Our results show that when we assume the uncertainty sets are $(s,a)$-rectangular, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2ρ^2(1-γ)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than the case under the $(s,a)$-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotically normal with a typical rate $\sqrt{n}$ under $(s,a)$ and $s$-rectangular assumptions from both theoretical and empirical perspectives.

preprint, 2021, arXiv

Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications

In this work, we study a novel class of projection-based algorithms for linearly constrained problems (LCPs) which have a lot of applications in statistics, optimization, and machine learning. Conventional primal gradient-based methods for LCPs call a projection after each (stochastic) gradient descent step, so that the required number of projections equals the number of gradient descent steps (or total iterations). Motivated by the recent progress in distributed optimization, we propose the delayed projection technique that calls a projection only once in a while, lowering the projection frequency and improving the projection efficiency. Accordingly, we devise a series of stochastic methods for LCPs using the technique, including a variance reduced method and an accelerated one. We theoretically show that it is feasible to improve projection efficiency in both strongly convex and generally convex cases. Our analysis is simple and unified and can be easily extended to other methods using delayed projections. When applying our new algorithms to federated optimization, a newfangled and privacy-preserving subfield in distributed optimization, we obtain not only a variance reduced federated algorithm with convergence rates better than previous works, but also the first accelerated method able to handle data heterogeneity inherent in federated optimization.
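The idea of projecting only once every few gradient steps can be sketched on a toy problem. This is an illustration under stated assumptions, not one of the paper's algorithms: the objective is the hypothetical quadratic 0.5 * ||x - c||^2, the linear constraint is sum(x) = total, gradients are exact rather than stochastic, and the function names are invented for this example.

```python
def project_onto_sum_constraint(x, total):
    """Euclidean projection onto the affine set {x : sum(x) = total}."""
    shift = (sum(x) - total) / len(x)
    return [xi - shift for xi in x]

def delayed_projection_gd(c, total, lr, steps, project_every):
    """Gradient descent on 0.5 * ||x - c||^2, projecting every few steps."""
    x = project_onto_sum_constraint([0.0] * len(c), total)
    projections = 0
    for t in range(steps):
        # Plain gradient step; the iterate may leave the feasible set here.
        x = [xi - lr * (xi - ci) for xi, ci in zip(x, c)]
        if (t + 1) % project_every == 0:
            x = project_onto_sum_constraint(x, total)
            projections += 1
    return x, projections

c = [0.2, 0.9, 0.5]
x_delayed, n_proj = delayed_projection_gd(c, total=1.0, lr=0.1,
                                          steps=500, project_every=5)
x_exact = project_onto_sum_constraint(c, 1.0)  # constrained minimizer
```

In this toy setting 500 gradient steps need only 100 projections, and the final iterate still matches the constrained minimizer `x_exact`.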

preprint, 2020, arXiv

Approximate Newton Methods

Many machine learning models involve solving optimization problems. Thus, it is important to deal with large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost in each iteration while commanding a high convergence rate. Other efficient stochastic second order methods have also been proposed. However, the convergence properties of these methods are still not well understood. There are also several important gaps between the current convergence theory and the performance in real applications. In this paper, we aim to fill these gaps. We propose a unifying framework to analyze both local and global convergence properties of second order methods. Based on this framework, we present our theoretical results which match the performance in real applications well.

preprint, 2020, arXiv

Intervention Generative Adversarial Networks

In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem. The main idea is to introduce a regularization term that we call intervention loss into the objective. We refer to the resulting generative model as Intervention Generative Adversarial Networks (IVGAN). By perturbing the latent representations of real images obtained from an auxiliary encoder network with Gaussian invariant interventions and penalizing the dissimilarity of the distributions of the resulting generated images, the intervention loss provides more informative gradient for the generator, significantly improving GAN's training stability. We demonstrate the effectiveness and efficiency of our methods via solid theoretical analysis and thorough evaluation on standard real-world datasets as well as the stacked MNIST dataset.

preprint, 2020, arXiv

On the Convergence of FedAvg on Non-IID Data

Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $η$ must decay, even if full-gradient is used; otherwise, the solution will be $Ω(η)$ away from the optimal.
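The learning-rate condition stated at the end of this abstract can be seen on a small example. The sketch below is a simplification under stated assumptions, not the paper's experimental setup: each device holds a hypothetical scalar quadratic 0.5 * a * (w - c)^2, every device participates in every round, and local steps use the full gradient. The names `fedavg`, `curvatures` and `centers` are invented for illustration.

```python
def fedavg(curvatures, centers, rounds, local_steps, lr_schedule, w0=0.0):
    """FedAvg on scalar quadratics f_k(w) = 0.5 * a_k * (w - c_k)^2."""
    w = w0
    for t in range(rounds):
        lr = lr_schedule(t)
        local_models = []
        for a, c in zip(curvatures, centers):
            w_k = w
            for _ in range(local_steps):
                w_k -= lr * a * (w_k - c)  # full-gradient local step
            local_models.append(w_k)
        # Server step: average the local models.
        w = sum(local_models) / len(local_models)
    return w

# Non-iid devices: different curvatures and different local minimizers.
curvatures = [1.0, 3.0]
centers = [0.0, 1.0]
# Minimizer of the average objective.
w_star = (sum(a * c for a, c in zip(curvatures, centers))
          / sum(curvatures))

w_fixed = fedavg(curvatures, centers, rounds=2000, local_steps=5,
                 lr_schedule=lambda t: 0.1)
w_decay = fedavg(curvatures, centers, rounds=2000, local_steps=5,
                 lr_schedule=lambda t: 0.1 / (1 + t))
```

With a fixed learning rate and several local steps, the averaged model settles at a point that is visibly offset from `w_star`, while the decaying schedule ends much closer to it.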

preprint, 2020, arXiv

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers for representing weight parameters and activations, and are often used in real-world applications due to their saving of computation resources and reproducibility of results. Batch Normalization (BN) poses a challenge for QNNs because it requires floating-point numbers in reciprocal operations, and previous QNNs either require computing BN at high precision or revise BN to some variants in heuristic ways. In this work, we propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with a shared quantized scale, which is friendly for hardware acceleration and model deployment. We confirm that our method maintains the same outputs through rigorous theoretical analysis and numerical analysis. Accuracy and efficiency of our quantization method are verified by experiments at layer level on CIFAR and ImageNet datasets. We also believe that our method is potentially useful in other problems involving quantization.

preprint, 2016, arXiv

A Proximal Stochastic Quasi-Newton Algorithm

In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and the non-smooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. It incorporates the Hessian in the smooth part of the function and exploits a multistage scheme to reduce the variance of the stochastic gradient. We prove that our method can achieve a linear rate of convergence.
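As a concrete example of a "simple proximal mapping", the sketch below uses the l1 norm, whose proximal operator is soft thresholding. The choice of the l1 norm is an assumption made for illustration; the abstract does not name a specific non-smooth term. The step shown is a plain first-order proximal gradient step, not the stochastic second-order method the paper proposes, and the function names are hypothetical.

```python
def soft_threshold(x, tau):
    """Proximal mapping of tau * |x| for a scalar x."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def prox_l1(v, tau):
    """Proximal mapping of tau * ||v||_1, applied coordinate-wise."""
    return [soft_threshold(x, tau) for x in v]

def prox_gradient_step(x, grad, step, lam):
    """One proximal gradient step for smooth(x) + lam * ||x||_1."""
    forward = [xi - step * gi for xi, gi in zip(x, grad)]
    return prox_l1(forward, step * lam)

# Smooth part 0.5 * ||x - b||^2 with b = [3.0, -0.5]; its gradient at x is x - b.
b = [3.0, -0.5]
x = [0.0, 0.0]
grad = [xi - bi for xi, bi in zip(x, b)]
x_next = prox_gradient_step(x, grad, step=1.0, lam=1.0)
```

A quasi-Newton variant would replace the scalar `step` with a Hessian approximation, which generally makes the proximal subproblem harder than a coordinate-wise threshold.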

preprint, 2016, arXiv

A Scalable and Extensible Framework for Superposition-Structured Models

In many learning tasks, structural models usually lead to better interpretability and higher generalization performance. In recent years, however, simple structural models such as lasso have frequently proved insufficient. Accordingly, there has been a lot of work on "superposition-structured" models where multiple structural constraints are imposed. To efficiently solve these "superposition-structured" statistical models, we develop a framework based on a proximal Newton-type method. Employing the smoothed conic dual approach with the LBFGS updating formula, we propose a scalable and extensible proximal quasi-Newton (SEP-QN) framework. Empirical analysis on various datasets shows that our framework is potentially powerful, and achieves a super-linear convergence rate for optimizing some popular "superposition-structured" statistical models such as the fused sparse group lasso.

preprint, 2016, arXiv

An Efficient Character-Level Neural Machine Translation

Neural machine translation aims at building a single large neural network that can be trained to maximize translation performance. The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to the existing state-of-the-art phrase-based systems on the task of English-to-French translation. However, the use of a large vocabulary becomes the bottleneck in both training and improving the performance. In this paper, we propose an efficient architecture to train a deep character-level neural machine translation model by introducing a decimator and an interpolator. The decimator is used to sample the source sequence before encoding while the interpolator is used to resample after decoding. Such a deep model has two major advantages. It avoids the large vocabulary issue radically; at the same time, it is much faster and more memory-efficient in training than conventional character-based models. More interestingly, our model is able to translate misspelled words as human beings do.

preprint, 2016, arXiv

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.

preprint, 2016, arXiv

Revisiting Sub-sampled Newton Methods

Many machine learning models depend on solving a large scale optimization problem. Recently, sub-sampled Newton methods have attracted much attention for optimization due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost at each iteration while commanding a high convergence rate. In this work we propose two new efficient Newton-type methods, Refined Sub-sampled Newton and Refined Sketch Newton. Our methods exhibit a great advantage over existing sub-sampled Newton methods, especially when Hessian-vector multiplication can be calculated efficiently. Specifically, the proposed methods are shown to converge superlinearly in the general case and quadratically under a slightly stronger assumption. The proposed methods can be generalized to a unifying framework for the convergence proof of several existing sub-sampled Newton methods, revealing new convergence properties. Finally, we empirically evaluate the performance of our methods on several standard datasets and the results show consistent improvement in computational efficiency.

preprint, 2016, arXiv

SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions

Symmetric positive semidefinite (SPSD) matrix approximation is an important problem with applications in kernel methods. However, existing SPSD matrix approximation methods such as the Nyström method only have weak error bounds. In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds. We call it the prototype model for it has more efficient and effective extensions, and some of its extensions have high scalability. Though the prototype model itself is not suitable for large-scale data, it is still useful to study its properties, on which the analysis of its extensions relies. This paper offers novel theoretical analysis, efficient algorithms, and a highly accurate extension. First, we establish a lower error bound for the prototype model and improve the error bound of an existing column selection algorithm to match the lower bound. In this way, we obtain the first optimal column selection algorithm for the prototype model. We also prove that the prototype model is exact under certain conditions. Second, we develop a simple column selection algorithm with a provable error bound. Third, we propose a so-called spectral shifting model to make the approximation more accurate when the eigenvalues of the matrix decay slowly, and the improvement is theoretically quantified. The spectral shifting method can also be applied to improve other SPSD matrix approximation models.

preprint, 2016, arXiv

Tighter bound of Sketched Generalized Matrix Approximation

Generalized matrix approximation plays a fundamental role in many machine learning problems, such as CUR decomposition, kernel approximation, and matrix low rank approximation. Especially as today's applications involve larger and larger datasets, more efficient generalized matrix approximation algorithms have become a crucially important research issue. In this paper, we find new sketching techniques to reduce the size of the original data matrix to develop new matrix approximation algorithms. Our results derive a much tighter bound for the approximation than previous works: we obtain a $(1+ε)$ approximation ratio with small sketched dimensions, which implies a more efficient generalized matrix approximation.

preprint, 2016, arXiv

Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition

Symmetric positive semi-definite (SPSD) matrix approximation methods have been extensively used to speed up large-scale eigenvalue computation and kernel learning methods. The standard sketch based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nyström method is highly efficient, but can only achieve low accuracy. In this paper we propose a novel model that we call the {\it fast SPSD matrix approximation model}. The fast model is nearly as efficient as the Nyström method and as accurate as the prototype model. We show that the fast model can potentially solve eigenvalue problems and kernel learning problems in linear time with respect to the matrix size $n$ to achieve $1+ε$ relative-error, whereas both the prototype model and the Nyström method cost at least quadratic time to attain a comparable error bound. Empirical comparisons among the prototype model, the Nyström method, and our fast model demonstrate the superiority of the fast model. We also contribute new understandings of the Nyström method. The Nyström method is a special instance of our fast model and is an approximation to the prototype model. Our technique can be straightforwardly applied to make the CUR matrix decomposition more efficient to compute without much affecting the accuracy.

55 published items

CAST-LUT: Tokenizer-Guided HSV Look-Up Tables for Purple Flare Removal

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods

Explicit Superlinear Convergence Rates of Broyden's Methods in Nonlinear Equations

Federated Reinforcement Learning with Environment Heterogeneity

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

Near Optimal Stochastic Algorithms for Finite-Sum Unbalanced Convex-Concave Minimax Optimization

On the Landscape of One-hidden-layer Sparse Networks and Beyond

Sparse Adversarial Attack in Multi-agent Reinforcement Learning

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications

Approximate Newton Methods

Intervention Generative Adversarial Networks

On the Convergence of FedAvg on Non-IID Data

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

A Proximal Stochastic Quasi-Newton Algorithm

A Scalable and Extensible Framework for Superposition-Structured Models

An Efficient Character-Level Neural Machine Translation

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

Revisiting Sub-sampled Newton Methods

SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions

Tighter bound of Sketched Generalized Matrix Approximation

Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition

A New Relaxation Approach to Normalized Hypergraph Cut

A Nonconvex Approach for Structured Sparse Learning

A Parallel algorithm for $\mathcal{X}$-Armed bandits

Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion

Characterisation of matrix entropies

Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization

Fast Spectral Low Rank Matrix Approximation

Improved Analyses of the Randomized Power Method and Block Lanczos Method

Nonconvex Penalization in Sparse Estimation: An Approach Based on the Bernstein Function

On the Global Convergence of Majorization Minimization Algorithms for Nonconvex Optimization Problems

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

S-PowerGraph: Streaming Graph Partitioning for Natural Graphs by Vertex-Cut

The obstacle problem for nonlinear degenerate equations with $L^{1}$-data

The Singular Value Decomposition, Applications and Beyond

Wishart Mechanism for Differentially Private Principal Components Analysis

Efficient Algorithms and Error Analysis for the Modified Nystrom Method

Enhancement of Visible-Light-Induced Photocurrent and Photocatalytic Activity of V and N Codoped TiO2 Nanotube Array Films

Group Orbit Optimization: A Unified Approach to Data Normalization

Kinetic Energy Plus Penalty Functions for Sparse Estimation

Self-organized vanadium and nitrogen co-doped titania nanotube arrays with enhanced photocatalytic reduction of CO2 into CH4

Some operator convex functions of several variables

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

The Bernstein Function: A Unifying Framework of Nonconvex Penalization in Sparse Estimation

The Matrix Ridge Approximation: Algorithms and Applications

A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

Bayesian Multicategory Support Vector Machines

Coherence Functions with Applications in Large-Margin Classification Methods

EP-GIG Priors and Applications in Bayesian Sparse Learning

Multiway Spectral Clustering: A Margin-Based Perspective

On Edwards-Child's inequality

On the circumradius of a special class of n-simplices