Source author record

Yunlong Feng

Yunlong Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.OC math.ST Methodology Statistics Theory

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

A Framework of Learning Through Empirical Gain Maximization

We develop in this paper a framework of empirical gain maximization (EGM) to address the robust regression problem where heavy-tailed noise or outliers may present in the response variable. The idea of EGM is to approximate the density function of the noise distribution instead of approximating the truth function directly as usual. Unlike the classical maximum likelihood estimation that encourages equal importance of all observations and could be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow the ignorance of those observations. Furthermore, it is shown that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least square regression, can be reformulated into this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully-understood regression approaches. Resulting from the new framework, a novel interpretation of existing bounded nonconvex loss functions can be concluded. Within this new framework, the two seemingly irrelevant terminologies, the well-known Tukey's biweight loss for robust regression and the triweight kernel for nonparametric smoothing, are closely related. More precisely, it is shown that the Tukey's biweight loss can be derived from the triweight kernel. Similarly, other frequently employed bounded nonconvex loss functions in machine learning such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss can also be reformulated from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.

preprint2020arXiv

A Statistical Learning Approach to Modal Regression

This paper studies the nonparametric modal regression problem systematically from a statistical learning view. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that nonparametric modal regression problem can be approached via the classical empirical risk minimization. Some efforts are then made to develop a framework for analyzing and implementing modal regression. For instance, the modal regression function is described, the modal regression risk is defined explicitly and its \textit{Bayes} rule is characterized; for the sake of computational tractability, the surrogate modal regression risk, which is termed as the generalization risk in our study, is introduced. On the theoretical side, the excess modal regression risk, the excess generalization risk, the function estimation error, and the relations among the above three quantities are studied rigorously. It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols, such as mean regression, median regression, and quantile regression. However, it outperforms these regression models in terms of robustness as shown in our study from a re-descending M-estimation view. This coincides with and in return explains the merits of MCCR on robustness. On the practical side, the implementation issues of modal regression including the computational algorithm and the tuning parameters selection are discussed. Numerical assessments on modal regression are also conducted to verify our findings empirically.

preprint2020arXiv

Half-Quadratic Alternating Direction Method of Multipliers for Robust Orthogonal Tensor Approximation

Higher-order tensor canonical polyadic decomposition (CPD) with one or more of the latent factor matrices being columnwisely orthonormal has been well studied in recent years. However, most existing models penalize the noises, if occurring, by employing the least squares loss, which may be sensitive to non-Gaussian noise or outliers, leading to bias estimates of the latent factors. In this paper, based on the maximum a posterior estimation, we derive a robust orthogonal tensor CPD model with Cauchy loss, which is resistant to heavy-tailed noise or outliers. By exploring the half-quadratic property of the model, a new method, which is termed as half-quadratic alternating direction method of multipliers (HQ-ADMM), is proposed to solve the model. Each subproblem involved in HQ-ADMM admits a closed-form solution. Thanks to some nice properties of the Cauchy loss, we show that the whole sequence generated by the algorithm globally converges to a stationary point of the problem under consideration. Numerical experiments on synthetic and real data demonstrate the efficiency and robustness of the proposed model and algorithm.

preprint2020arXiv

New Insights into Learning with Correntropy Based Regression

Stemming from information-theoretic learning, the correntropy criterion and its applications to machine learning tasks have been extensively explored and studied. Its application to regression problems leads to the robustness enhanced regression paradigm -- namely, correntropy based regression. Having drawn a great variety of successful real-world applications, its theoretical properties have also been investigated recently in a series of studies from a statistical learning viewpoint. The resulting big picture is that correntropy based regression regresses towards the conditional mode function or the conditional mean function robustly under certain conditions. Continuing this trend and going further, in the present study, we report some new insights into this problem. First, we show that under the additive noise regression model, such a regression paradigm can be deduced from minimum distance estimation, implying that the resulting estimator is essentially a minimum distance estimator and thus possesses robustness properties. Second, we show that the regression paradigm, in fact, provides a unified approach to regression problems in that it approaches the conditional mean, the conditional mode, as well as the conditional median functions under certain conditions. Third, we present some new results when it is utilized to learn the conditional mean function by developing its error bounds and exponential convergence rates under conditional $(1+ε)$-moment assumptions. The saturation effect on the established convergence rates, which was observed under $(1+ε)$-moment assumptions, still occurs, indicating the inherent bias of the regression estimator. These novel insights deepen our understanding of correntropy based regression, help cement the theoretic correntropy framework, and also enable us to investigate learning schemes induced by general bounded nonconvex loss functions.

preprint2016arXiv

Kernel Density Estimation for Dynamical Systems

We study the density estimation problem with observations generated by certain dynamical systems that admit a unique underlying invariant Lebesgue density. Observations drawn from dynamical systems are not independent and moreover, usual mixing concepts may not be appropriate for measuring the dependence among these observations. By employing the $\mathcal{C}$-mixing concept to measure the dependence, we conduct statistical analysis on the consistency and convergence of the kernel density estimator. Our main results are as follows: First, we show that with properly chosen bandwidth, the kernel density estimator is universally consistent under $L_1$-norm; Second, we establish convergence rates for the estimator with respect to several classes of dynamical systems under $L_1$-norm. In the analysis, the density function $f$ is only assumed to be Hölder continuous which is a weak assumption in the literature of nonparametric density estimation and also more realistic in the dynamical system context. Last but not least, we prove that the same convergence rates of the estimator under $L_\infty$-norm and $L_1$-norm can be achieved when the density function is Hölder continuous, compactly supported and bounded. The bandwidth selection problem of the kernel density estimator for dynamical system is also discussed in our study via numerical simulations.

preprint2016arXiv

Learning theory estimates with observations from general stationary stochastic processes

This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by \emph{general}, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for i.i.d.~processes, our learning rates for ERM recover the optimal rates. On the other hand, for non-i.i.d.~processes including geometrically $α$-mixing Markov processes, geometrically $α$-mixing processes with restricted decay, $ϕ$-mixing processes, and (time-reversed) geometrically $\mathcal{C}$-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called "effective number of observations" for various mixing processes.

Yunlong Feng

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

A Framework of Learning Through Empirical Gain Maximization

A Statistical Learning Approach to Modal Regression

Half-Quadratic Alternating Direction Method of Multipliers for Robust Orthogonal Tensor Approximation

New Insights into Learning with Correntropy Based Regression

Kernel Density Estimation for Dynamical Systems

Learning theory estimates with observations from general stationary stochastic processes