Source author record

Huan Xu

Huan Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

32works

20topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

The fast pace of artificial intelligence~(AI) innovation demands an agile methodology for observation, reproduction and optimization of distributed machine learning~(ML) workload behavior in production AI systems and enables efficient software-hardware~(SW-HW) co-design for future systems. We present Chakra, an open and portable ecosystem for performance benchmarking and co-design. The core component of Chakra is an open and interoperable graph-based representation of distributed AI/ML workloads, called Chakra execution trace~(ET). These ETs represent key operations, such as compute, memory, and communication, data and control dependencies, timing, and resource constraints. Additionally, Chakra includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra ETs by a broad range of simulators, emulators, and replay tools. We present analysis of Chakra ETs collected on production AI clusters and demonstrate value via real-world case studies. Chakra has been adopted by MLCommons and has active contributions and engagement across the industry, including but not limited to NVIDIA, AMD, Meta, Keysight, HPE, and Scala, to name a few.

preprint2022arXiv

Time Series Data Augmentation for Deep Learning: A Survey

Deep learning performs remarkably well on many time series analysis tasks recently. The superior performance of deep neural networks relies heavily on a large number of training data to avoid overfitting. However, the labeled data of many real-world time series applications may be limited such as classification in medical time series and anomaly detection in AIOps. As an effective way to enhance the size and quality of the training data, data augmentation is crucial to the successful application of deep learning models on time series data. In this paper, we systematically review different data augmentation methods for time series. We propose a taxonomy for the reviewed methods, and then provide a structured review for these methods by highlighting their strengths and limitations. We also empirically compare different data augmentation methods for different tasks including time series classification, anomaly detection, and forecasting. Finally, we discuss and highlight five future directions to provide useful research guidance.

preprint2021arXiv

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training

We revisit the concept of "adversary" in online learning, motivated by solving robust optimization and adversarial training using online learning methods. While one of the classical setups in online learning deals with the "adversarial" setup, it appears that this concept is used less rigorously, causing confusion in applying results and insights from online learning. Specifically, there are two fundamentally different types of adversaries, depending on whether the "adversary" is able to anticipate the exogenous randomness of the online learning algorithms. This is particularly relevant to robust optimization and adversarial training because the adversarial sequences are often anticipative, and many online learning algorithms do not achieve diminishing regret in such a case. We then apply this to solving robust optimization problems or (equivalently) adversarial training problems via online learning and establish a general approach for a large variety of problem classes using imaginary play. Here two players play against each other, the primal player playing the decisions and the dual player playing realizations of uncertain data. When the game terminates, the primal player has obtained an approximately robust solution. This meta-game allows for solving a large variety of robust optimization and multi-objective optimization problems and generalizes the approach of arXiv:1402.6361.

preprint2021arXiv

Bayesian Pool-based Active Learning With Abstention Feedbacks

We study pool-based active learning with abstention feedbacks, where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are achieved by simply incorporating the estimated abstention rate into the greedy criteria. We prove that both of our algorithms have near-optimality guarantees: they respectively achieve a ${(1-\frac{1}{e})}$ constant factor approximation of the optimal expected or worst-case value of a useful utility function. Our experiments show the algorithms perform well in various practical scenarios.

preprint2021arXiv

RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicity Detection

Periodicity detection is a crucial step in time series tasks, including monitoring and forecasting of metrics in many areas, such as IoT applications and self-driving database management system. In many of these applications, multiple periodic components exist and are often interlaced with each other. Such dynamic and complicated periodic patterns make the accurate periodicity detection difficult. In addition, other components in the time series, such as trend, outliers and noises, also pose additional challenges for accurate periodicity detection. In this paper, we propose a robust and general framework for multiple periodicity detection. Our algorithm applies maximal overlap discrete wavelet transform to transform the time series into multiple temporal-frequency scales such that different periodic components can be isolated. We rank them by wavelet variance, and then at each scale detect single periodicity by our proposed Huber-periodogram and Huber-ACF robustly. We rigorously prove the theoretical properties of Huber-periodogram and justify the use of Fisher's test on Huber-periodogram for periodicity detection. To further refine the detected periods, we compute unbiased autocorrelation function based on Wiener-Khinchin theorem from Huber-periodogram for improved robustness and efficiency. Experiments on synthetic and real-world datasets show that our algorithm outperforms other popular ones for both single and multiple periodicity detection.

preprint2020arXiv

Bayesian Active Learning With Abstention Feedbacks

We study pool-based active learning with abstention feedbacks where a labeler can abstain from labeling a queried example with some unknown abstention rate. This is an important problem with many useful applications. We take a Bayesian approach to the problem and develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. These are achieved by simply incorporating the estimated average abstention rate into the greedy criteria. We prove that both algorithms have near-optimality guarantees: they respectively achieve a ${(1-\frac{1}{e})}$ constant factor approximation of the optimal expected or worst-case value of a useful utility function. Our experiments show the algorithms perform well in various practical scenarios.

preprint2020arXiv

Maximizing Cumulative User Engagement in Sequential Recommendation: An Online Optimization Perspective

To maximize cumulative user engagement (e.g. cumulative clicks) in sequential recommendation, it is often needed to tradeoff two potentially conflicting objectives, that is, pursuing higher immediate user engagement (e.g., click-through rate) and encouraging user browsing (i.e., more items exposured). Existing works often study these two tasks separately, thus tend to result in sub-optimal results. In this paper, we study this problem from an online optimization perspective, and propose a flexible and practical framework to explicitly tradeoff longer user browsing length and high immediate user engagement. Specifically, by considering items as actions, user's requests as states and user leaving as an absorbing state, we formulate each user's behavior as a personalized Markov decision process (MDP), and the problem of maximizing cumulative user engagement is reduced to a stochastic shortest path (SSP) problem. Meanwhile, with immediate user engagement and quit probability estimation, it is shown that the SSP problem can be efficiently solved via dynamic programming. Experiments on real-world datasets demonstrate the effectiveness of the proposed approach. Moreover, this approach is deployed at a large E-commerce platform, achieved over 7% improvement of cumulative clicks.

preprint2020arXiv

Multi-Agent Coverage in Urban Environments

We study multi-agent coverage algorithms for autonomous monitoring and patrol in urban environments. We consider scenarios in which a team of flying agents uses downward facing cameras (or similar sensors) to observe the environment outside of buildings at street-level. Buildings are considered obstacles that impede movement, and cameras are assumed to be ineffective above a maximum altitude. We study multi-agent urban coverage problems related to this scenario, including: (1) static multi-agent urban coverage, in which agents are expected to observe the environment from static locations, and (2) dynamic multi-agent urban coverage where agents move continuously through the environment. We experimentally evaluate six different multi-agent coverage methods, including: three types of ergodic coverage (that avoid buildings in different ways), lawn-mower sweep, voronoi region based control, and a naive grid method. We evaluate all algorithms with respect to four performance metrics (percent coverage, revist count, revist time, and the integral of area viewed over time), across four types of urban environments [low density, high density] x [short buildings, tall buildings], and for team sizes ranging from 2 to 25 agents. We believe this is the first extensive comparison of these methods in an urban setting. Our results highlight how the relative performance of static and dynamic methods changes based on the ratio of team size to search area, as well the relative effects that different characteristics of urban environments (tall, short, dense, sparse, mixed) have on each algorithm.

preprint2020arXiv

Structure/property relationship of semi-crystalline polymer during tensile deformation: A molecular dynamics approach

A coarse-grained molecular dynamics model of linear polyethylene-like polymer chain system was built to investigate the responds of structure and mechanical properties during uniaxial deformation. The influence of chain length, temperature, and strain rate were studied. The molecular dynamic tests showed that yielding may governed by different mechanisms at temperatures above and below Tg. Melt-recrystallization was observed at higher temperature, and destruction of crystal structures was observed at lower temperatures beyond yield point. While the higher temperature and lower strain rate have similar effects on mechanical properties. The correlated influences of time and temperature in the microscopic structures are more complicated. The evolution of microscopic characteristics such as the orientation parameter, the bond length, and the content of trans-trans conformation were calculated from the simulation. The results showed that the temperature have double effects on polymer chains. Higher temperature on one hand makes the chains more flexible, while on the other hand shortens the relaxation time of polymers. It is the interaction of these two aspects that determine the orientation parameter. During deformation, the trans conformation has experienced a rising process after the first drop process. And these microscopic structure parameters exhibit critical transaction, which are closely related to the yield point. A hypothetical model was thus proposed to describe the micro-structure and property relations based the investigations of this study.

preprint2020arXiv

The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

We study the online saddle point problem, an online learning problem where at each iteration a pair of actions need to be chosen without knowledge of the current and future (convex-concave) payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called "SP-Regret". The problem generalizes the online convex optimization framework but here we must ensure both players incur cumulative payoffs close to that of the Nash equilibrium of the sum of the games. We propose an algorithm that achieves SP-Regret proportional to $\sqrt{\ln(T)T}$ in the general case, and $\log(T)$ SP-Regret for the strongly convex-concave case. We also consider the special case where the payoff functions are bilinear and the decision sets are the probability simplex. In this setting we are able to design algorithms that reduce the bounds on SP-Regret from a linear dependence in the dimension of the problem to a \textit{logarithmic} one. We also study the problem under bandit feedback and provide an algorithm that achieves sublinear SP-Regret. We then consider an online convex optimization with knapsacks problem motivated by a wide variety of applications such as: dynamic pricing, auctions, and crowdsourcing. We relate this problem to the online saddle point problem and establish $O(\sqrt{T})$ regret using a primal-dual algorithm.

preprint2019arXiv

Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs

We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret--that is, we ensure that the long-term payoff of both players is close to minimax optimum in hindsight. Our algorithm achieves near-optimal dependence with respect to the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. We also consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix.

preprint2017arXiv

Outlier Robust Online Learning

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine. More challenging, the data coming from the wild may contain malicious outliers. To address the scalability and robustness issues, we present an online robust learning (ORL) approach. ORL is simple to implement and has provable robustness guarantee -- in stark contrast to existing online learning approaches that are generally fragile to outliers. We specialize the ORL approach for two concrete cases: online robust principal component analysis and online linear regression. We demonstrate the efficiency and robustness advantages of ORL through comprehensive simulations and predicting image tags on a large-scale data set. We also discuss extension of the ORL to distributed learning and provide experimental evaluations.

preprint2016arXiv

Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs

This paper considers the problem of matrix completion when some number of the columns are completely and arbitrarily corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators who try to skew the predictions of the algorithm by calibrating their inputs to the system. In this paper, we develop an efficient algorithm for this problem based on a combination of a trimming procedure and a convex program that minimizes the nuclear norm and the $\ell_{1,2}$ norm. Our theoretical results show that given a vanishing fraction of observed entries, it is nevertheless possible to complete the underlying matrix even when the number of corrupted columns grows. Significantly, our results hold without any assumptions on the locations or values of the observed entries of the manipulated columns. Moreover, we show by an information-theoretic argument that our guarantees are nearly optimal in terms of the fraction of sampled entries on the authentic columns, the fraction of corrupted columns, and the rank of the underlying matrix. Our results therefore sharply characterize the tradeoffs between sample, robustness and rank in matrix completion.

preprint2016arXiv

Online Nonnegative Matrix Factorization with General Divergences

We develop a unified and systematic framework for performing online nonnegative matrix factorization under a wide variety of important divergences. The online nature of our algorithm makes it particularly amenable to large-scale data. We prove that the sequence of learned dictionaries converges almost surely to the set of critical points of the expected loss function. We do so by leveraging the theory of stochastic approximations and projected dynamical systems. This result substantially generalizes the previous results obtained only for the squared-$\ell_2$ loss. Moreover, the novel techniques involved in our analysis open new avenues for analyzing similar matrix factorization problems. The computational efficiency and the quality of the learned dictionary of our algorithm are verified empirically on both synthetic and real datasets. In particular, on the tasks of topic learning, shadow removal and image denoising, our algorithm achieves superior trade-offs between the quality of learned dictionary and running time over the batch and other online NMF algorithms.

preprint2016arXiv

Online Optimization for Large-Scale Max-Norm Regularization

Max-norm regularizer has been extensively studied in the last decade as it promotes an effective low-rank estimation for the underlying data. However, such max-norm regularized problems are typically formulated and solved in a batch manner, which prevents it from processing big data due to possible memory budget. In this paper, hence, we propose an online algorithm that is scalable to large-scale setting. Particularly, we consider the matrix decomposition problem as an example, although a simple variant of the algorithm and analysis can be adapted to other important problems such as matrix completion. The crucial technique in our implementation is to reformulating the max-norm to an equivalent matrix factorization form, where the factors consist of a (possibly overcomplete) basis component and a coefficients one. In this way, we may maintain the basis component in the memory and optimize over it and the coefficients for each sample alternatively. Since the memory footprint of the basis component is independent of the sample size, our algorithm is appealing when manipulating a large collection of samples. We prove that the sequence of the solutions (i.e., the basis component) produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Numerical study demonstrates encouraging results for the efficacy and robustness of our algorithm compared to the widely used nuclear norm solvers.

preprint2015arXiv

Central-limit approach to risk-aware Markov decision processes

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated to a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.

preprint2015arXiv

Distributed Robust Learning

We propose a framework for distributed robust statistical learning on {\em big contaminated data}. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of magnitude. We analyze the robustness property of DRL, showing that DRL not only preserves the robustness of the base robust learning method, but also tolerates contaminations on a constant fraction of results from computing nodes (node failures). More precisely, even in presence of the most adversarial outlier distribution over computing nodes, DRL still achieves a breakdown point of at least $ λ^*/2 $, where $ λ^* $ is the break down point of corresponding centralized algorithm. This is in stark contrast with naive division-and-averaging implementation, which may reduce the breakdown point by a factor of $ k $ when $ k $ computing nodes are used. We then specialize the DRL framework for two concrete cases: distributed robust principal component analysis and distributed robust regression. We demonstrate the efficiency and the robustness advantages of DRL through comprehensive simulations and predicting image tags on a large-scale image set.

preprint2015arXiv

Distributionally Robust Counterpart in Markov Decision Processes

This paper studies Markov Decision Processes under parameter uncertainty. We adapt the distributionally robust optimization framework, and assume that the uncertain parameters are random variables following an unknown distribution, and seeks the strategy which maximizes the expected performance under the most adversarial distribution. In particular, we generalize previous study \cite{xu2012distributionally} which concentrates on distribution sets with very special structure to much more generic class of distribution sets, and show that the optimal strategy can be obtained efficiently under mild technical condition. This significantly extends the applicability of distributionally robust MDP to incorporate probabilistic information of uncertainty in a more flexible way.

preprint2015arXiv

Improved Graph Clustering

Graph clustering involves the task of dividing nodes into clusters, so that the edge density is higher within clusters as opposed to across clusters. A natural, classic and popular statistical setting for evaluating solutions to this problem is the stochastic block model, also referred to as the planted partition model. In this paper we present a new algorithm--a convexified version of Maximum Likelihood--for graph clustering. We show that, in the classic stochastic block model setting, it outperforms existing methods by polynomial factors when the cluster size is allowed to have general scalings. In fact, it is within logarithmic factors of known lower bounds for spectral methods, and there is evidence suggesting that no polynomial time algorithm would do significantly better. We then show that this guarantee carries over to a more general extension of the stochastic block model. Our method can handle the settings of semi-random graphs, heterogeneous degree distributions, unequal cluster sizes, unaffiliated nodes, partially observed graphs and planted clique/coloring etc. In particular, our results provide the best exact recovery guarantees to date for the planted partition, planted k-disjoint-cliques and planted noisy coloring models with general cluster sizes; in other settings, we match the best existing results up to logarithmic factors.

preprint2015arXiv

Noisy Sparse Subspace Clustering

This paper considers the problem of subspace clustering under noise. Specifically, we study the behavior of Sparse Subspace Clustering (SSC) when either adversarial or random noise is added to the unlabelled input data points, which are assumed to be in a union of low-dimensional subspaces. We show that a modified version of SSC is \emph{provably effective} in correctly identifying the underlying subspaces, even with noisy data. This extends theoretical guarantee of this algorithm to more practical settings and provides justification to the success of SSC in a class of real applications.

preprint2015arXiv

On the Well-posedness of 2-D Incompressible Navier-Stokes Equations with Variable Viscosity in Critical Spaces

In this paper, we first prove the local well-posedness of the 2-D incompressible Navier-Stokes equations with variable viscosity in critical Besov spaces with negative regularity indices, without smallness assumption on the variation of the density. The key is to prove for $p\in(1,4)$ and $a\in\dot{B}_{p,1}^{\frac2p}(\mathbb{R}^2)$ that the solution mapping $\mathcal{H}_a:F\mapsto\nablaΠ$ to the 2-D elliptic equation $\mathrm{div}\big((1+a)\nablaΠ\big)=\mathrm{div} F$ is bounded on $\dot{B}_{p,1}^{\frac2p-1}(\mathbb{R}^2)$. More precisely, we prove that $$\|\nablaΠ\|_{\dot{B}_{p,1}^{\frac2p-1}}\leq C\big(1+\|a\|_{\dot{B}_{p,1}^{\frac2p}}\big)^2\|F\|_{\dot{B}_{p,1}^{\frac2p-1}}.$$ The proof of the uniqueness of solution to (1.2) relies on a Lagrangian approach [15]-[17]. When the viscosity coefficient $μ(ρ)$ is a positive constant, we prove that (1.2) is globally well-posed.

preprint2015arXiv

Social Trust Prediction via Max-norm Constrained 1-bit Matrix Completion

Social trust prediction addresses the significant problem of exploring interactions among users in social networks. Naturally, this problem can be formulated in the matrix completion framework, with each entry indicating the trustness or distrustness. However, there are two challenges for the social trust problem: 1) the observed data are with sign (1-bit) measurements; 2) they are typically sampled non-uniformly. Most of the previous matrix completion methods do not well handle the two issues. Motivated by the recent progress of max-norm, we propose to solve the problem with a 1-bit max-norm constrained formulation. Since max-norm is not easy to optimize, we utilize a reformulation of max-norm which facilitates an efficient projected gradient decent algorithm. We demonstrate the superiority of our formulation on two benchmark datasets.

preprint2014arXiv

Clustering Partially Observed Graphs via Convex Optimization

This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.

preprint2014arXiv

Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation

In this work, we address the following matrix recovery problem: suppose we are given a set of data points containing two parts, one part consists of samples drawn from a union of multiple subspaces and the other part consists of outliers. We do not know which data points are outliers, or how many outliers there are. The rank and number of the subspaces are unknown either. Can we detect the outliers and segment the samples into their right subspaces, efficiently and exactly? We utilize a so-called {\em Low-Rank Representation} (LRR) method to solve this problem, and prove that under mild technical conditions, any solution to LRR exactly recovers the row space of the samples and detect the outliers as well. Since the subspace membership is provably determined by the row space, this further implies that LRR can perform exact subspace segmentation and outlier detection, in an efficient way.

preprint2013arXiv

Scaling Up Robust MDPs by Reinforcement Learning

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. To the best of our knowledge, this is the first attempt to scale up the robust MDPs paradigm.

preprint2012arXiv

Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty

We consider Markov decision processes under parameter uncertainty. Previous studies all restrict to the case that uncertainties among different states are uncoupled, which leads to conservative solutions. In contrast, we introduce an intuitive concept, termed "Lightning Does not Strike Twice," to model coupled uncertain parameters. Specifically, we require that the system can deviate from its nominal parameters only a bounded number of times. We give probabilistic guarantees indicating that this model represents real life situations and devise tractable algorithms for computing optimal control policies using this concept.

preprint2012arXiv

Robust PCA in High-dimension: A Deterministic Approach

We consider principal component analysis for contaminated data-set in the high dimensional regime, where the dimensionality of each observation is comparable or even more than the number of observations. We propose a deterministic high-dimensional robust PCA algorithm which inherits all theoretical properties of its randomized counterpart, i.e., it is tractable, robust to contaminated points, easily kernelizable, asymptotic consistent and achieves maximal robustness -- a breakdown point of 50%. More importantly, the proposed method exhibits significantly better computational efficiency, which makes it suitable for large-scale real applications.

preprint2012arXiv

Stability of matrix factorization for collaborative filtering

We study the stability vis a vis adversarial noise of matrix factorization algorithm for matrix completion. In particular, our results include: (I) we bound the gap between the solution matrix of the factorization method and the ground truth in terms of root mean square error; (II) we treat the matrix factorization as a subspace fitting problem and analyze the difference between the solution subspace and the ground truth; (III) we analyze the prediction error of individual users based on the subspace stability. We apply these results to the problem of collaborative filtering under manipulator attack, which leads to useful insights and guidelines for collaborative filtering system design.

preprint2010arXiv

Principal Component Analysis with Contaminated Data: The High Dimensional Case

We consider the dimensionality-reduction problem (finding a subspace approximation of observed data) for contaminated data in the high dimensional regime, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some (arbitrarily) corrupted observations. We propose a High-dimensional Robust Principal Component Analysis (HR-PCA) algorithm that is tractable, robust to contaminated points, and easily kernelizable. The resulting subspace has a bounded deviation from the desired one, achieves maximal robustness -- a breakdown point of 50% while all existing algorithms have a breakdown point of zero, and unlike ordinary PCA algorithms, achieves optimality in the limit case where the proportion of corrupted points goes to zero.

preprint2010arXiv

Robust PCA via Outlier Pursuit

Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself. In any problem where one seeks to recover a structure rather than the exact initial matrices, techniques developed thus far relying on certificates of optimality, will fail. We present an important extension of these methods, that allows the treatment of such problems.

preprint2010arXiv

Robustness and Generalization

We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is "similar" to a training sample, then the testing error is close to the training error. This provides a novel approach, different from the complexity or stability arguments, to study generalization of learning algorithms. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property for learning algorithms to work.

preprint2008arXiv

Robustness and Regularization of Support Vector Machines

We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization, provides a robust optimization interpretation for the success of regularized SVMs. We use the this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.

Huan Xu

What is connected

Connect this record

See the researcher in context

Building this map preview

32 published item(s)

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Time Series Data Augmentation for Deep Learning: A Survey

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training

Bayesian Pool-based Active Learning With Abstention Feedbacks

RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicity Detection

Bayesian Active Learning With Abstention Feedbacks

Maximizing Cumulative User Engagement in Sequential Recommendation: An Online Optimization Perspective

Multi-Agent Coverage in Urban Environments

Structure/property relationship of semi-crystalline polymer during tensile deformation: A molecular dynamics approach

The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs

Outlier Robust Online Learning

Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs

Online Nonnegative Matrix Factorization with General Divergences

Online Optimization for Large-Scale Max-Norm Regularization

Central-limit approach to risk-aware Markov decision processes

Distributed Robust Learning

Distributionally Robust Counterpart in Markov Decision Processes

Improved Graph Clustering

Noisy Sparse Subspace Clustering

On the Well-posedness of 2-D Incompressible Navier-Stokes Equations with Variable Viscosity in Critical Spaces

Social Trust Prediction via Max-norm Constrained 1-bit Matrix Completion

Clustering Partially Observed Graphs via Convex Optimization

Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation

Scaling Up Robust MDPs by Reinforcement Learning

Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty

Robust PCA in High-dimension: A Deterministic Approach

Stability of matrix factorization for collaborative filtering

Principal Component Analysis with Contaminated Data: The High Dimensional Case

Robust PCA via Outlier Pursuit

Robustness and Generalization

Robustness and Regularization of Support Vector Machines