Research connected to "machine learning"

Search results

Showing works 97-128 from 49,008 works in Machine Learning. Use pages to browse more, or open the graph for the map.

49,008 works and 109,744 authors are indexed for this topic. This page shows 32 works at a time so search stays fast.

Papers

preprint, 2013, arXiv

ParceLiNGAM: A causal ordering method robust against latent confounders

We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. The key idea is to detect latent confounders by testing independence between estimated external influences and find subsets (parcels) that include variables that are not affected by latent confounders. We demonstrate the effectiveness of our method using artificial data and simulated brain imaging data.

preprint, 2020, arXiv

Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons

In reinforcement learning, the discount factor $γ$ controls the agent's effective planning horizon. Traditionally, this parameter was considered part of the MDP; however, as deep reinforcement learning algorithms tend to become unstable when the effective planning horizon is long, recent works refer to $γ$ as a hyper-parameter -- thus changing the underlying MDP and potentially leading the agent towards sub-optimal behavior on the original task. In this work, we introduce reward tweaking. Reward tweaking learns a surrogate reward function $\tilde r$ for the discounted setting that induces optimal behavior on the original finite-horizon total reward task. Theoretically, we show that there exists a surrogate reward that leads to optimality in the original task and discuss the robustness of our approach. Additionally, we perform experiments in high-dimensional continuous control tasks and show that reward tweaking guides the agent towards better long-horizon returns although it plans for short horizons.
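
As a rough illustration of how the discount factor shapes the planning horizon described in this abstract, the sketch below contrasts the discounted return with the undiscounted finite-horizon total reward. The function names and the 1/(1 - γ) rule of thumb for the effective horizon are assumptions made for illustration; they are not taken from the paper, and the surrogate reward itself is not implemented here.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def total_reward(rewards):
    """Undiscounted finite-horizon total reward."""
    return float(sum(rewards))


def effective_horizon(gamma):
    """Rule-of-thumb effective planning horizon, 1 / (1 - gamma).

    This is a common heuristic, used here as an assumption.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    rewards = [1.0] * 200  # hypothetical constant-reward episode
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_horizon(gamma),
              discounted_return(rewards, gamma), total_reward(rewards))
```

With a constant reward, the discounted return saturates near the effective horizon while the total reward keeps growing with episode length, which is the mismatch the abstract refers to.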

preprint, 2020, arXiv

Using an expert deviation carrying the knowledge of climate data in usual clustering algorithms

In order to help physicists to expand their knowledge of the climate in the Lesser Antilles, we aim to identify the spatio-temporal configurations using clustering analysis on wind speed and cumulative rainfall datasets. But we show that using the L2 norm in conventional clustering methods such as K-Means (KMS) and Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. So, we propose to replace Euclidean distance (L2) by a dissimilarity measure named Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence, the ED integrates the properties of the observed physical parameters and climate knowledge. This measure helps compare histograms of four patches, corresponding to geographical zones, that are influenced by atmospheric structures. The combined evaluation of the internal homogeneity and the separation of the clusters obtained using ED and L2 was performed. The results, which are compared using the silhouette index, show five clusters with high indexes. For the two available datasets one can see that, unlike KMS-L2, KMS-ED discriminates the daily situations favorably, giving more physical meaning to the clusters discovered by the algorithm. The

preprint, 2026, arXiv

Eliciting associations between clinical variables from LLMs via comparison questions across populations

The training data of large language models (LLMs) comprises a wide range of biomedical literature, reflecting data from many different patient populations. We investigate how it might be possible to recover information on correlation and causal links between patient characteristics, as a key building block for medical decision making. To avoid the pitfalls of direct elicitation, we propose an approach based on structured comparison questions, specifically patient comparison triplet questions. This is combined with a statistical model for the LLM representation that provides estimates of correlations without access to activations or model internals. Intuitively, we consider how similarity decisions of LLMs based on a first variable are affected by providing information on a second variable for one of the patients being assessed. We then induce prompt-level environment shifts to obtain correlation estimates for different subpopulations, which enables an invariant causal prediction (ICP) approach to obtain conservative candidate parent links. We demonstrate the method in two clinical domains, chronic obstructive pulmonary disease (COPD) and multiple sclerosis (MS). Across prompted environments, the elicited correlations are smooth, stable, and clinically interpretable, yet vary in a statistically significant way that supports downstream invariance testing, such that ICP provides a small set of candidate invariant parent links. These results show that indirect elicitation via triplet comparisons can recover meaningful association structure from LLMs and offer a cautious route from implicit correlations to causal statements that are congruent with LLM answering patterns.

preprint, 2013, arXiv

Bayesian Consensus Clustering

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source separately. This work is motivated by the integrated analysis of heterogeneous biomedical data, and we present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. Software is available at http://people.duke.edu/~el113/software.html.

preprint, 2016, arXiv

Algorithms for Fitting the Constrained Lasso

We compare alternative computing strategies for solving the constrained lasso problem. As its name suggests, the constrained lasso extends the widely-used lasso to handle linear constraints, which allow the user to incorporate prior information into the model. In addition to quadratic programming, we employ the alternating direction method of multipliers (ADMM) and also derive an efficient solution path algorithm. Through both simulations and real data examples, we compare the different algorithms and provide practical recommendations in terms of efficiency and accuracy for various sizes of data. We also show that, for an arbitrary penalty matrix, the generalized lasso can be transformed to a constrained lasso, while the converse is not true. Thus, our methods can also be used for estimating a generalized lasso, which has wide-ranging applications. Code for implementing the algorithms is freely available in the Matlab toolbox SparseReg.

preprint, 2020, arXiv

Neural Contextual Bandits with UCB-based Exploration

We study the stochastic contextual bandit problem, where the reward is generated from an unknown function with additive noise. No assumption is made about the reward function other than boundedness. We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration. We prove that, under standard assumptions, NeuralUCB achieves $\tilde O(\sqrt{T})$ regret, where $T$ is the number of rounds. To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.

preprint, 2023, arXiv

Automatic Generation of Product Concepts from Positive Examples, with an Application to Music Streaming

Internet based businesses and products (e.g. e-commerce, music streaming) are becoming more and more sophisticated every day with a lot of focus on improving customer satisfaction. A core way they achieve this is by providing customers with an easy access to their products by structuring them in catalogues using navigation bars and providing recommendations. We refer to these catalogues as product concepts, e.g. product categories on e-commerce websites, public playlists on music streaming platforms. These product concepts typically contain products that are linked with each other through some common features (e.g. a playlist of songs by the same artist). How they are defined in the backend of the system can be different for different products. In this work, we represent product concepts using database queries and tackle two learning problems. First, given sets of products that all belong to the same unknown product concept, we learn a database query that is a representation of this product concept. Second, we learn product concepts and their corresponding queries when the given sets of products are associated with multiple product concepts. To achieve these goals, we propose two approaches that combine the concepts of PU learning with Decision Trees and Clustering. Our experiments demonstrate, via a simulated setup for a music streaming service, that our approach is effective in solving these problems.

preprint, 2022, arXiv

On the Power-Law Hessian Spectrums in Deep Learning

It is well-known that the Hessian of the deep loss landscape matters to optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure that consists of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism or the mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectrums of well-trained deep neural networks exhibit simple power-law structures. Inspired by the statistical physical theories and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation for explaining why the power-law structure exists and suggest a spectral parallel between protein evolution and training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.

preprint, 2022, arXiv

Stochastic Coded Federated Learning with Convergence and Privacy Guarantees

Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework, where many clients collaboratively train a machine learning model by exchanging model updates with a parameter server instead of sharing their raw data. Nevertheless, FL training suffers from slow convergence and unstable performance due to stragglers caused by the heterogeneous computational resources of clients and fluctuating communication rates. This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL). In this framework, each client generates a privacy-preserving coded dataset by adding additive noise to the random linear combination of its local data. The server collects the coded datasets from all the clients to construct a composite dataset, which helps to compensate for the straggling effect. In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning. Besides, we demonstrate a privacy-performance tradeoff of the proposed SCFL method by analyzing the influence of the privacy constraint on the convergence rate. Finally, numerical experiments corroborate our analysis and show the benefits of SCFL in achieving fast convergence while preserving data privacy.
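
The coded-dataset step described in this abstract (a random linear combination of local data with additive noise) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: standard Gaussian combination weights, Gaussian noise, features only, and the hypothetical names `coded_dataset`, `num_coded`, and `noise_std`. The paper's actual encoding, label handling, and noise calibration may differ.

```python
import random


def coded_dataset(local_data, num_coded, noise_std, seed=0):
    """Build `num_coded` noisy random linear combinations of local rows.

    local_data: list of equal-length feature rows (lists of floats).
    Assumption: Gaussian combination weights and Gaussian additive noise.
    """
    rng = random.Random(seed)
    dim = len(local_data[0])
    coded = []
    for _ in range(num_coded):
        # One random weight per local sample.
        weights = [rng.gauss(0.0, 1.0) for _ in local_data]
        row = []
        for j in range(dim):
            combo = sum(w * x[j] for w, x in zip(weights, local_data))
            # Additive noise hides the exact combination of raw samples.
            row.append(combo + rng.gauss(0.0, noise_std))
        coded.append(row)
    return coded


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical client data
    print(coded_dataset(data, num_coded=2, noise_std=0.1))
```

A larger `noise_std` would hide more about the raw data at the cost of a less informative coded dataset, which mirrors the privacy-performance tradeoff the abstract mentions.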

preprint, 2013, arXiv

Spectral Experts for Estimating Mixtures of Linear Regressions

Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).

preprint, 2022, arXiv

Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm

We provide a function space characterization of the inductive bias resulting from minimizing the $\ell_2$ norm of the weights in multi-channel convolutional neural networks with linear activations and empirically test our resulting hypothesis on ReLU networks trained using gradient descent. We define an induced regularizer in the function space as the minimum $\ell_2$ norm of weights of a network required to realize a function. For two layer linear convolutional networks with $C$ output channels and kernel size $K$, we show the following: (a) If the inputs to the network are single channeled, the induced regularizer for any $K$ is independent of the number of output channels $C$. Furthermore, we derive the regularizer is a norm given by a semidefinite program (SDP). (b) In contrast, for multi-channel inputs, multiple output channels can be necessary to merely realize all matrix-valued linear functions and thus the inductive bias does depend on $C$. However, for sufficiently large $C$, the induced regularizer is again given by an SDP that is independent of $C$. In particular, the induced regularizer for $K=1$ and $K=D$ (input dimension) is given in closed form as the nuclear norm and the $\ell_{2,1}$ group-sparse norm, respectively, of the Fourier coefficients of the linear predictor. We investigate the broader applicability of our theoretical results to implicit regularization from gradient descent on linear and ReLU networks through experiments on MNIST and CIFAR-10 datasets.

preprint, 2014, arXiv

How transferable are features in deep neural networks?

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transf

preprint, 2022, arXiv

Guided Diffusion Model for Adversarial Purification from Random Noise

In this paper, we propose a novel guided diffusion purification approach to provide a strong defense against adversarial attacks. Our model achieves 89.62% robust accuracy under PGD-L_inf attack (eps = 8/255) on the CIFAR-10 dataset. We first explore the essential correlations between unguided diffusion models and randomized smoothing, enabling us to apply the models to certified robustness. The empirical results show that our models outperform randomized smoothing by 5% when the certified L2 radius r is larger than 0.5.

preprint, 2026, arXiv

Robust Representation Learning through Explicit Environment Modeling

We consider learning from labeled data collected across multiple environments, where the data distribution may vary across these environments. This problem is commonly approached from a causal perspective, seeking invariant representations that retain causal factors while discarding spurious ones. However, this framework assumes that the environment has no direct effect on the target. In contrast, we consider settings in which this assumption fails, but still aim to learn representations that support robust prediction on average across previously unseen environments. To this end, we study representations learned by explicitly modeling variation across environments and then marginalizing that variation out. We analyze the resulting representations and characterize when they are preferable to those learned by causal invariant-representation methods. We propose a concrete method based on generalized random-intercept models, a class of predictors in which such marginalization is possible, and study their generalization properties. Empirically, we show that these models outperform invariant-learning methods across a range of challenging settings.

preprint, 2021, arXiv

Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows

We test the robustness of a maximum-likelihood (ML) based classifier when the observed sequential data is corrupted by noise. The hypothesis is that a generative model that combines the state transitions of a hidden Markov model (HMM) with neural network based probability distributions for the hidden states of the HMM can provide robust classification performance. The combined model is called normalizing-flow mixture model based HMM (NMM-HMM). It can be trained using a combination of expectation-maximization (EM) and backpropagation. We verify the improved robustness of NMM-HMM classifiers in an application to speech recognition.

preprint, 2020, arXiv

Bonseyes AI Pipeline -- bringing AI to you. End-to-end integration of data, algorithms and deployment tools

Next generation of embedded Information and Communication Technology (ICT) systems are collaborative systems able to perform autonomous tasks. The remarkable expansion of the embedded ICT market, together with the rise and breakthroughs of Artificial Intelligence (AI), have put the focus on the Edge as it stands as one of the keys for the next technological revolution: the seamless integration of AI in our daily life. However, training and deployment of custom AI solutions on embedded devices require a fine-grained integration of data, algorithms, and tools to achieve high accuracy. Such integration requires a high level of expertise that becomes a real bottleneck for small and medium enterprises wanting to deploy AI solutions on the Edge which, ultimately, slows down the adoption of AI on daily-life applications. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms, and deployment tools together. By removing the integration barriers and lowering the required expertise, we can interconnect the different stages of tools and provide a modular end-to-end development of AI products for embedded devices. Our AI pipeline consists of four mo

preprint, 2026, arXiv

From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL

Explaining reinforcement learning agents is challenging because policies emerge from complex reward structures and neural representations that are difficult for humans to interpret. Existing approaches often rely on curated demonstrations that expose local behaviors but provide limited insight into an agent's global strategy, leaving users to infer intent from raw observations. We propose SySLLM (Synthesized Summary using Large Language Models), a framework that reframes policy interpretation as a language-generation problem. Instead of visual demonstrations, SySLLM converts spatiotemporal trajectories into structured text and prompts an LLM to generate coherent summaries describing the agent's goals, exploration style, and decision patterns. SySLLM scales to long-horizon, semantically rich environments without task-specific fine-tuning, leveraging LLM world knowledge and compositional reasoning to capture latent behavioral structure across policies. Expert evaluations show strong alignment with human analyses, and a large-scale user study found that 75.5% of participants preferred SySLLM summaries over state-of-the-art demonstration-based explanations. Together, these resu

preprint, 2020, arXiv

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

One of the biggest issues in deep learning theory is the generalization ability of networks with huge model size. The classical learning theory suggests that overparameterized models cause overfitting. However, practically used large deep models avoid overfitting, which is not well explained by the classical approaches. To resolve this issue, several attempts have been made. Among them, the compression based bound is one of the promising approaches. However, the compression based bound can be applied only to a compressed network, and it is not applicable to the non-compressed original network. In this paper, we give a unified framework that can convert compression based bounds to those for non-compressed original networks. The bound gives an even better rate than the one for the compressed network by improving the bias term. By establishing the unified framework, we can obtain a data dependent generalization error bound which gives a tighter evaluation than the data independent ones.

preprint, 2012, arXiv

Parameter and Structure Learning in Nested Markov Models

The constraints arising from DAG models with latent variables can be naturally represented by means of acyclic directed mixed graphs (ADMGs). Such graphs contain directed and bidirected arrows, and contain no directed cycles. DAGs with latent variables imply independence constraints in the distribution resulting from a 'fixing' operation, in which a joint distribution is divided by a conditional. This operation generalizes marginalizing and conditioning. Some of these constraints correspond to identifiable 'dormant' independence constraints, with the well known 'Verma constraint' as one example. Recently, models defined by a set of the constraints arising after fixing from a DAG with latents, were characterized via a recursive factorization and a nested Markov property. In addition, a parameterization was given in the discrete case. In this paper we use this parameterization to describe a parameter fitting algorithm, and a search and score structure learning algorithm for these nested Markov models. We apply our algorithms to a variety of datasets.

preprint, 2020, arXiv

Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms

In recent years deep neural networks have been successfully applied to the domains of reinforcement learning \cite{bengio2009learning,krizhevsky2012imagenet,hinton2006reducing}. Deep reinforcement learning \cite{mnih2015human} is reported to have the advantage of learning effective policies directly from high-dimensional sensory inputs over traditional agents. However, within the scope of the literature, there is no fundamental change or improvement on the existing training framework. Here we propose a novel training framework that is conceptually comprehensible and potentially easy to be generalized to all feasible algorithms for reinforcement learning. We employ Monte Carlo sampling to achieve raw data inputs, and train them in batch to achieve Markov decision process sequences and synchronously update the network parameters instead of experience replay. This training framework proves to optimize the unbiased approximation of loss function whose estimation exactly matches the real probability distribution data inputs follow, and thus has overwhelming advantages of sample efficiency and convergence rate over existing deep reinforcement learning after evaluating it on both discret

preprint, 2019, arXiv

Dirichlet uncertainty wrappers for actionable algorithm accuracy accountability and auditability

Nowadays, the use of machine learning models is becoming a utility in many applications. Companies deliver pre-trained models encapsulated as application programming interfaces (APIs) that developers combine with third party components and their own models and data to create complex data products to solve specific problems. The complexity of such products and the lack of control and knowledge of the internals of each component used cause unavoidable effects, such as lack of transparency, difficulty in auditability, and emergence of potential uncontrolled risks. They are effectively black-boxes. Accountability of such solutions is a challenge for the auditors and the machine learning community. In this work, we propose a wrapper that given a black-box model enriches its output prediction with a measure of uncertainty. By using this wrapper, we make the black-box auditable for the accuracy risk (risk derived from low quality or uncertain decisions) and at the same time we provide an actionable mechanism to mitigate that risk in the form of decision rejection; we can choose not to issue a prediction when the risk or uncertainty in that decision is significant. Based on the resulting u

preprint, 2013, arXiv

The Planning-ahead SMO Algorithm

The sequential minimal optimization (SMO) algorithm and variants thereof are the de facto standard method for solving large quadratic programs for support vector machine (SVM) training. In this paper we propose a simple yet powerful modification. The main emphasis is on an algorithm improving the SMO step size by planning-ahead. The theoretical analysis ensures its convergence to the optimum. Experiments involving a large number of datasets were carried out to demonstrate the superiority of the new algorithm.

preprint, 2013, arXiv

Reinforcement Learning for the Soccer Dribbling Task

We propose a reinforcement learning solution to the soccer dribbling task, a scenario in which a soccer agent has to go from the beginning to the end of a region keeping possession of the ball, as an adversary attempts to gain possession. While the adversary uses a stationary policy, the dribbler learns the best action to take at each decision point. After defining meaningful variables to represent the state space, and high-level macro-actions to incorporate domain knowledge, we describe our application of the reinforcement learning algorithm Sarsa with CMAC for function approximation. Our experiments show that, after the training period, the dribbler is able to accomplish its task against a strong adversary around 58% of the time.
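
For readers unfamiliar with Sarsa, the on-policy update it is built on can be sketched as below. This is a simplified tabular version: a plain dictionary stands in for the CMAC function approximator named in the abstract, and the state names, action names, and hyper-parameter values are hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9, terminal=False):
    """Apply one Sarsa step to the action-value table `q` in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)),
    where a' is the action actually chosen in s' (on-policy).
    Returns the temporal-difference error.
    """
    if terminal:
        target = reward  # no bootstrapping past the end of an episode
    else:
        target = reward + gamma * q[(next_state, next_action)]
    td_error = target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


if __name__ == "__main__":
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    q[("near_goal", "dribble")] = 1.0
    sarsa_update(q, "midfield", "dribble", 0.0, "near_goal", "dribble")
    print(dict(q))
```

Unlike Q-learning, the target uses the value of the action the agent actually takes next rather than the maximum over actions.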

preprint, 2026, arXiv

Controllable Concept Bottleneck Models

Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on static scenarios where the data and concepts are assumed to be fixed and clean. In real-world applications, deployed models require continuous maintenance: we often need to remove erroneous or sensitive data (unlearning), correct mislabeled concepts, or incorporate newly acquired samples (incremental learning) to adapt to evolving environments. Thus, deriving efficient editable CBMs without retraining from scratch remains a significant challenge, particularly in large-scale applications. To address these challenges, we propose Controllable Concept Bottleneck Models (CCBMs). Specifically, CCBMs support three granularities of model editing: concept-label-level, concept-level, and data-level, the latter of which encompasses both data removal and data addition. CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining. Experimental results demonstrate the efficiency and adaptability of our CCBMs, affirming their

preprint, 2020, arXiv

Tempered Sigmoid Activations for Deep Learning with Differential Privacy

Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without a

preprint, 2020, arXiv

DNNSurv: Deep Neural Networks for Survival Analysis Using Pseudo Values

There has been increasing interest in modelling survival data using deep learning methods in medical research. Current approaches have focused on designing special cost functions to handle censored survival data. We propose a very different method with two steps. In the first step, we transform each subject's survival time into a series of jackknife pseudo conditional survival probabilities and then use these pseudo probabilities as a quantitative response variable in the deep neural network model. By using the pseudo values, we reduce a complex survival analysis to a standard regression problem, which greatly simplifies the neural network construction. Our two-step approach is simple, yet very flexible in making risk predictions for survival data, which is very appealing from the practice point of view. The source code is freely available at http://github.com/lilizhaoUM/DNNSurv.
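
The first step described in this abstract relies on jackknife pseudo values. The sketch below shows the general idea with a Kaplan-Meier estimate of the survival probability at a single time point. It is a simplification under stated assumptions: it computes unconditional pseudo survival probabilities rather than the pseudo conditional probabilities the paper uses, and the function names are hypothetical rather than taken from the DNNSurv code.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    times: observed times; events: 1 if the event occurred, 0 if censored.
    """
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo value per subject: n*S(t) - (n-1)*S_without_i(t)."""
    times, events = list(times), list(events)
    n = len(times)
    full = kaplan_meier(times, events, t)
    values = []
    for i in range(n):
        # Leave subject i out and re-estimate the survival probability.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        values.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t))
    return values


if __name__ == "__main__":
    # Hypothetical toy data: the third subject is censored at time 3.
    print(pseudo_values([1, 2, 3, 4, 5], [1, 1, 0, 1, 1], t=3.5))
```

Each subject, censored or not, receives a numeric pseudo value, which is what lets the second step treat the problem as ordinary regression. Without censoring, the pseudo value reduces to the indicator that the subject survived past `t`.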

preprint, 2019, arXiv

Prediction and optimization of mechanical properties of composites using convolutional neural networks

In this paper, we develop a convolutional neural network model to predict the mechanical properties of a two-dimensional checkerboard composite quantitatively. The checkerboard composite possesses two phases, one phase is soft and ductile while the other is stiff and brittle. The ground-truth data used in the training process are obtained from finite element analyses under the assumption of plane stress. Monte Carlo simulations and central limit theorem are used to find the size of the dataset needed. Once the training process is completed, the developed model is validated using data unseen during training. The developed neural network model captures the stiffness, strength, and toughness of checkerboard composites with high accuracy. Also, we integrate the developed model with a genetic algorithm (GA) optimizer to identify the optimal microstructural designs. The genetic algorithm optimizer adopted here has several operators, selection, crossover, mutation, and elitism. The optimizer converges to configurations with highly enhanced properties. For the case of the modulus and starting from randomly-initialized generation, the GA optimizer converges to the global maximum which invol

preprint2020arXiv

Robust Learning with Frequency Domain Regularization

Convolutional neural networks have achieved remarkable performance in many computer vision tasks. However, CNNs tend to be biased toward low-frequency components: they prioritize capturing low-frequency patterns, which leads them to fail when the application scenario changes. Meanwhile, adversarial examples imply that the models are very sensitive to high-frequency perturbations. In this paper, we introduce a new regularization method that constrains the frequency spectra of the model's filters. Unlike band-limited training, our method assumes that the valid frequency range is likely entangled across different layers rather than continuous, and it learns the valid frequency range end-to-end by backpropagation. We demonstrate the effectiveness of our regularization by (1) defending against adversarial perturbations; (2) reducing the generalization gap in different architectures; (3) improving generalization in transfer learning scenarios without fine-tuning.

preprint2022arXiv

Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision-making problems ranging from healthcare to technology industries. Most of the work in the existing literature focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform. We find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
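
The contrast between mean-based and quantile-based evaluation can be sketched with a plain importance-weighted quantile for a single-step setting. This is a deliberately simpler technique than the doubly-robust sequential procedure the abstract proposes, and it involves no generative model. The function names and the sample numbers are hypothetical.

```python
def importance_weights(target_probs, behavior_probs):
    """Per-sample ratio pi_target(a|s) / pi_behavior(a|s)."""
    return [p / b for p, b in zip(target_probs, behavior_probs)]


def weighted_mean(rewards, weights):
    """Self-normalized importance-weighted mean reward."""
    total = sum(weights)
    return sum(r * w for r, w in zip(rewards, weights)) / total


def weighted_quantile(rewards, weights, tau):
    """Smallest reward whose normalized cumulative weight reaches tau."""
    if not rewards or len(rewards) != len(weights):
        raise ValueError("rewards and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    # Sort by reward and accumulate the weighted empirical CDF.
    for reward, weight in sorted(zip(rewards, weights)):
        cumulative += weight
        if cumulative >= tau * total:
            return reward
    return max(rewards)


# Hypothetical logged rewards with one extreme outlier.
rewards = [1.0, 2.0, 3.0, 4.0, 100.0]
weights = importance_weights([0.5] * 5, [0.5] * 5)  # identical policies
median_estimate = weighted_quantile(rewards, weights, 0.5)
mean_estimate = weighted_mean(rewards, weights)
```

In this made-up sample, a single large reward pulls the weighted mean far above the bulk of the data while the weighted median stays with it, which is the robustness argument the abstract makes for skewed reward distributions.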

preprint2021arXiv

LocalDrop: A Hybrid Regularization for Deep Neural Networks

In neural networks, developing regularization algorithms to mitigate overfitting is one of the major study areas. We propose LocalDrop, a new approach to regularizing neural networks based on the local Rademacher complexity. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed from the proposed upper bound of the local Rademacher complexity through rigorous mathematical derivation. The analyses of dropout in FCNs and DropBlock in CNNs with keep rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep rate matrix and weight matrix for training the whole model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and by examining the effects of different hyperparameters on the final performance.