Research connected to "machine learning"

Search papers, authors, topics, institutions and opportunities, then move straight into the graph around the result.

FiltersOptional

Search results

Showing works 2,497-2,528 from 49,008 works in Machine Learning. Use pages to browse more, or open the graph for the map.

49,008matching works
Full topic scaleMachine Learning

49,008 works and 109,744 authors are indexed for this topic. This page shows 32 works at a time so search stays fast.

Match modeExact match focus
Semantic hits0
Active filters0
Graph viewOpen

Papers

preprint2020arXiv

Deep Learning-Based Gait Recognition Using Smartphones in the Wild

Compared to other biometrics, gait is difficult to conceal and has the advantage of being unobtrusive. Inertial sensors, such as accelerometers and gyroscopes, are often used to capture gait dynamics. These inertial sensors are commonly integrated into smartphones and are widely used by the average person, which makes gait data convenient and inexpensive to collect. In this paper, we study gait recognition using smartphones in the wild. In contrast to traditional methods, which often require a person to walk along a specified road and/or at a normal walking speed, the proposed method collects inertial gait data under unconstrained conditions without knowing when, where, and how the user walks. To obtain good person identification and authentication performance, deep-learning techniques are presented to learn and model the gait biometrics based on walking data. Specifically, a hybrid deep neural network is proposed for robust gait feature representation, where features in the space and time domains are successively abstracted by a convolutional neural network and a recurrent neural network. In the experiments, two datasets collected by smartphones for a total of 118 subjects are use

preprint2022arXiv

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called \textit{Hamiltonian $Q$-Learning}, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.

preprint2017arXiv

An innovative solution for breast cancer textual big data analysis

The digitalization of stored information in hospitals now allows for the exploitation of medical data in text format, as electronic health records (EHRs), initially gathered for other purposes than epidemiology. Manual search and analysis operations on such data become tedious. In recent years, the use of natural language processing (NLP) tools was highlighted to automatize the extraction of information contained in EHRs, structure it and perform statistical analysis on this structured information. The main difficulties with the existing approaches is the requirement of synonyms or ontology dictionaries, that are mostly available in English only and do not include local or custom notations. In this work, a team composed of oncologists as domain experts and data scientists develop a custom NLP-based system to process and structure textual clinical reports of patients suffering from breast cancer. The tool relies on the combination of standard text mining techniques and an advanced synonym detection method. It allows for a global analysis by retrieval of indicators such as medical history, tumor characteristics, therapeutic responses, recurrences and prognosis. The versatility of the method allows to obtain easily new indicators, thus opening up the way for retrospective studies with a substantial reduction of the amount of manual work. With no need for biomedical annotators or pre-defined ontologies, this language-agnostic method reached an good extraction accuracy for several concepts of interest, according to a comparison with a manually structured file, without requiring any existing corpus with local or new notations.

preprint2014arXiv

Multiple Instance Learning with Bag Dissimilarities

Multiple instance learning (MIL) is concerned with learning from sets (bags) of objects (instances), where the individual instance labels are ambiguous. In this setting, supervised learning cannot be applied directly. Often, specialized MIL methods learn by making additional assumptions about the relationship of the bag labels and instance labels. Such assumptions may fit a particular dataset, but do not generalize to the whole range of MIL problems. Other MIL methods shift the focus of assumptions from the labels to the overall (dis)similarity of bags, and therefore learn from bags directly. We propose to represent each bag by a vector of its dissimilarities to other bags in the training set, and treat these dissimilarities as a feature representation. We show several alternatives to define a dissimilarity between bags and discuss which definitions are more suitable for particular MIL problems. The experimental results show that the proposed approach is computationally inexpensive, yet very competitive with state-of-the-art algorithms on a wide range of MIL datasets.

preprint2020arXiv

Short-Term Forecasting COVID-19 Cases In Turkey Using Long Short-Term Memory Network

COVID-19 has been one of the most severe diseases, causing a harsh pandemic all over the world, since December 2019. The aim of this study is to evaluate the value of Long Short-Term Memory (LSTM) Networks in forecasting the total number of COVID-19 cases in Turkey. The COVID-19 data for 30 days, between March 24 and April 23, 2020, are used to estimate the next fifteen days. The mean absolute error of the LSTM Network for 15 days estimation is 1,69$\pm$1.35%. Whereas, for the same data, the error of the Box-Jenkins method is 3.24$\pm$1.56%, Prophet method is 6.88$\pm$4.96% and Holt-Winters Additive method with Damped Trend is 0.47$\pm$0.28%. Additionally, when the number of deaths data is also provided with the number of total cases to the input of LSTM Network, the mean error reduces to 0.99$\pm$0.51%. Consequently, addition of the number of deaths data to the input, results a lower error in forecasting, compared to using only the number of total cases as the input. However, Holt-Winters Additive method with Damped Trend gives superior results to LSTM Networks in forecasting the total number of COVID-19 cases.

preprint2022arXiv

Graphon-aided Joint Estimation of Multiple Graphs

We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. We adopt a graphon as our random graph model, which is a nonparametric model from which graphs of potentially different sizes can be drawn. The versatility of graphons allows us to tackle the joint inference problem even for the cases where the graphs to be recovered contain different number of nodes and lack precise alignment across the graphs. Our solution is based on combining a maximum likelihood penalty with graphon estimation schemes and can be used to augment existing network inference methods. We validate our proposed approach by comparing its performance against competing methods in synthetic and real-world datasets.

preprint2015arXiv

Value function approximation via low-rank models

We propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (PCA). Under minimal assumptions, this Robust PCA problem can be solved exactly via the Principal Component Pursuit convex optimization problem. We experiment the procedure on several examples and demonstrate that our method yields approximations essentially identical to the true function.

preprint2026arXiv

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters

Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific configurations to the generalization gap has not been quantitatively decomposed and systematically leveraged for configuration selection. To address this limitation, we propose an explainable framework that evaluates RL performance across robotic environments using SHapley Additive exPlanations (SHAP) to quantify configuration impacts. We establish a theoretical foundation connecting Shapley values to generalizability, empirically analyze configuration impact patterns, and introduce SHAP-guided configuration selection to enhance generalization. Our results reveal distinct patterns across algorithms and hyperparameters, with consistent configuration impacts across diverse tasks and environments. By applying these insights to configuration selection, we achieve improved RL generalizability and provide actionable guidance for practitioners.

preprint2020arXiv

Gaussian Process Policy Optimization

We propose a novel actor-critic, model-free reinforcement learning algorithm which employs a Bayesian method of parameter space exploration to solve environments. A Gaussian process is used to learn the expected return of a policy given the policy's parameters. The system is trained by updating the parameters using gradient descent on a new surrogate loss function consisting of the Proximal Policy Optimization 'Clipped' loss function and a bonus term representing the expected improvement acquisition function given by the Gaussian process. This new method is shown to be comparable to and at times empirically outperform current algorithms on environments that simulate robotic locomotion using the MuJoCo physics engine.

preprint2013arXiv

A Lipschitz Exploration-Exploitation Scheme for Bayesian Optimization

The problem of optimizing unknown costly-to-evaluate functions has been studied for a long time in the context of Bayesian Optimization. Algorithms in this field aim to find the optimizer of the function by asking only a few function evaluations at locations carefully selected based on a posterior model. In this paper, we assume the unknown function is Lipschitz continuous. Leveraging the Lipschitz property, we propose an algorithm with a distinct exploration phase followed by an exploitation phase. The exploration phase aims to select samples that shrink the search space as much as possible. The exploitation phase then focuses on the reduced search space and selects samples closest to the optimizer. Considering the Expected Improvement (EI) as a baseline, we empirically show that the proposed algorithm significantly outperforms EI.

preprint2026arXiv

CoAction: Cross-task Correlation-aware Pareto Set Learning

Pareto set learning (PSL) is an emerging paradigm in multi-objective optimization that trains neural networks to map preference vectors to Pareto optimal solutions. However, existing PSL methods primarily focus on solving a single multi-objective optimization problem at a time. This limitation not only increases computational costs in multi-objective multitask optimization scenarios by requiring a separate model for each task, but also fails to exploit the inter-task correlations across tasks. To address this, we propose a Cross-tAsk correlation-aware Pareto Set Learning (CoAction) framework, which leverages task-aware transformer to handle multiple tasks simultaneously. Specifically, by assigning task-specific embedding vectors to individual tasks, the model effectively distinguishes between tasks while facilitating knowledge sharing among them. We utilize a Transformer encoder as the backbone architecture to leverage its self-attention mechanism for capturing complex task dependencies. The proposed approach is evaluated on comprehensive multitask test suites covering both benchmark problems and real-world applications, demonstrating effectiveness and competitive performance in Hypervolume, Range, and Sparsity.

preprint2025arXiv

Agentic AI for Intent-Based Industrial Automation

The recent development of Agentic AI systems, empowered by autonomous large language models (LLMs) agents with planning and tool-usage capabilities, enables new possibilities for the evolution of industrial automation and reduces the complexity introduced by Industry 4.0. This work proposes a conceptual framework that integrates Agentic AI with the intent-based paradigm, originally developed in network research, to simplify human-machine interaction (HMI) and better align automation systems with the human-centric, sustainable, and resilient principles of Industry 5.0. Based on the intent-based processing, the framework allows human operators to express high-level business or operational goals in natural language, which are decomposed into actionable components. These intents are broken into expectations, conditions, targets, context, and information that guide sub-agents equipped with specialized tools to execute domain-specific tasks. A proof of concept was implemented using the CMAPSS dataset and Google Agent Developer Kit (ADK), demonstrating the feasibility of intent decomposition, agent orchestration, and autonomous decision-making in predictive maintenance scenarios. The resu

preprint2022arXiv

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although, random anchor dataset functions well for DC analysis in general, using an anchor dataset whose distribution is close to that of the raw dataset is expected to improve the recognition performance, particularly for the interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over the existing anchor data constructions for artificial and real-world datasets. Specifically, the proposed method achieves 9 percentage point and 38 percentage point performance improvements regarding accuracy and essential feature selection, respectively, over existing methods for an income dataset. The proposed method provides another use of SMOTE not for imbalanced data classifications but for a key technology of privacy-preserving integrated analysis.

preprint2020arXiv

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling efforts and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.

preprint2016arXiv

Sparse Hierarchical Tucker Factorization and its Application to Healthcare

We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense core tensor; the result of our approach is a faster, more space-efficient, and more accurate method. We extensively test our method on a real healthcare dataset, which is collected from 30K patients and results in an 18th order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It can also do so more accurately and in less time than the state-of-the-art: on a 12th order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a previously state-of-the-art method. Even for analyzing low order tensors (e.g., 4-order), our method requires close to an order of

preprint2022arXiv

No-regret learning for repeated non-cooperative games with lossy bandits

This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedback. Since it is difficult to give the explicit model of the utility functions in dynamic environments, the players' action can only be learned with bandit feedback. Moreover, because of unreliable communication channels or privacy protection, the bandit feedback may be lost or dropped at random. Therefore, we study the asynchronous online learning strategy of the players to adaptively adjust the next actions for minimizing the long-term regret loss. The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb). We first give the regret analysis for concave games with differentiable and Lipschitz utilities. Then we show that the action profile converges to a Nash equilibrium with probability 1 when the game is also strictly monotone. We further provide the mean square convergence rate $\mathcal{O}\left(k^{-2\min\{β, 1/6\}}\right)$ when the game is $β-$ strongly monotone. In addition, we extend the algorithm to the case when the loss probability of the bandit feedback is unknown, and prove its almost sure convergence to Nash equilibrium for strictly monotone games. Finally, we take the resource management in fog computing as an application example, and carry out numerical experiments to empirically demonstrate the algorithm performance.

preprint2013arXiv

Revealing Cluster Structure of Graph by Path Following Replicator Dynamic

In this paper, we propose a path following replicator dynamic, and investigate its potentials in uncovering the underlying cluster structure of a graph. The proposed dynamic is a generalization of the discrete replicator dynamic. The replicator dynamic has been successfully used to extract dense clusters of graphs; however, it is often sensitive to the degree distribution of a graph, and usually biased by vertices with large degrees, thus may fail to detect the densest cluster. To overcome this problem, we introduce a dynamic parameter, called path parameter, into the evolution process. The path parameter can be interpreted as the maximal possible probability of a current cluster containing a vertex, and it monotonically increases as evolution process proceeds. By limiting the maximal probability, the phenomenon of some vertices dominating the early stage of evolution process is suppressed, thus making evolution process more robust. To solve the optimization problem with a fixed path parameter, we propose an efficient fixed point algorithm. The time complexity of the path following replicator dynamic is only linear in the number of edges of a graph, thus it can analyze graphs with

preprint2021arXiv

Noise Optimization for Artificial Neural Networks

Adding noises to artificial neural network(ANN) has been shown to be able to improve robustness in previous work. In this work, we propose a new technique to compute the pathwise stochastic gradient estimate with respect to the standard deviation of the Gaussian noise added to each neuron of the ANN. By our proposed technique, the gradient estimate with respect to noise levels is a byproduct of the backpropagation algorithm for estimating gradient with respect to synaptic weights in ANN. Thus, the noise level for each neuron can be optimized simultaneously in the processing of training the synaptic weights at nearly no extra computational cost. In numerical experiments, our proposed method can achieve significant performance improvement on robustness of several popular ANN structures under both black box and white box attacks tested in various computer vision datasets.

preprint2015arXiv

Optimization via Low-rank Approximation for Community Detection in Networks

Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate our meth

preprint2012arXiv

Detecting Activations over Graphs using Spanning Tree Wavelet Bases

We consider the detection of activations over graphs under Gaussian noise, where signals are piece-wise constant over the graph. Despite the wide applicability of such a detection algorithm, there has been little success in the development of computationally feasible methods with proveable theoretical guarantees for general graph topologies. We cast this as a hypothesis testing problem, and first provide a universal necessary condition for asymptotic distinguishability of the null and alternative hypotheses. We then introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph, and prove that for any spanning tree, this approach can distinguish null from alternative in a low signal-to-noise regime. Lastly, we improve on this result and show that using the uniform spanning tree in the basis construction yields a randomized test with stronger theoretical guarantees that in many cases matches our necessary conditions. Specifically, we obtain near-optimal performance in edge transitive graphs, $k$-nearest neighbor graphs, and $ε$-graphs.

preprint2020arXiv

Asymptotic errors for convex penalized linear regression beyond Gaussian matrices

We consider the problem of learning a coefficient vector $x_{0}$ in $R^{N}$ from noisy linear observations $y=Fx_{0}+w$ in $R^{M}$ in the high dimensional limit $M,N$ to infinity with $α=M/N$ fixed. We provide a rigorous derivation of an explicit formula -- first conjectured using heuristic methods from statistical physics -- for the asymptotic mean squared error obtained by penalized convex regression estimators such as the LASSO or the elastic net, for a class of very generic random matrices corresponding to rotationally invariant data matrices with arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages on and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices for a large class of problems. We illustrate our results on some concrete examples and show that even though they are asymptotic, our predictions agree remarkably well with numerics even for very moderate sizes.

preprint2021arXiv

Hyperparameter Transfer Learning with Adaptive Complexity

Bayesian optimization (BO) is a sample efficient approach to automatically tune the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one might have to tune a type of neural network learned across a series of different classification problems. Recent work on multi-task BO exploits knowledge gained from previous tuning tasks to speed up a new tuning task. However, previous approaches do not account for the fact that BO is a sequential decision making procedure. Hence, there is in general a mismatch between the number of evaluations collected in the current tuning task compared to the number of evaluations accumulated in all previously completed tasks. In this work, we enable multi-task BO to compensate for this mismatch, such that the transfer learning procedure is able to handle different data regimes in a principled way. We propose a new multi-task BO method that learns a set of ordered, non-linear basis functions of increasing complexity via nested drop-out and automatic relevance determination. Experiments on a variety of hyperparameter tuning problems show that our method improves the sample ef

preprint2024arXiv

Improving sample efficiency of high dimensional Bayesian optimization with MCMC

Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. With the idea of transiting the candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin Dynamics version of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.

preprint2022arXiv

On the Parameterization and Initialization of Diagonal State Space Models

State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain why DSS works mathematically, by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model S4D is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results for image, audio, and medical time-series domains, and averaging 85\% on the Long Range Arena benchmark.

preprint2026arXiv

Drift Flow Matching

Iterative generative models such as Flow Matching and Diffusion models have demonstrated strong test-time scaling behavior, where additional inference computation can improve generation quality. In contrast, Drift Models offer efficient one-step generation, but their direct generation paradigm limits such flexibility. In this work, we propose Drift Flow Matching (DFM), a framework that connects drifting generative modeling with flow-based iterative generation. DFM preserves the efficiency of direct transport maps while enabling generation to be refined through multiple inference steps when desired. This bridges the gap between one-step Drift Models and multi-step Flow Matching methods, and provides a novel generative paradigm that can adapt sampling computation to different quality--efficiency requirements. Extensive experiments across different tasks and datasets demonstrate the effectiveness and generality of the proposed framework.

preprint2022arXiv

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Deep reinforcement learning (RL) algorithms suffer severe performance degradation when the interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model for state prediction, which is different from how the model is used in RL--performing value-based planning. Accordingly, the learned representation by these visual methods may be good for recognition but not optimal for estimating state value and solving the decision problem. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the ''imagined state'') based on the current one and a sequence of actions. Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values. Then a distance is computed and minimized to force the imagined state to produce a similar action value prediction as that by the real state. We develop two implementations of the above idea for the discrete and continuous action spaces respectively. We conduct experiments on Atari 100K and DeepMind Control Suite benchmarks to validate their effectiveness for improving sample efficiency. It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.

preprint2026arXiv

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration

While reinforcement learning (RL) promises to revolutionize the control of complex nonlinear robotic systems, a profound gap persists between the heuristic success of model-free off-policy deep RL and the underlying theory, which remains largely confined to tabular or linearizable settings. We identify the cause of this gap as an emergent isolation of three traditions: (i) measure-theoretic MDP foundations on general spaces limit their analysis to exact dynamic programming and ignore all error sources of a learning process; (ii) deterministic error propagation analysis addresses the approximation error via concentrability coefficients without a finite-sample analysis of the estimation error; and (iii) PAC generalization bounds characterize the estimation errors of simplified topologies. We bridge these traditions with a unified theoretical framework for fitted Q-iteration (FQI) on general measurable Borel spaces. Our main result provides a finite-sample, adaptive-data performance bound by chaining measure-theoretic probability with Bellman-operator contraction in Banach spaces. We prove that sequential Rademacher complexity controls Bellman-regression generalization under policy-dependent data collection. We further extend this analysis to provide the first cumulative, pathwise online regret guarantee for FQI in continuous spaces. These results lay the necessary foundations for the formal analysis of many modern deep RL algorithms.

preprint2020arXiv

Meta Learning in the Continuous Time Limit

In this paper, we establish the ordinary differential equation (ODE) that underlies the training dynamics of Model-Agnostic Meta-Learning (MAML). Our continuous-time limit view of the process eliminates the influence of the manually chosen step size of gradient descent and includes the existing gradient descent training algorithm as a special case that results from a specific discretization. We show that the MAML ODE enjoys a linear convergence rate to an approximate stationary point of the MAML loss function for strongly convex task losses, even when the corresponding MAML loss is non-convex. Moreover, through the analysis of the MAML ODE, we propose a new BI-MAML training algorithm that significantly reduces the computational burden associated with existing MAML training methods. To complement our theoretical findings, we perform empirical experiments to showcase the superiority of our proposed methods with respect to the existing work.

preprint2013arXiv

Flexible High-dimensional Classification Machines and Their Asymptotic Properties

Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), under two contexts: the high-dimensional, low-sample size data and the imbalanced data. A unified family of classification machines, the FLexible Assortment MachinE (FLAME) is proposed, within which DWD and SVM are special cases. The FLAME family helps to identify the similarities and differences between SVM and DWD. It is well known that many classifiers overfit the data in the high-dimensional setting; and others are sensitive to the imbalanced data, that is, the class with a larger sample size overly influences the classifier and pushes the decision boundary towards the minority class. SVM is resistant to the imbalanced data issue, but it overfits high-dimensional data sets by showing the undesired data-piling phenomena. The DWD method was proposed to improve SVM in the high-dimensional setting, but its decision boundary is sensitive to the imbalanced ratio of sample sizes. Our FLAME family helps to understand a

preprint2015arXiv

Interpolating Convex and Non-Convex Tensor Decompositions via the Subspace Norm

We consider the problem of recovering a low-rank tensor from its noisy observation. Previous work has shown a recovery guarantee with signal to noise ratio $O(n^{\lceil K/2 \rceil /2})$ for recovering a $K$th order rank one tensor of size $n\times \cdots \times n$ by recursive unfolding. In this paper, we first improve this bound to $O(n^{K/4})$ by a much simpler approach, but with a more careful analysis. Then we propose a new norm called the subspace norm, which is based on the Kronecker products of factors obtained by the proposed simple estimator. The imposed Kronecker structure allows us to show a nearly ideal $O(\sqrt{n}+\sqrt{H^{K-1}})$ bound, in which the parameter $H$ controls the blend from the non-convex estimator to mode-wise nuclear norm minimization. Furthermore, we empirically demonstrate that the subspace norm achieves the nearly ideal denoising performance even with $H=O(1)$.

preprint2020arXiv

Safe Reinforcement Learning through Meta-learned Instincts

An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions performed by the agent lead to unsafe states in the environment. Here we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn during their lifetime while avoiding potentially hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with no-go zones, in which the agent has to learn to approach new targets during deployment. MLIN outperforms standard meta-trained networks and allows agents to lea