Research connected to "machine learning"



Search results

Showing works 1,665-1,696 from 49,008 works in Machine Learning.


49,008 works and 109,744 authors are indexed for this topic. This page shows 32 works at a time so search stays fast.


Papers

preprint · 2013 · arXiv

KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.

preprint · 2020 · arXiv

Survival Cluster Analysis

Conventional survival analysis approaches estimate risk scores or individualized time-to-event distributions conditioned on covariates. In practice, there is often great population-level phenotypic heterogeneity, resulting from (unknown) subpopulations with diverse risk profiles or survival distributions. As a result, there is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles, while jointly accounting for accurate individualized time-to-event predictions. An approach that addresses this need is likely to improve characterization of individual outcomes by leveraging regularities in subpopulations, thus accounting for population-level heterogeneity. In this paper, we propose a Bayesian nonparametrics approach that represents observations (subjects) in a clustered latent space, and encourages accurate time-to-event predictions and clusters (subpopulations) with distinct risk profiles. Experiments on real-world datasets show consistent improvements in predictive performance and interpretability relative to existing state-of-the-art survival analysis models.

preprint · 2013 · arXiv

Multi-class Generalized Binary Search for Active Inverse Reinforcement Learning

This paper addresses the problem of learning a task from demonstration. We adopt the framework of inverse reinforcement learning, where tasks are represented in the form of a reward function. Our contribution is a novel active learning algorithm that enables the learning agent to query the expert for more informative demonstrations, thus leading to more sample-efficient learning. For this novel algorithm (Generalized Binary Search for Inverse Reinforcement Learning, or GBS-IRL), we provide a theoretical bound on sample complexity and illustrate its applicability on several different tasks. To our knowledge, GBS-IRL is the first active IRL algorithm with provable sample complexity bounds. We also discuss our method in light of other existing methods in the literature and its general applicability in multi-class classification problems. Finally, motivated by recent work on learning from demonstration in robots, we also discuss how different forms of human feedback can be integrated in a transparent manner in our learning framework.

preprint · 2011 · arXiv

Sparse Bayesian Methods for Low-Rank Matrix Estimation

Recovery of low-rank matrices has recently seen significant activity in many areas of science and engineering, motivated by recent theoretical results for exact reconstruction guarantees and interesting practical applications. A number of methods have been developed for this recovery problem. However, a principled method for choosing the unknown target rank is generally not provided. In this paper, we present novel recovery algorithms for estimating low-rank matrices in matrix completion and robust principal component analysis based on sparse Bayesian learning (SBL) principles. Starting from a matrix factorization formulation and enforcing the low-rank constraint in the estimates as a sparsity constraint, we develop an approach that is very effective in determining the correct rank while providing high recovery performance. We provide connections with existing methods in other similar problems and empirical results and comparisons with current state-of-the-art methods that illustrate the effectiveness of this approach.

preprint · 2022 · arXiv

Learning Representations from Dendrograms

We propose unsupervised representation learning and feature extraction from dendrograms. The commonly used Minimax distance measures correspond to building a dendrogram with single linkage criterion, with defining specific forms of a level function and a distance function over that. Therefore, we extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures and representations can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the combination of different distances and features sequentially in the spirit of multi-layered architectures to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.

preprint · 2026 · arXiv

A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams

To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception-Physical-Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.

preprint · 2015 · arXiv

AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

Deep convolutional neural networks comprise a subclass of deep neural networks (DNN) with a constrained architecture that leverages the spatial and temporal structure of the domain they model. Convolutional networks achieve the best predictive performance in areas such as speech and image recognition by hierarchically composing simple local features into complex models. Although DNNs have been used in drug discovery for QSAR and ligand-based bioactivity predictions, none of these models have benefited from this powerful convolutional architecture. This paper introduces AtomNet, the first structure-based, deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. We demonstrate how to apply the convolutional concepts of feature locality and hierarchical composition to the modeling of bioactivity and chemical interactions. In further contrast to existing DNN techniques, we show that AtomNet's application of local convolutional filters to structural target information successfully predicts new active molecules for targets with no previously known modulators. Finally, we show that AtomNet outperforms previous docking ap

preprint · 2022 · arXiv

Algorithms for Bayesian network modeling and reliability inference of complex multistate systems: Part II-Dependent systems

In using the Bayesian network (BN) to construct the complex multistate system's reliability model as described in Part I, the memory storage requirements of the node probability table (NPT) will exceed the random access memory (RAM) of the computer. However, the proposed inference algorithm of Part I is not suitable for the dependent system. This Part II proposes a novel method for BN reliability modeling and analysis that applies the compression idea to the complex multistate dependent system. In this Part II, the dependent nodes and their parent nodes are treated as a block, based on which the multistate joint probability inference algorithm is proposed to calculate the joint probability distribution of all nodes in a block. Then, based on the proposed multistate compression algorithm of Part I, the dependent multistate inference algorithm is proposed for the complex multistate dependent system. The use and accuracy of the proposed algorithms are demonstrated in case 1. Finally, the proposed algorithms are applied to the reliability modeling and analysis of the satellite attitude control system. The results show that the algorithms proposed in Part I and Part II together make the reliability modeling and analysis of the complex multistate system feasible.

preprint · 2022 · arXiv

Federated Transfer Learning with Multimodal Data

Smart cars, smartphones and other devices in the Internet of Things (IoT), which usually have more than one sensor, produce multimodal data. Federated Learning supports collecting a wealth of multimodal data from different devices without sharing raw data. Transfer Learning methods help transfer knowledge from some devices to others. Federated Transfer Learning methods benefit both Federated Learning and Transfer Learning. This newly proposed Federated Transfer Learning framework aims at connecting data islands with privacy protection. Our construction is based on Federated Learning and Transfer Learning. Compared with previous Federated Transfer Learning methods, where each user should have data with identical modalities (either all unimodal or all multimodal), our new framework is more generic: it allows a hybrid distribution of user data. The core strategy is to use two different but inherently connected training methods for our two types of users. Supervised Learning is adopted for users with only unimodal data (Type 1), while Self-Supervised Learning is applied to users with multimodal data (Type 2) for both the features of each modality and the connection between them. This connection knowledge of Type 2 will help Type 1 in later stages of training. Training in the new framework can be divided into three steps. In the first step, users who have data with identical modalities are grouped together. For example, users with only sound signals are in group one, those with only images are in group two, users with multimodal data are in group three, and so on. In the second step, Federated Learning is executed within the groups, where Supervised Learning and Self-Supervised Learning are used depending on the group's nature. Most of the Transfer Learning happens in the third step, where the related parts in the network obtained from the previous steps are aggregated (federated).

preprint · 2013 · arXiv

A Correlation Clustering Approach to Link Classification in Signed Networks -- Full Version --

Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.

preprint · 2022 · arXiv

Learnability Lock: Authorized Learnability Control Through Adversarial Invertible Transformations

Owing much to the revolution of information technology, the recent progress of deep learning benefits incredibly from the vastly enhanced access to data available in various digital formats. However, in certain scenarios, people may not want their data to be used for training commercial models, and researchers have thus studied how to attack the learnability of deep learning models. Previous works on learnability attacks only consider the goal of preventing unauthorized exploitation of a specific dataset, not the process of restoring learnability for authorized cases. To tackle this issue, this paper introduces and investigates a new concept called "learnability lock" for controlling the model's learnability on a specific dataset with a special key. In particular, we propose an adversarial invertible transformation, which can be viewed as a mapping from image to image, to slightly modify data samples so that they become "unlearnable" by machine learning models with negligible loss of visual features. Meanwhile, one can unlock the learnability of the dataset and train models normally using the corresponding key. The proposed learnability lock leverages class-wise perturbation that applies a universal transformation function to data samples of the same label. This ensures that the learnability can be easily restored with a simple inverse transformation while remaining difficult to detect or reverse-engineer. We empirically demonstrate the success and practicability of our method on visual classification tasks.

preprint · 2011 · arXiv

Optimization with Sparsity-Inducing Penalties

Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.

preprint · 2021 · arXiv

Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification

The pattern of electroencephalogram (EEG) signals differs significantly across subjects and poses challenges for EEG classifiers in terms of 1) effectively adapting a learned classifier to a new subject and 2) retaining knowledge of known subjects after the adaptation. We propose an efficient transfer learning method, named Meta UPdate Strategy (MUPS-EEG), for continuous EEG classification across different subjects. The model learns effective representations with a meta update, which accelerates adaptation to new subjects and mitigates forgetting of knowledge about previous subjects at the same time. The proposed mechanism originates from meta learning and works to 1) find a feature representation that is broadly suitable for different subjects and 2) maximize the sensitivity of the loss function for fast adaptation to a new subject. The method can be applied to all deep learning oriented models. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed model, which outperforms the current state of the art by a large margin in terms of both adapting to new subjects and retaining knowledge of learned subjects.

preprint · 2020 · arXiv

Learning with Multiplicative Perturbations

Adversarial Training (AT) and Virtual Adversarial Training (VAT) are regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms that generate multiplicative perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their additive counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively, while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30% faster than their additive counterparts. Furthermore, the resulting DNNs also demonstrate distinct weight distributions.

preprint · 2026 · arXiv

Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares

We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online least squares method that stabilizes the learning process via tailored predictive hints. With model knowledge, we prove that hints constructed from any stabilizing Luenberger predictor render the hint residuals uniformly bounded, achieving logarithmic regret despite unbounded trajectory growth. We also discuss model-free prediction and introduce a simple universal hint for symmetric systems, under which logarithmic regret is maintained without model knowledge. Our results provide an adaptive, instance-wise optimal online predictor compared to classical fixed-gain observers under nonstochastic disturbances.

preprint · 2012 · arXiv

Approximate Principal Direction Trees

We introduce a new spatial data structure for high dimensional data called the approximate principal direction tree (APD tree) that adapts to the intrinsic dimension of the data. Our algorithm ensures vector-quantization accuracy similar to that of computationally-expensive PCA trees with similar time-complexity to that of lower-accuracy RP trees. APD trees use a small number of power-method iterations to find splitting planes for recursively partitioning the data. As such they provide a natural trade-off between the running-time and accuracy achieved by RP and PCA trees. Our theoretical results establish a) strong performance guarantees regardless of the convergence rate of the power-method and b) that $O(\log d)$ iterations suffice to establish the guarantee of PCA trees when the intrinsic dimension is $d$. We demonstrate this trade-off and the efficacy of our data structure on both the CPU and GPU.
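
As a rough illustration of the idea in this abstract, a few power-method iterations can approximate the top principal direction of a point set, which can then serve as the normal of a splitting plane. The sketch below is not the authors' implementation: the function names `approx_principal_direction` and `split_node`, the default of three iterations, and the choice to split at the median projection are all assumptions made for the example.

```python
import random


def approx_principal_direction(points, iterations=3, seed=0):
    """Approximate the top principal direction with a few power-method steps.

    Hypothetical helper for illustration; `iterations` is kept small on purpose.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]

    # Random unit-length starting vector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in v) ** 0.5
    v = [c / norm for c in v]

    for _ in range(iterations):
        # Apply the (unnormalized) covariance X^T X to v without forming it.
        proj = [sum(x[j] * v[j] for j in range(dim)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:  # degenerate data: keep the previous direction
            break
        v = [c / norm for c in w]
    return mean, v


def split_node(points, iterations=3):
    """Split points by a plane orthogonal to the approximate principal direction.

    Assumption: the plane is placed at the median projection.
    """
    mean, v = approx_principal_direction(points, iterations)
    scores = [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]
    threshold = sorted(scores)[len(scores) // 2]
    left = [p for p, s in zip(points, scores) if s < threshold]
    right = [p for p, s in zip(points, scores) if s >= threshold]
    return v, left, right
```

Applying `split_node` recursively to `left` and `right` would yield a tree. Raising `iterations` moves the split direction closer to the exact principal component at higher cost, which is the running-time versus accuracy trade-off the abstract describes.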

preprint · 2020 · arXiv

Machine Unlearning: Linear Filtration for Logit-based Classifiers

Recently enacted legislation grants individuals certain rights to decide in what fashion their personal data may be used, and in particular a "right to be forgotten". This poses a challenge to machine learning: how to proceed when an individual retracts permission to use data which has been part of the training process of a model? From this question emerges the field of machine unlearning, which could be broadly described as the investigation of how to "delete training data from models". Our work complements this direction of research for the specific setting of class-wide deletion requests for classification models (e.g. deep neural networks). As a first step, we propose linear filtration as an intuitive, computationally efficient sanitization method. Our experiments demonstrate benefits in an adversarial setting over naive deletion schemes.

preprint · 2022 · arXiv

A Machine Learning Approach for Driver Identification Based on CAN-BUS Sensor Data

Driver identification is a momentous field for modern decorated vehicles from the controller area network (CAN-BUS) perspective. Many conventional systems are used to identify the driver. Going one step further, most researchers use CAN-BUS sensor data, but difficulties arise because the protocol varies across vehicle models. Our aim is to identify the driver through supervised learning algorithms based on driving behavior analysis. To determine the driver, a driver verification technique is proposed that evaluates driving patterns using measurements of CAN sensor data. In this paper, on-board diagnostics (OBD-II) is used to capture the data from the CAN-BUS sensors, and the sensors are listed under the SAE J1979 statement. According to the service of OBD-II, driver identification is possible. However, we have obtained two types of accuracy, on a complete data set with 10 drivers and on a partial data set with two drivers. The accuracy is better with fewer drivers than with a higher number of drivers. We have achieved statistically significant results in terms of accuracy in contrast to the baseline algorithm

preprint · 2021 · arXiv

Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels

Current neural architecture search (NAS) strategies focus only on finding a single, good, architecture. They offer little insight into why a specific network is performing well, or how we should modify the architecture if we want further improvements. We propose a Bayesian optimisation (BO) approach for NAS that combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate. Our method optimises the architecture in a highly data-efficient manner: it is capable of capturing the topological structures of the architectures and is scalable to large graphs, thus making the high-dimensional and graph-like search spaces amenable to BO. More importantly, our method affords interpretability by discovering useful network features and their corresponding impact on the network performance. Indeed, we demonstrate empirically that our surrogate model is capable of identifying useful motifs which can guide the generation of new architectures. We finally show that our method outperforms existing NAS approaches to achieve the state of the art on both closed- and open-domain search spaces.

preprint · 2012 · arXiv

Quilting Stochastic Kronecker Product Graphs to Generate Multiplicative Attribute Graphs

We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample a small number of KPGM graphs and quilt them together. Under a restricted set of technical conditions our algorithm runs in $O((\log_2(n))^3 |E|)$ time, where $n$ is the number of nodes and $|E|$ is the number of edges in the sampled graph. We demonstrate the scalability of our algorithm via extensive empirical evaluation; we can sample a MAGM graph with 8 million nodes and 20 billion edges in under 6 hours.

preprint · 2020 · arXiv

Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches

Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never before seen classes as well as improve its recognition capabilities as new instances of already known classes are discovered. Ideally, continual learning should be triggered by the availability of short videos of single objects and performed on-line on on-board hardware with fine-grained updates. In this paper, we introduce a novel continual learning protocol based on the CORe50 benchmark and propose two rehearsal-free continual learning techniques, CWR* and AR1*, that can learn effectively even in the challenging case of nearly 400 small non-i.i.d. incremental batches. In particular, our experiments show that AR1* can outperform other state-of-the-art rehearsal-free techniques by more than 15% accuracy in some cases, with a very light and constant computational and memory overhead across training batches.

preprint · 2026 · arXiv

How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System

Large Vision-Language Models (LVLMs) augmented with Retrieval-Augmented Generation (RAG) are increasingly employed in medical AI to enhance factual grounding through external clinical image-text retrieval. However, this reliance creates a significant attack surface. We propose MedThreatRAG, a novel multimodal poisoning framework that systematically probes vulnerabilities in medical RAG systems by injecting adversarial image-text pairs. A key innovation of our approach is the construction of a simulated semi-open attack environment, mimicking real-world medical systems that permit periodic knowledge base updates via user or pipeline contributions. Within this setting, we introduce and emphasize Cross-Modal Conflict Injection (CMCI), which embeds subtle semantic contradictions between medical images and their paired reports. These mismatches degrade retrieval and generation by disrupting cross-modal alignment while remaining sufficiently plausible to evade conventional filters. While basic textual and visual attacks are included for completeness, CMCI demonstrates the most severe degradation. Evaluations on IU-Xray and MIMIC-CXR QA tasks show that MedThreatRAG reduces answer F1 score

preprint · 2016 · arXiv

A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival

Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that aid the prediction of different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict survival of cancer patients. Despite the wealth of information available in expression profiles of cancer tumors, fulfilling the aforementioned objective remains a big challenge, for the most part, due to the paucity of data samples compared to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can maximally use all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients' survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We

preprint · 2015 · arXiv

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different times. Therefore, we allow the parameter updates to be sensitive to the actual delays experienced, rather than to worst-case bounds on the maximum delay. This sensitivity leads to larger stepsizes, which can help gain rapid initial convergence without having to wait too long for slower machines, while maintaining the same asymptotic complexity. We obtain encouraging improvements to overall convergence for distributed experiments on real datasets with up to billions of examples and features.

preprint · 2026 · arXiv

Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers

Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc generalization bounds for multi-layer Transformers. Under layerwise spectral norm control, the bounds are expressed in terms of layerwise Schatten quantities of the query-key, value, and feedforward weight matrices. Since the Schatten indices need not be fixed a priori and can instead be selected after training, separately for each matrix type and layer, the bounds adaptively trade off spectral complexity against the dimension- and depth-dependent factors according to the learned singular-value profiles. Empirical comparisons of BERT-adapted proxies for the leading complexity factors suggest that the proxies induced by our bounds grow more slowly with depth and hidden dimension than the corresponding norm-based proxies. Overall, our results provide a complexity-based perspective on how the spectral structure of trained Transformers is reflected in generalization analyses.

preprint · 2021 · arXiv

InversionNet3D: Efficient and Scalable Learning for 3D Full Waveform Inversion

Seismic full-waveform inversion (FWI) techniques aim to find a high-resolution subsurface geophysical model provided with waveform data. Some recent effort in data-driven FWI has shown some encouraging results in obtaining 2D velocity maps. However, due to high computational complexity and large memory consumption, the reconstruction of 3D high-resolution velocity maps via deep networks is still a great challenge. In this paper, we present InversionNet3D, an efficient and scalable encoder-decoder network for 3D FWI. The proposed method employs group convolution in the encoder to establish an effective hierarchy for learning information from multiple sources while cutting down unnecessary parameters and operations at the same time. The introduction of invertible layers further reduces the memory consumption of intermediate features during training and thus enables the development of deeper networks with more layers and higher capacity as required by different application scenarios. Experiments on the 3D Kimberlina dataset demonstrate that InversionNet3D achieves state-of-the-art reconstruction performance with lower computational cost and lower memory footprint compared to the baseline.

preprint2017arXiv

PSF : Introduction to R Package for Pattern Sequence Based Forecasting Algorithm

This paper discusses an R package that implements the Pattern Sequence based Forecasting (PSF) algorithm, which was developed for univariate time series forecasting. This algorithm has been successfully applied to many different fields. The PSF algorithm consists of two major parts: clustering and prediction. The clustering part includes selection of the optimum number of clusters. It labels time series data with reference to such clusters. The prediction part includes functions like optimum window size selection for specific patterns and prediction of future values with reference to past pattern sequences. The PSF package consists of various functions to implement the PSF algorithm. It also contains a function which automates all other functions to obtain optimized prediction results. The aim of this package is to promote the PSF algorithm and to ease its implementation with minimum effort. This paper describes all the functions in the PSF package with their syntax. It also provides a simple example of usage. Finally, the usefulness of this package is discussed by comparing it to auto.arima and ets, well-known time series forecasting functions available on the CRAN repository.
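The two parts named in this abstract, clustering and prediction, can be sketched roughly as follows. This is a simplified Python illustration, not the R package's API: the function names are hypothetical, the number of clusters `k` and the window size are passed in by hand rather than selected automatically, and shrinking the window when no match is found is an assumed fallback.

```python
import numpy as np

def kmeans_labels(cycles, k, n_iter=50, seed=0):
    """Plain k-means; returns one cluster label per cycle (row)."""
    rng = np.random.default_rng(seed)
    centers = cycles[rng.choice(len(cycles), size=k, replace=False)].copy()
    labels = np.zeros(len(cycles), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(cycles[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = cycles[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

def psf_predict_next(series, cycle_len, k, window):
    """Predict the next cycle from past occurrences of the latest label pattern."""
    values = np.asarray(series, dtype=float)
    if values.size % cycle_len != 0:
        raise ValueError("series length must be a multiple of cycle_len")
    cycles = values.reshape(-1, cycle_len)

    # Clustering part: replace each cycle by its cluster label.
    labels = kmeans_labels(cycles, k)

    # Prediction part: find earlier occurrences of the last `w` labels and
    # average the cycles that followed them (assumed fallback: shrink w).
    for w in range(window, 0, -1):
        pattern = labels[-w:]
        followers = [cycles[i + w] for i in range(len(labels) - w)
                     if np.array_equal(labels[i:i + w], pattern)]
        if followers:
            return np.mean(followers, axis=0)
    return cycles[-1].copy()  # no pattern ever repeated: reuse the last cycle
```

On a series that alternates between two cycle shapes, the label sequence alternates too, so the forecast is the mean of the cycles that followed each earlier occurrence of the most recent pattern.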

preprint2021arXiv

Topological Deep Learning

This work introduces the Topological CNN (TCNN), which encompasses several topologically defined convolutional methods. Manifolds with important relationships to the natural image space are used to parameterize image filters which are used as convolutional weights in a TCNN. These manifolds also parameterize slices in layers of a TCNN across which the weights are localized. We show evidence that TCNNs learn faster, on less data, with fewer learned parameters, and with greater generalizability and interpretability than conventional CNNs. We introduce and explore TCNN layers for both image and video data. We propose extensions to 3D images and 3D video.

preprint2022arXiv

Variational Hyper-Encoding Networks

We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters θ are drawn from a distribution p(θ) which is modeled by a hyper-level VAE. We propose a variational inference using Gaussian mixture models to implicitly encode the parameters θ into a low dimensional Gaussian distribution. Given a target distribution, we predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution q(θ). HyperVAE can encode the parameters θ in full, in contrast to common hyper-network practice, which generates only the scale and bias vectors as target-network parameters. Thus HyperVAE preserves much more information about the model for each task in the latent space. We discuss HyperVAE using the minimum description length (MDL) principle and show that it helps HyperVAE to generalize. We evaluate HyperVAE in density estimation tasks, outlier detection and discovery of novel design classes, demonstrating its efficacy.

preprint2022arXiv

World Value Functions: Knowledge Representation for Multitask Reinforcement Learning

An open problem in artificial intelligence is how to learn and represent knowledge that is sufficient for a general agent that needs to solve multiple tasks in a given world. In this work we propose world value functions (WVFs), which are a type of general value function with mastery of the world - they represent not only how to solve a given task, but also how to solve any other goal-reaching task. To achieve this, we equip the agent with an internal goal space defined as all the world states where it experiences a terminal transition - a task outcome. The agent can then modify task rewards to define its own reward function, which provably drives it to learn how to achieve all achievable internal goals, and the value of doing so in the current task. We demonstrate a number of benefits of WVFs. When the agent's internal goal space is the entire state space, we demonstrate that the transition function can be inferred from the learned WVF, which allows the agent to plan using learned value functions. Additionally, we show that for tasks in the same world, a pretrained agent that has learned any WVF can then infer the policy and value function for any new task directly from its rewards. Finally, an important property for long-lived agents is the ability to reuse existing knowledge to solve new tasks. Using WVFs as the knowledge representation for learned tasks, we show that an agent is able to solve their logical combination zero-shot, resulting in a combinatorially increasing number of skills throughout their lifetime.

preprint2022arXiv

Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting

Due to the practical importance of regular expressions (regexes, for short), there has been a lot of research on automatically generating regexes from positive and negative string examples. We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called 'neural example splitting'. Our approach essentially splits up each example string into multiple parts using a neural network trained to group similar substrings from positive strings. This helps to learn a regex faster and, thus, more accurately since we now learn from several short-length strings. We propose an effective regex synthesis framework called 'SplitRegex' that synthesizes subregexes from 'split' positive substrings and produces the final regex by concatenating the synthesized subregexes. For the negative samples, we exploit pre-generated subregexes during the subregex synthesis process and perform the matching against negative strings. Then the final regex becomes consistent with all negative strings. SplitRegex is a divide-and-conquer framework for learning target regexes: split (=divide) positive strings and infer partial regexes for multiple parts, which is much more accurate than inferring from the whole string, and concatenate (=conquer) inferred regexes while satisfying negative strings. We empirically demonstrate that the proposed SplitRegex framework substantially improves on previous regex synthesis approaches across four benchmark datasets.
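The concatenation step and the consistency check against examples can be illustrated with a small sketch. It assumes the subregexes have already been synthesized from the split substrings; the neural splitter and the synthesis procedure are not reproduced, and the function name `concat_and_check` is hypothetical.

```python
import re

def concat_and_check(subregexes, positives, negatives):
    """Concatenate subregexes and test the result against the examples.

    Returns the candidate regex and whether it fully matches every
    positive string and no negative string.
    """
    # Wrap each part in a non-capturing group so alternations stay local.
    candidate = "".join(f"(?:{part})" for part in subregexes)
    compiled = re.compile(candidate)
    accepts_positives = all(compiled.fullmatch(s) is not None for s in positives)
    rejects_negatives = all(compiled.fullmatch(s) is None for s in negatives)
    return candidate, accepts_positives and rejects_negatives
```

For instance, if the positive strings `abc123` and `x9` were split into a letter part and a digit part, the hypothetical subregexes `[a-z]+` and `[0-9]+` could be concatenated and checked against negatives such as `123abc`.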

preprint2015arXiv

Feature Selection for Ridge Regression with Provable Guarantees

We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least squares classification, which is the classification analogue to ridge regression. The method is unsupervised and gives worst-case guarantees on the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world datasets, namely a subset of the TechTC-300 datasets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
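A minimal sketch of leverage-score sampling over features is given below, assuming a data matrix with one row per sample and one column per feature. It follows the generic textbook construction (scores from the right singular vectors, sampling with replacement, rescaling); the paper's exact procedure, sample sizes and guarantees are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def leverage_score_feature_sample(X, num_features, rng):
    """Sample columns (features) of X with probability proportional to leverage.

    Returns the sampled column indices, the rescaled sampled matrix,
    and the sampling probabilities.
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s[0])) if s[0] > 0 else 0
    if rank == 0:
        raise ValueError("X has rank zero; leverage scores are undefined")

    # Leverage score of feature j: squared norm of row j of V (they sum to rank).
    v = vt[:rank].T
    scores = np.sum(v ** 2, axis=1)
    probs = scores / scores.sum()

    idx = rng.choice(X.shape[1], size=num_features, replace=True, p=probs)
    # Rescaling makes X_s @ X_s.T an unbiased estimate of X @ X.T.
    scale = 1.0 / np.sqrt(num_features * probs[idx])
    return idx, X[:, idx] * scale, probs
```

Ridge regression can then be fit on the returned matrix in place of the full feature matrix.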