Research connected to "machine learning"

Search papers, authors, topics, institutions and opportunities, then move straight into the graph around the result.

FiltersOptional

Search results

Showing works 705-736 from 49,008 works in Machine Learning. Use pages to browse more, or open the graph for the map.

49,008matching works
Full topic scaleMachine Learning

49,008 works and 109,744 authors are indexed for this topic. This page shows 32 works at a time so search stays fast.

Match modeExact match focus
Semantic hits0
Active filters0
Graph viewOpen

Papers

preprint2022arXiv

A Feedforward Unitary Equivariant Neural Network

We devise a new type of feedforward neural network. It is equivariant with respect to the unitary group $U(n)$. The input and output can be vectors in $\mathbb{C}^n$ with arbitrary dimension $n$. No convolution layer is required in our implementation. We avoid errors due to truncated higher order terms in Fourier-like transformation. The implementation of each layer can be done efficiently using simple calculations. As a proof of concept, we have given empirical results on the prediction of the dynamics of atomic motion to demonstrate the practicality of our approach.

preprint2015arXiv

Clamping Improves TRW and Mean Field Approximations

We examine the effect of clamping variables for approximate inference in undirected graphical models with pairwise relationships and discrete variables. For any number of variable labels, we demonstrate that clamping and summing approximate sub-partition functions can lead only to a decrease in the partition function estimate for TRW, and an increase for the naive mean field method, in each case guaranteeing an improvement in the approximation and bound. We next focus on binary variables, add the Bethe approximation to consideration and examine ways to choose good variables to clamp, introducing new methods. We show the importance of identifying highly frustrated cycles, and of checking the singleton entropy of a variable. We explore the value of our methods by empirical analysis and draw lessons to guide practitioners.

preprint2014arXiv

Generalization Bounds for Representative Domain Adaptation

In this paper, we propose a novel framework to analyze the theoretical properties of the learning process for a representative type of domain adaptation, which combines data from multiple sources and one target (or briefly called representative domain adaptation). In particular, we use the integral probability metric to measure the difference between the distributions of two domains and meanwhile compare it with the H-divergence and the discrepancy distance. We develop the Hoeffding-type, the Bennett-type and the McDiarmid-type deviation inequalities for multiple domains respectively, and then present the symmetrization inequality for representative domain adaptation. Next, we use the derived inequalities to obtain the Hoeffding-type and the Bennett-type generalization bounds respectively, both of which are based on the uniform entropy number. Moreover, we present the generalization bounds based on the Rademacher complexity. Finally, we analyze the asymptotic convergence and the rate of convergence of the learning process for representative domain adaptation. We discuss the factors that affect the asymptotic behavior of the learning process and the numerical experiments support our

preprint2016arXiv

A Hybrid Both Filter and Wrapper Feature Selection Method for Microarray Classification

Gene expression data is widely used in disease analysis and cancer diagnosis. However, since gene expression data could contain thousands of genes simultaneously, successful microarray classification is rather difficult. Feature selection is an important pre-treatment for any classification process. Selecting a useful gene subset as a classifier not only decreases the computational time and cost, but also increases classification accuracy. In this study, we applied the information gain method as a filter approach, and an improved binary particle swarm optimization as a wrapper approach to implement feature selection; selected gene subsets were used to evaluate the performance of classification. Experimental results show that by employing the proposed method fewer gene subsets needed to be selected and better classification accuracy could be obtained.

preprint2016arXiv

On the representation and embedding of knowledge bases beyond binary relations

The models developed to date for knowledge base embedding are all based on the assumption that the relations contained in knowledge bases are binary. For the training and testing of these embedding models, multi-fold (or n-ary) relational data are converted to triples (e.g., in FB15K dataset) and interpreted as instances of binary relations. This paper presents a canonical representation of knowledge bases containing multi-fold relations. We show that the existing embedding models on the popular FB15K datasets correspond to a sub-optimal modelling framework, resulting in a loss of structural information. We advocate a novel modelling framework, which models multi-fold relations directly using this canonical representation. Using this framework, the existing TransH model is generalized to a new model, m-TransH. We demonstrate experimentally that m-TransH outperforms TransH by a large margin, thereby establishing a new state of the art.

preprint2020arXiv

Dissecting Deep Neural Networks

In exchange for large quantities of data and processing power, deep neural networks have yielded models that provide state of the art predication capabilities in many fields. However, a lack of strong guarantees on their behaviour have raised concerns over their use in safety-critical applications. A first step to understanding these networks is to develop alternate representations that allow for further analysis. It has been shown that neural networks with piecewise affine activation functions are themselves piecewise affine, with their domains consisting of a vast number of linear regions. So far, the research on this topic has focused on counting the number of linear regions, rather than obtaining explicit piecewise affine representations. This work presents a novel algorithm that can compute the piecewise affine form of any fully connected neural network with rectified linear unit activations.

preprint2014arXiv

Projecting Markov Random Field Parameters for Fast Mixing

Markov chain Monte Carlo (MCMC) algorithms are simple and extremely powerful techniques to sample from almost arbitrary distributions. The flaw in practice is that it can take a large and/or unknown amount of time to converge to the stationary distribution. This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a precise sense. Further, an algorithm is given to project onto this set of fast-mixing parameters in the Euclidean norm. Following recent work, we give an example use of this to project in various divergence measures, comparing univariate marginals obtained by sampling after projection to common variational methods and Gibbs sampling on the original parameters.

preprint2022arXiv

Local overlap reduction procedure for dynamic ensemble selection

Class imbalance is a characteristic known for making learning more challenging for classification models as they may end up biased towards the majority class. A promising approach among the ensemble-based methods in the context of imbalance learning is Dynamic Selection (DS). DS techniques single out a subset of the classifiers in the ensemble to label each given unknown sample according to their estimated competence in the area surrounding the query. Because only a small region is taken into account in the selection scheme, the global class disproportion may have less impact over the system's performance. However, the presence of local class overlap may severely hinder the DS techniques' performance over imbalanced distributions as it not only exacerbates the effects of the under-representation but also introduces ambiguous and possibly unreliable samples to the competence estimation process. Thus, in this work, we propose a DS technique which attempts to minimize the effects of the local class overlap during the classifier selection procedure. The proposed method iteratively removes from the target region the instance perceived as the hardest to classify until a classifier is deemed competent to label the query sample. The known samples are characterized using instance hardness measures that quantify the local class overlap. Experimental results show that the proposed technique can significantly outperform the baseline as well as several other DS techniques, suggesting its suitability for dealing with class under-representation and overlap. Furthermore, the proposed technique still yielded competitive results when using an under-sampled, less overlapped version of the labelled sets, specially over the problems with a high proportion of minority class samples in overlap areas. Code available at https://github.com/marianaasouza/lords.

preprint2016arXiv

Typical Stability

In this paper, we introduce a notion of algorithmic stability called typical stability. When our goal is to release real-valued queries (statistics) computed over a dataset, this notion does not require the queries to be of bounded sensitivity -- a condition that is generally assumed under differential privacy [DMNS06, Dwork06] when used as a notion of algorithmic stability [DFHPRR15a, DFHPRR15b, BNSSSU16] -- nor does it require the samples in the dataset to be independent -- a condition that is usually assumed when generalization-error guarantees are sought. Instead, typical stability requires the output of the query, when computed on a dataset drawn from the underlying distribution, to be concentrated around its expected value with respect to that distribution. We discuss the implications of typical stability on the generalization error (i.e., the difference between the value of the query computed on the dataset and the expected value of the query with respect to the true data distribution). We show that typical stability can control generalization error in adaptive data analysis even when the samples in the dataset are not necessarily independent and when queries to be computed

preprint2022arXiv

Neural Network Layer Algebra: A Framework to Measure Capacity and Compression in Deep Learning

We present a new framework to measure the intrinsic properties of (deep) neural networks. While we focus on convolutional networks, our framework can be extrapolated to any network architecture. In particular, we evaluate two network properties, namely, capacity, which is related to expressivity, and compression, which is related to learnability. Both these properties depend only on the network structure and are independent of the network parameters. To this end, we propose two metrics: the first one, called layer complexity, captures the architectural complexity of any network layer; and, the second one, called layer intrinsic power, encodes how data is compressed along the network. The metrics are based on the concept of layer algebra, which is also introduced in this paper. This concept is based on the idea that the global properties depend on the network topology, and the leaf nodes of any neural network can be approximated using local transfer functions, thereby, allowing a simple computation of the global metrics. We show that our global complexity metric can be calculated and represented more conveniently than the widely-used VC dimension. We also compare the properties of various state-of-the art architectures using our metrics and use the properties to analyze their accuracy on benchmark image classification datasets.

preprint2020arXiv

Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy

Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are: (1) the generation of high quality images, (2) diversity of image generation, and (3) stable training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress towards addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant, and loss-variant GANs, for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress towards important computer vision application requirements. As we do this we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success along with some suggestions for future research directions. Code related to GAN-variants studied in this work is summarized on https://github.com/sheqi/GAN_Review.

preprint2022arXiv

Towards Federated Clustering: A Federated Fuzzy $c$-Means Algorithm (FFCM)

Federated Learning (FL) is a setting where multiple parties with distributed data collaborate in training a joint Machine Learning (ML) model while keeping all data local at the parties. Federated clustering is an area of research within FL that is concerned with grouping together data that is globally similar while keeping all data local. We describe how this area of research can be of interest in itself, or how it helps addressing issues like non-independently-identically-distributed (i.i.d.) data in supervised FL frameworks. The focus of this work, however, is an extension of the federated fuzzy $c$-means algorithm to the FL setting (FFCM) as a contribution towards federated clustering. We propose two methods to calculate global cluster centers and evaluate their behaviour through challenging numerical experiments. We observe that one of the methods is able to identify good global clusters even in challenging scenarios, but also acknowledge that many challenges remain open.

preprint2022arXiv

Conditional GANs with Auxiliary Discriminative Classifier

Conditional generative models aim to learn the underlying joint distribution of data and labels to achieve conditional data generation. Among them, the auxiliary classifier generative adversarial network (AC-GAN) has been widely used, but suffers from the problem of low intra-class diversity of the generated samples. The fundamental reason pointed out in this paper is that the classifier of AC-GAN is generator-agnostic, which therefore cannot provide informative guidance for the generator to approach the joint distribution, resulting in a minimization of the conditional entropy that decreases the intra-class diversity. Motivated by this understanding, we propose a novel conditional GAN with an auxiliary discriminative classifier (ADC-GAN) to resolve the above problem. Specifically, the proposed auxiliary discriminative classifier becomes generator-aware by recognizing the class-labels of the real data and the generated data discriminatively. Our theoretical analysis reveals that the generator can faithfully learn the joint distribution even without the original discriminator, making the proposed ADC-GAN robust to the value of the coefficient hyperparameter and the selection of the GAN loss, and stable during training. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of ADC-GAN in conditional generative modeling compared to state-of-the-art classifier-based and projection-based conditional GANs.

preprint2026arXiv

RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation

Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmarks formulate this task as admission-level prediction over coarse drug codes with multi-hot diagnostic and procedure code inputs, failing to capture the per-timepoint, information-rich nature of real prescribing. We propose RxEval, a prescription-level benchmark that evaluates LLM prescribing capability by multiple-choice questions: each question presents a detailed patient profile and time-ordered clinical trajectory, requiring selection of specific medication-dose-route triples from real prescriptions and patient-specific distractors generated via reasoning-chain perturbation. RxEval comprises 1,547 questions spanning 584 patients, 18 diagnostic categories, and 969 unique medications. Evaluation of 16 LLMs shows that RxEval is both challenging and discriminative: F1 ranges from 45.18 to 77.10 across models, and the best Exact Match is only 46.10%. Error analysis reveals that even frontier models may overlook stated patient information and fail to derive clinical conclusions.

preprint2020arXiv

A sparse code increases the speed and efficiency of neuro-dynamic programming for optimal control tasks with correlated inputs

Sparse codes in neuroscience have been suggested to offer certain computational advantages over other neural representations of sensory data. To explore this viewpoint, a sparse code is used to represent natural images in an optimal control task solved with neuro-dynamic programming, and its computational properties are investigated. The central finding is that when feature inputs to a linear network are correlated, an over-complete sparse code increases the memory capacity of the network in an efficient manner beyond that possible for any complete code with the same-sized input, and also increases the speed of learning the network weights. A complete sparse code is found to maximise the memory capacity of a linear network by decorrelating its feature inputs to transform the design matrix of the least-squares problem to one of full rank. It also conditions the Hessian matrix of the least-squares problem, thereby increasing the rate of convergence to the optimal network weights. Other types of decorrelating codes would also achieve this. However, an over-complete sparse code is found to be approximately decorrelated, extracting a larger number of approximately decorrelated features from the same-sized input, allowing it to efficiently increase memory capacity beyond that possible for any complete code: a 2.25 times over-complete sparse code is shown to at least double memory capacity compared with a complete sparse code using the same input. This is used in sequential learning to store a potentially large number of optimal control tasks in the network, while catastrophic forgetting is avoided using a partitioned representation, yielding a cost-to-go function approximator that generalizes over the states in each partition. Sparse code advantages over dense codes and local codes are also discussed.

preprint2022arXiv

Transformers as Meta-Learners for Implicit Neural Representations

Implicit Neural Representations (INRs) have emerged and shown their benefits over discrete representations in recent years. However, fitting an INR to the given observations usually requires optimization with gradient descent from scratch, which is inefficient and does not generalize well with sparse observations. To address this problem, most of the prior works train a hypernetwork that generates a single vector to modulate the INR weights, where the single vector becomes an information bottleneck that limits the reconstruction precision of the output INR. Recent work shows that the whole set of weights in INR can be precisely inferred without the single-vector bottleneck by gradient-based meta-learning. Motivated by a generalized formulation of gradient-based meta-learning, we propose a formulation that uses Transformers as hypernetworks for INRs, where it can directly build the whole set of INR weights with Transformers specialized as set-to-set mapping. We demonstrate the effectiveness of our method for building INRs in different tasks and domains, including 2D image regression and view synthesis for 3D objects. Our work draws connections between the Transformer hypernetworks and gradient-based meta-learning algorithms and we provide further analysis for understanding the generated INRs.

preprint2014arXiv

On Exact Learning Monotone DNF from Membership Queries

In this paper, we study the problem of learning a monotone DNF with at most $s$ terms of size (number of variables in each term) at most $r$ ($s$ term $r$-MDNF) from membership queries. This problem is equivalent to the problem of learning a general hypergraph using hyperedge-detecting queries, a problem motivated by applications arising in chemical reactions and genome sequencing. We first present new lower bounds for this problem and then present deterministic and randomized adaptive algorithms with query complexities that are almost optimal. All the algorithms we present in this paper run in time linear in the query complexity and the number of variables $n$. In addition, all of the algorithms we present in this paper are asymptotically tight for fixed $r$ and/or $s$.

preprint2022arXiv

Work In Progress: Safety and Robustness Verification of Autoencoder-Based Regression Models using the NNV Tool

This work in progress paper introduces robustness verification for autoencoder-based regression neural network (NN) models, following state-of-the-art approaches for robustness verification of image classification NNs. Despite the ongoing progress in developing verification methods for safety and robustness in various deep neural networks (DNNs), robustness checking of autoencoder models has not yet been considered. We explore this open space of research and check ways to bridge the gap between existing DNN verification methods by extending existing robustness analysis methods for such autoencoder networks. While classification models using autoencoders work more or less similar to image classification NNs, the functionality of regression models is distinctly different. We introduce two definitions of robustness evaluation metrics for autoencoder-based regression models, specifically the percentage robustness and un-robustness grade. We also modified the existing Imagestar approach, adjusting the variables to take care of the specific input types for regression networks. The approach is implemented as an extension of NNV, then applied and evaluated on a dataset, with a case study experiment shown using the same dataset. As per the authors' understanding, this work in progress paper is the first to show possible reachability analysis of autoencoder-based NNs.

preprint2022arXiv

Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data

Cancer is one of the deadliest diseases worldwide. Accurate diagnosis and classification of cancer subtypes are indispensable for effective clinical treatment. Promising results on automatic cancer subtyping systems have been published recently with the emergence of various deep learning methods. However, such automatic systems often overfit the data due to the high dimensionality and scarcity. In this paper, we propose to investigate automatic subtyping from an unsupervised learning perspective by directly constructing the underlying data distribution itself, hence sufficient data can be generated to alleviate the issue of overfitting. Specifically, we bypass the strong Gaussianity assumption that typically exists but fails in the unsupervised learning subtyping literature due to small-sized samples by vector quantization. Our proposed method better captures the latent space features and models the cancer subtype manifestation on a molecular basis, as demonstrated by the extensive experimental results.

preprint2020arXiv

Non-parametric Uni-modality Constraints for Deep Ordinal Classification

We propose a new constrained-optimization formulation for deep ordinal classification, in which uni-modality of the label distribution is enforced implicitly via a set of inequality constraints over all the pairs of adjacent labels. Based on (c-1) constraints for c labels, our model is non-parametric and, therefore, more flexible than the existing deep ordinal classification techniques. Unlike these, it does not restrict the learned representation to a single and specific parametric model (or penalty) imposed on all the labels. Therefore, it enables the training to explore larger spaces of solutions, while removing the need for ad hoc choices and scaling up to large numbers of labels. It can be used in conjunction with any standard classification loss and any deep architecture. To tackle the ensuing challenging optimization problem, we solve a sequence of unconstrained losses based on a powerful extension of the log-barrier method. This handles effectively competing constraints and accommodates standard SGD for deep networks, while avoiding computationally expensive Lagrangian dual steps and outperforming substantially penalty methods. Furthermore, we propose a new performance metr

preprint2012arXiv

Negative Tree Reweighted Belief Propagation

We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen's inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including naive and structured mean field approximations), which it includes as a limiting case.

preprint2020arXiv

Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems

The article considers smooth optimization of functions on Lie groups. By generalizing NAG variational principle in vector space (Wibisono et al., 2016) to Lie groups, continuous Lie-NAG dynamics which are guaranteed to converge to local optimum are obtained. They correspond to momentum versions of gradient flow on Lie groups. A particular case of $\mathsf{SO}(n)$ is then studied in details, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and then discretized in structure preserving fashions, resulting in optimization algorithms with faithful energy behavior (due to conformal symplecticity) and exactly remaining on the Lie group. Stochastic gradient versions are also investigated. Numerical experiments on both synthetic data and practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms ($not$ as a classification method).

preprint2026arXiv

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart / document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly-optimized reasoning model (e.g., Qwen3-VL-8B-Thinking) -- averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.

preprint2021arXiv

Joint Dimensionality Reduction for Separable Embedding Estimation

Low-dimensional embeddings for data from disparate sources play critical roles in multi-modal machine learning, multimedia information retrieval, and bioinformatics. In this paper, we propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities. We also propose an efficient feature selection method that complements, and can be applied prior to, our joint dimensionality reduction method. Assuming that there exist true linear embeddings for these features, our analysis of the error in the learned linear embeddings provides theoretical guarantees that the dimensionality reduction method accurately estimates the true embeddings when certain technical conditions are satisfied and the number of samples is sufficiently large. The derived sample complexity results are echoed by numerical experiments. We apply the proposed dimensionality reduction method to gene-disease association, and predict unknown associations using kernel regression on the dimension-reduced feature vectors. Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.

preprint2022arXiv

HyperInvariances: Amortizing Invariance Learning

Providing invariances in a given learning task conveys a key inductive bias that can lead to sample-efficient learning and good generalisation, if correctly specified. However, the ideal invariances for many problems of interest are often not known, which has led both to a body of engineering lore as well as attempts to provide frameworks for invariance learning. However, invariance learning is expensive and data intensive for popular neural architectures. We introduce the notion of amortizing invariance learning. In an up-front learning phase, we learn a low-dimensional manifold of feature extractors spanning invariance to different transformations using a hyper-network. Then, for any problem of interest, both model and invariance learning are rapid and efficient by fitting a low-dimensional invariance descriptor an output head. Empirically, this framework can identify appropriate invariances in different downstream tasks and lead to comparable or better test performance than conventional approaches. Our HyperInvariance framework is also theoretically appealing as it enables generalisation-bounds that provide an interesting new operating point in the trade-off between model fit and complexity.

preprint2026arXiv

SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs

Recent studies observe that reinforcement learning with verifiable rewards (RLVR) reliably improves pass@1 on reasoning tasks, yet often fails to yield comparable gains in pass@k, raising the question of whether RLVR genuinely enables large language models to acquire novel reasoning abilities or merely enhances the efficiency of sampling reasoning modes already present in the base model. Prior analyses largely support the latter view, attributing this limitation to structural properties of standard RLVR objectives that result in insufficient exploration pressure. In this work, we argue that a central structural constraint arises from reverse-KL regularization, which stabilizes training but inherently anchors the policy to the reference distribution, thereby suppressing the emergence of alternative reasoning modes. However, we show that neither removing the KL term nor replacing it with forward-KL provides a satisfactory solution, as both disrupt the efficiency-coverage trade-off by either inducing reward hacking or allocating probability mass to off-target regions. To resolve this tension, we propose SAGE, a principled framework that enables controllable empirical support expansion by reshaping the reverse-KL anchor distribution itself through a guide function q(x,y), achieving consistent improvements in both pass@1 and pass@k across challenging mathematical reasoning benchmarks. Our code is available at https://github.com/tally0818/SAGE.

preprint2022arXiv

Hardware Trojan Insertion Using Reinforcement Learning

This paper utilizes Reinforcement Learning (RL) as a means to automate the Hardware Trojan (HT) insertion process to eliminate the inherent human biases that limit the development of robust HT detection methods. An RL agent explores the design space and finds circuit locations that are best for keeping inserted HTs hidden. To achieve this, a digital circuit is converted to an environment in which an RL agent inserts HTs such that the cumulative reward is maximized. Our toolset can insert combinational HTs into the ISCAS-85 benchmark suite with variations in HT size and triggering conditions. Experimental results show that the toolset achieves high input coverage rates (100\% in two benchmark circuits) that confirms its effectiveness. Also, the inserted HTs have shown a minimal footprint and rare activation probability.

preprint2020arXiv

Seasonal Averaged One-Dependence Estimators: A Novel Algorithm to Address Seasonal Concept Drift in High-Dimensional Stream Classification

Stream classification methods classify a continuous stream of data as new labelled samples arrive. They often also have to deal with concept drift. This paper focuses on seasonal drift in stream classification, which can be found in many real-world application data sources. Traditional approaches of stream classification consider seasonal drift by including seasonal dummy/indicator variables or building separate models for each season. But these approaches have strong limitations in high-dimensional classification problems, or with complex seasonal patterns. This paper explores how to best handle seasonal drift in the specific context of news article categorization (or classification/tagging), where seasonal drift is overwhelmingly the main type of drift present in the data, and for which the data are high-dimensional. We introduce a novel classifier named Seasonal Averaged One-Dependence Estimators (SAODE), which extends the AODE classifier to handle seasonal drift by including time as a super parent. We assess our SAODE model using two large real-world text mining related datasets each comprising approximately a million records, against nine state-of-the-art stream and concept dr

preprint2020arXiv

Fully Neural Network based Model for General Temporal Point Processes

A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performances compared to the previous state-of-the-art methods for both synthetic and real datasets.

preprint2016arXiv

Intra-day Activity Better Predicts Chronic Conditions

In this work we investigate intra-day patterns of activity on a population of 7,261 users of mobile health wearable devices and apps. We show that: (1) using intra-day step and sleep data recorded from passive trackers significantly improves classification performance on self-reported chronic conditions related to mental health and nervous system disorders, (2) Convolutional Neural Networks achieve top classification performance vs. baseline models when trained directly on multivariate time series of activity data, and (3) jointly predicting all condition classes via multi-task learning can be leveraged to extract features that generalize across data sets and achieve the highest classification performance.

preprint2026arXiv

Correcting Mode Proportion Bias in Generalized Bayesian Inference via a Weighted Kernel Stein Discrepancy

Generalized Bayesian Inference (GBI) provides a flexible framework for updating prior distributions using various loss functions instead of the traditional likelihoods, thereby enhancing the model robustness to model misspecification. However, GBI often suffers the problem associated with intractable likelihoods. Kernelized Stein Discrepancy (KSD), as utilized in a recent study, addresses this challenge by relying only on the gradient of the log-likelihood. Despite this innovation, KSD-Bayes suffers from critical pathologies, including insensitivity to well-separated modes in multimodal posteriors. To address this limitation, we propose a weighted KSD method that retains computational efficiency while effectively capturing multimodal structures. Our method improves the GBI framework for handling intractable multimodal posteriors while maintaining key theoretical properties such as posterior consistency and asymptotic normality. Experimental results demonstrate that our method substantially improves mode sensitivity compared to standard KSD-Bayes, while retaining robust performance in unimodal settings and in the presence of outliers.

preprint2021arXiv

Deep Co-Attention Network for Multi-View Subspace Learning

Many real-world applications involve data from multiple modalities and thus exhibit the view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques have been proposed to achieve promising results, such as canonical correlation analysis based methods, etc. In the meanwhile, it is critical for decision-makers to be able to understand the prediction results from these methods. For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction. However, state-of-the-art techniques usually suffer from the inability to utilize the complementary information of each view and to explain the predictions in an interpretable manner. To address these issues, in this paper, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and provide robust interpretations behind the prediction to the end-users via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into our model. This improves the quality of latent representation and accelerates the convergence speed. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.