Source author record

Jesse Davis

Jesse Davis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Information Retrieval q-fin.CP Computer Vision Cryptography and Security Databases eess.SP q-fin.ST

Catalog footprint

What is connected

17works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

How to Allocate your Label Budget? Choosing between Active Learning and Learning to Reject in Anomaly Detection

Anomaly detection attempts at finding examples that deviate from the expected behaviour. Usually, anomaly detection is tackled from an unsupervised perspective because anomalous labels are rare and difficult to acquire. However, the lack of labels makes the anomaly detector have high uncertainty in some regions, which usually results in poor predictive performance or low user trust in the predictions. One can reduce such uncertainty by collecting specific labels using Active Learning (AL), which targets examples close to the detector's decision boundary. Alternatively, one can increase the user trust by allowing the detector to abstain from making highly uncertain predictions, which is called Learning to Reject (LR). One way to do this is by thresholding the detector's uncertainty based on where its performance is low, which requires labels to be evaluated. Although both AL and LR need labels, they work with different types of labels: AL seeks strategic labels, which are evidently biased, while LR requires i.i.d. labels to evaluate the detector's performance and set the rejection threshold. Because one usually has a unique label budget, deciding how to optimally allocate it is challenging. In this paper, we propose a mixed strategy that, given a budget of labels, decides in multiple rounds whether to use the budget to collect AL labels or LR labels. The strategy is based on a reward function that measures the expected gain when allocating the budget to either side. We evaluate our strategy on 18 benchmark datasets and compare it to some baselines.

preprint2022arXiv

Adversarial Example Detection in Deployed Tree Ensembles

Tree ensembles are powerful models that are widely used. However, they are susceptible to adversarial examples, which are examples that purposely constructed to elicit a misprediction from the model. This can degrade performance and erode a user's trust in the model. Typically, approaches try to alleviate this problem by verifying how robust a learned ensemble is or robustifying the learning process. We take an alternative approach and attempt to detect adversarial examples in a post-deployment setting. We present a novel method for this task that works by analyzing an unseen example's output configuration, which is the set of predictions made by an ensemble's constituent trees. Our approach works with any additive tree ensemble and does not require training a separate model. We evaluate our approach on three different tree ensemble learners. We empirically show that our method is currently the best adversarial detection method for tree ensembles.

preprint2022arXiv

Elastic Product Quantization for Time Series

Analyzing numerous or long time series is difficult in practice due to the high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning the time series into equal length sub-sequences which are represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into sub-sequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest neighbors classification and clustering applications. Overall, the proposed solution emerges as a highly efficient (both in terms of memory usage and computation time) replacement for elastic measures in time series applications.

preprint2022arXiv

Nowcasting Stock Implied Volatility with Twitter

In this study, we predict next-day movements of stock end-of-day implied volatility using random forests. Through an ablation study, we examine the usefulness of different sources of predictors and expose the value of attention and sentiment features extracted from Twitter. We study the approach on a stock universe comprised of the 165 most liquid US stocks diversified across the 11 traditional market sectors using a sizeable out-of-sample period spanning over six years. In doing so, we uncover that stocks in certain sectors, such as Consumer Discretionary, Technology, Real Estate, and Utilities are easier to predict than others. Further analysis shows that possible reasons for these discrepancies might be caused by either excess social media attention or low option liquidity. Lastly, we explore how our proposed approach fares throughout time by identifying four underlying market regimes in implied volatility using hidden Markov models. We find that most added value is achieved in regimes associated with lower implied volatility, but optimal regimes vary per market sector.

preprint2020arXiv

A general anomaly detection framework for fleet-based condition monitoring of machines

Machine failures decrease up-time and can lead to extra repair costs or even to human casualties and environmental pollution. Recent condition monitoring techniques use artificial intelligence in an effort to avoid time-consuming manual analysis and handcrafted feature extraction. Many of these only analyze a single machine and require a large historical data set. In practice, this can be difficult and expensive to collect. However, some industrial condition monitoring applications involve a fleet of similar operating machines. In most of these applications, it is safe to assume healthy conditions for the majority of machines. Deviating machine behavior is then an indicator for a machine fault. This work proposes an unsupervised, generic, anomaly detection framework for fleet-based condition monitoring. It uses generic building blocks and offers three key advantages. First, a historical data set is not required due to online fleet-based comparisons. Second, it allows incorporating domain expertise by user-defined comparison measures. Finally, contrary to most black-box artificial intelligence techniques, easy interpretability allows a domain expert to validate the predictions made by the framework. Two use-cases on an electrical machine fleet demonstrate the applicability of the framework to detect a voltage unbalance by means of electrical and vibration signatures.

preprint2020arXiv

Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder

The goal of anomaly detection is to identify examples that deviate from normal or expected behavior. We tackle this problem for images. We consider a two-phase approach. First, using normal examples, a convolutional autoencoder (CAE) is trained to extract a low-dimensional representation of the images. Here, we propose a novel architectural choice when designing the CAE, an Inception-like CAE. It combines convolutional filters of different kernel sizes and it uses a Global Average Pooling (GAP) operation to extract the representations from the CAE's bottleneck layer. Second, we employ a distanced-based anomaly detector in the low-dimensional space of the learned representation for the images. However, instead of computing the exact distance, we compute an approximate distance using product quantization. This alleviates the high memory and prediction time costs of distance-based anomaly detectors. We compare our proposed approach to a number of baselines and state-of-the-art methods on four image datasets, and we find that our approach resulted in improved predictive performance.

preprint2020arXiv

Learning from positive and unlabeled data: a survey

Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.

preprint2020arXiv

Using Machine Learning and Alternative Data to Predict Movements in Market Risk

Using machine learning and alternative data for the prediction of financial markets has been a popular topic in recent years. Many financial variables such as stock price, historical volatility and trade volume have already been through extensive investigation. Remarkably, we found no existing research on the prediction of an asset's market implied volatility within this context. This forward-looking measure gauges the sentiment on the future volatility of an asset, and is deemed one of the most important parameters in the world of derivatives. The ability to predict this statistic may therefore provide a competitive edge to practitioners of market making and asset management alike. Consequently, in this paper we investigate Google News statistics and Wikipedia site traffic as alternative data sources to quantitative market data and consider Logistic Regression, Support Vector Machines and AdaBoost as machine learning models. We show that movements in market implied volatility can indeed be predicted through the help of machine learning techniques. Although the employed alternative data appears to not enhance predictive accuracy, we reveal preliminary evidence of non-linear relationships between features obtained from Wikipedia page traffic and movements in market implied volatility.

preprint2016arXiv

Learning Possibilistic Logic Theories from Default Rules

We introduce a setting for learning possibilistic logic theories from defaults of the form "if alpha then typically beta". We first analyse this problem from the point of view of machine learning theory, determining the VC dimension of possibilistic stratifications as well as the complexity of the associated learning problems, after which we present a heuristic learning algorithm that can easily scale to thousands of defaults. An important property of our approach is that it is inherently able to handle noisy and conflicting sets of defaults. Among others, this allows us to learn possibilistic logic theories from crowdsourced data and to approximate propositional Markov logic networks using heuristic MAP solvers. We present experimental results that demonstrate the effectiveness of this approach.

preprint2016arXiv

Stratified Knowledge Bases as Interpretable Probabilistic Models (Extended Abstract)

In this paper, we advocate the use of stratified logical theories for representing probabilistic models. We argue that such encodings can be more interpretable than those obtained in existing frameworks such as Markov logic networks. Among others, this allows for the use of domain experts to improve learned models by directly removing, adding, or modifying logical formulas.

preprint2015arXiv

Assessing binary classifiers using only positive and unlabeled data

Assessing the performance of a learned model is a crucial part of machine learning. However, in some domains only positive and unlabeled examples are available, which prohibits the use of most standard evaluation metrics. We propose an approach to estimate any metric based on contingency tables, including ROC and PR curves, using only positive and unlabeled data. Estimating these performance metrics is essentially reduced to estimating the fraction of (latent) positives in the unlabeled set, assuming known positives are a random sample of all positives. We provide theoretical bounds on the quality of our estimates, illustrate the importance of estimating the fraction of positives in the unlabeled set and demonstrate empirically that we are able to reliably estimate ROC and PR curves on real data.

preprint2015arXiv

Encoding Markov Logic Networks in Possibilistic Logic

Markov logic uses weighted formulas to compactly encode a probability distribution over possible worlds. Despite the use of logical formulas, Markov logic networks (MLNs) can be difficult to interpret, due to the often counter-intuitive meaning of their weights. To address this issue, we propose a method to construct a possibilistic logic theory that exactly captures what can be derived from a given MLN using maximum a posteriori (MAP) inference. Unfortunately, the size of this theory is exponential in general. We therefore also propose two methods which can derive compact theories that still capture MAP inference, but only for specific types of evidence. These theories can be used, among others, to make explicit the hidden assumptions underlying an MLN or to explain the predictions it makes.

preprint2014arXiv

Lifted Variable Elimination: Decoupling the Operators from the Constraint Language

Lifted probabilistic inference algorithms exploit regularities in the structure of graphical models to perform inference more efficiently. More specifically, they identify groups of interchangeable variables and perform inference once per group, as opposed to once per variable. The groups are defined by means of constraints, so the flexibility of the grouping is determined by the expressivity of the constraint language. Existing approaches for exact lifted inference use specific languages for (in)equality constraints, which often have limited expressivity. In this article, we decouple lifted inference from the constraint language. We define operators for lifted inference in terms of relational algebra operators, so that they operate on the semantic level (the constraints extension) rather than on the syntactic level, making them language-independent. As a result, lifted inference can be performed using more powerful constraint languages, which provide more opportunities for lifting. We empirically demonstrate that this can improve inference efficiency by orders of magnitude, allowing exact inference where until now only approximate inference was feasible.

preprint2013arXiv

First-Order Decomposition Trees

Lifting attempts to speed up probabilistic inference by exploiting symmetries in the model. Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree.

preprint2012arXiv

Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events

Learning from electronic medical records (EMR) is challenging due to their relational nature and the uncertain dependence between a patient's past and future health status. Statistical relational learning is a natural fit for analyzing EMRs but is less adept at handling their inherent latent structure, such as connections between related medications or diseases. One way to capture the latent structure is via a relational clustering of objects. We propose a novel approach that, instead of pre-clustering the objects, performs a demand-driven clustering during learning. We evaluate our algorithm on three real-world tasks where the goal is to use EMRs to predict whether a patient will have an adverse reaction to a medication. We find that our approach is more accurate than performing no clustering, pre-clustering, and using expert-constructed medical heterarchies.

preprint2012arXiv

Lifted Variable Elimination: A Novel Operator and Completeness Results

Various methods for lifted probabilistic inference have been proposed, but our understanding of these methods and the relationships between them is still limited, compared to their propositional counterparts. The only existing theoretical characterization of lifting is for weighted first-order model counting (WFOMC), which was shown to be complete domain-lifted for the class of 2-logvar models. This paper makes two contributions to lifted variable elimination (LVE). First, we introduce a novel inference operator called group inversion. Second, we prove that LVE augmented with this operator is complete in the same sense as WFOMC.

preprint2012arXiv

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.

Jesse Davis

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

How to Allocate your Label Budget? Choosing between Active Learning and Learning to Reject in Anomaly Detection

Adversarial Example Detection in Deployed Tree Ensembles

Elastic Product Quantization for Time Series

Nowcasting Stock Implied Volatility with Twitter

A general anomaly detection framework for fleet-based condition monitoring of machines

Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder

Learning from positive and unlabeled data: a survey

Using Machine Learning and Alternative Data to Predict Movements in Market Risk

Learning Possibilistic Logic Theories from Default Rules

Stratified Knowledge Bases as Interpretable Probabilistic Models (Extended Abstract)

Assessing binary classifiers using only positive and unlabeled data

Encoding Markov Logic Networks in Possibilistic Logic

Lifted Variable Elimination: Decoupling the Operators from the Constraint Language

First-Order Decomposition Trees

Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events

Lifted Variable Elimination: A Novel Operator and Completeness Results

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation