Source author record

David Sutton

David Sutton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, astro-ph, astro-ph.IM, Cryptography and Security

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2024, arXiv

Locally Differentially Private Embedding Models in Distributed Fraud Prevention Systems

Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly serviced to financial institutions in isolation, and few provisions exist for data sharing due to fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and it is hard to find real-world insights derived from privacy-preserving data processing systems. In this paper, we present a collaborative deep learning framework for fraud prevention, designed from a privacy standpoint, and awarded at the recent PETs Prize Challenges. We leverage latent embedded representations of varied-length transaction sequences, along with local differential privacy, in order to construct a data release mechanism which can securely inform externally hosted fraud and anomaly detection models. We assess our contribution on two distributed data sets donated by large payment networks, and demonstrate robustness to popular inference-time attacks, along with utility-privacy trade-offs analogous to published work in alternative application domains.
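
The abstract does not specify which local differential privacy mechanism is applied to the embeddings. As a minimal sketch of the general idea only, the example below assumes a standard Laplace mechanism: each embedding is clipped to a bounded L1 norm and then perturbed on the client before release. The function names, the clipping bound and the choice of Laplace noise are illustrative assumptions, not the authors' method.

```python
import numpy as np


def clip_l1(vec, bound):
    """Scale vec down so that its L1 norm is at most `bound`."""
    norm = np.abs(vec).sum()
    if norm > bound:
        return vec * (bound / norm)
    return vec


def privatize_embedding(vec, epsilon, bound=1.0, rng=None):
    """Release a noisy copy of an embedding under epsilon-local DP.

    Assumption: Laplace mechanism. Any two clipped inputs differ by at
    most 2 * bound in L1 norm, so per-coordinate Laplace noise with
    scale 2 * bound / epsilon is sufficient.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_l1(np.asarray(vec, dtype=float), bound)
    scale = 2.0 * bound / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)


# Hypothetical 8-dimensional transaction-sequence embedding.
embedding = np.linspace(-1.0, 1.0, 8)
released = privatize_embedding(embedding, epsilon=1.0, rng=np.random.default_rng(0))
```

Smaller values of `epsilon` give stronger privacy and noisier released embeddings, which is the utility-privacy trade-off the abstract refers to.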

preprint, 2024, arXiv

Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences

Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't been adapted to multivariate time series of financial transactions. In this paper, we present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. Benchmarks on public datasets demonstrate that it outperforms state-of-the-art self-supervised methods on a range of downstream tasks. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions and apply it to the card fraud detection problem on hold-out datasets. The embedding model significantly improves value detection rate at high precision thresholds and transfers well to out-of-domain distributions.

preprint, 2015, arXiv

Validation of Bayesian posterior distributions using a multidimensional Kolmogorov--Smirnov test

We extend the Kolmogorov--Smirnov (K-S) test to multiple dimensions by suggesting an $\mathbb{R}^n \rightarrow [0,1]$ mapping based on the probability content of the highest probability density region of the reference distribution under consideration; this mapping reduces the problem back to the one-dimensional case to which the standard K-S test may be applied. The universal character of this mapping also allows us to introduce a simple, yet general, method for the validation of Bayesian posterior distributions of any dimensionality. This new approach goes beyond validating software implementations; it provides a sensitive test for all assumptions, explicit or implicit, that underlie the inference. In particular, the method assesses whether the inferred posterior distribution is a truthful representation of the actual constraints on the model parameters. We illustrate our multidimensional K-S test by applying it to a simple two-dimensional Gaussian toy problem, and demonstrate our method for posterior validation in the real-world astrophysical application of estimating the physical parameters of galaxy clusters from their Sunyaev--Zel'dovich effect in microwave background data. In the latter example, we show that the method can validate the entire Bayesian inference process across a varied population of objects for which the derived posteriors are different in each case.
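
The mapping described in this abstract can be sketched for the special case of a Gaussian reference distribution, where the probability content of the highest probability density (HPD) region through a point has a closed form: it is the chi-squared CDF, with n degrees of freedom, of the squared Mahalanobis distance. The example below is an illustrative sketch under that Gaussian assumption, with hypothetical function names; it is not the paper's code, and a general reference distribution would need a numerical estimate of the HPD content instead.

```python
import numpy as np
from scipy import stats


def hpd_probability_content(samples, mean, cov):
    """Map each n-dimensional sample to [0, 1].

    The value is the probability mass of the HPD region of N(mean, cov)
    whose boundary passes through the sample. For a Gaussian this is the
    chi-squared CDF of the squared Mahalanobis distance.
    """
    samples = np.atleast_2d(samples)
    diff = samples - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return stats.chi2.cdf(d2, df=samples.shape[1])


def multidim_ks_gaussian(samples, mean, cov):
    """One-dimensional K-S test of the mapped values against Uniform(0, 1)."""
    u = hpd_probability_content(samples, mean, cov)
    return stats.kstest(u, "uniform")


# Two-dimensional Gaussian toy problem (parameter values are arbitrary).
rng = np.random.default_rng(0)
mean = np.zeros(2)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])

matching = rng.multivariate_normal(mean, cov, size=2000)
shifted = rng.multivariate_normal(mean + 1.0, cov, size=2000)

res_matching = multidim_ks_gaussian(matching, mean, cov)
res_shifted = multidim_ks_gaussian(shifted, mean, cov)
print(res_matching.statistic, res_shifted.statistic)
```

If the samples really come from the reference distribution, the mapped values are uniform on [0, 1], so a small K-S statistic is expected; samples from the shifted distribution should give a much larger statistic and a very small p-value.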

preprint, 2009, arXiv

Impact of modulation on CMB B-mode polarization experiments

We investigate the impact of both slow and fast polarization modulation strategies on the science return of upcoming ground-based experiments aimed at measuring the B-mode polarization of the CMB. Using simulations of the Clover experiment, we compare the ability of modulated and un-modulated observations to recover the signature of gravitational waves in the polarized CMB sky in the presence of a number of anticipated systematic effects. The general expectations that fast modulation is helpful in mitigating low-frequency detector noise, and that the additional redundancy in the projection of the instrument's polarization sensitivity directions onto the sky when modulating reduces the impact of instrumental polarization, are borne out by our simulations. Neither low-frequency polarized atmospheric fluctuations nor systematic errors in the polarization sensitivity directions are mitigated by modulation. Additionally, we find no significant reduction in the effect of pointing errors by modulation. For a Clover-like experiment, pointing jitter should be negligible but any systematic mis-calibration of the polarization coordinate reference system results in significant E-B mixing on all angular scales and will require careful control. We also stress the importance of combining data from multiple detectors in order to remove the effects of common-mode systematics (such as 1/f atmospheric noise) on the measured polarization signal. Finally we compare the performance of our simulated experiment with the predicted performance from a Fisher analysis. We find good agreement between the Fisher predictions and the simulations except for the very largest scales where the power spectrum estimator we have used introduces additional variance to the B-mode signal recovered from our simulations.