Source author record

Tucker Balch

Tucker Balch appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning; Artificial Intelligence; Multiagent Systems; Computational Engineering, Finance, and Science; econ.GN; Human-Computer Interaction; q-fin.EC; q-fin.MF; Social and Information Networks

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators

preprint, 2024, arXiv

Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models

Devising procedures for downstream task-oriented generative model selection is an unresolved problem of practical importance. Existing studies have focused on the utility of a single family of generative models. They provide limited insight into how synthetic data practitioners should select the best family of generative models for synthetic training tasks, given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selection problem in the case of training fraud detection models and investigate the best practice under different combinations of model interpretability and model performance constraints. Our investigation supports that, while both Neural Network (NN)-based and Bayesian Network (BN)-based generative models are good at completing the synthetic training task under a loose model interpretability constraint, BN-based generative models are better than NN-based ones when training fraud detection models on synthetic data under a strict model interpretability constraint. Our results provide practical guidance for machine learning practitioners who are interested in replacing their real training dataset with a synthetic one, and shed light on more general downstream task-oriented generative model selection problems.

preprint, 2024, arXiv

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

Financial firms commonly process and store billions of time-series data points, generated continuously and at high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time series through a constrained Structured Query Language (SQL)-like format, enabling queries such as "Stocks with monthly price returns greater than 5%" that must be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the required storage, computational time, and retrieval complexity to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time series in a lower-dimensional latent space using deep encoders, such that the latent space projections capture not only the time-series trends but also other desirable information or properties of the financial time-series data (such as price volatility). Moreover, our approach allows user-friendly query interfaces, enabling natural language text or sketches of time series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
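The retrieval idea in this abstract can be illustrated with a toy sketch: project each series into a small vector, then answer a query by nearest-neighbour search in that vector space. The abstract uses deep encoders; the `encode` function below is a hand-written stand-in (mean return and volatility) chosen only to keep the example self-contained, and the names `build_index` and `retrieve` are hypothetical.

```python
import math

def encode(prices):
    """Stand-in encoder (an assumption, not a deep encoder):
    maps a price series to the 2-D vector (mean return, return volatility)."""
    rets = [(b - a) / a for a, b in zip(prices, prices[1:])]
    mean = sum(rets) / len(rets)
    vol = math.sqrt(sum((r - mean) ** 2 for r in rets) / len(rets))
    return (mean, vol)

def build_index(series_by_name):
    """Store only the low-dimensional projection of each series."""
    return {name: encode(prices) for name, prices in series_by_name.items()}

def retrieve(index, query_vec, k=1):
    """Return the k series names whose projections are closest to the query vector."""
    ranked = sorted(index, key=lambda name: math.dist(index[name], query_vec))
    return ranked[:k]

# Usage: a hand-drawn "sketch" of a calm series is encoded and used as the query.
index = build_index({
    "calm": [100, 101, 102, 103],
    "wild": [100, 120, 90, 130],
})
sketch = [50, 50.5, 51, 51.5]
nearest = retrieve(index, encode(sketch), k=1)
```

In a multi-modal setting, a text query and a sketch query would each need their own encoder into the same latent space; this sketch covers only the sketch-to-series path.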

preprint, 2023, arXiv

Once Burned, Twice Shy? The Effect of Stock Market Bubbles on Traders that Learn by Experience

We study how experience with asset price bubbles changes the trading strategies of reinforcement learning (RL) traders and ask whether the change in trading strategies helps to prevent future bubbles. We train the RL traders in a multi-agent market simulation platform, ABIDES, and compare the strategies of traders trained with and without bubble experience. We find that RL traders without bubble experience behave like short-term momentum traders, whereas traders with bubble experience behave like value traders. Therefore, RL traders without bubble experience amplify bubbles, whereas RL traders with bubble experience tend to suppress and sometimes prevent them. This finding suggests that learning from experience is a mechanism for a boom and bust cycle where the experience of a collapsing bubble makes future bubbles less likely for a period of time until the memory fades and bubbles become more likely to form again.

preprint, 2022, arXiv

CTMSTOU driven markets: simulated environment for regime-awareness in trading policies

Market regimes are a popular topic in quantitative finance, even though there is little consensus on the details of how they should be defined. They arise as a feature both in financial market prediction problems and in financial market task-performing problems. In this work we use discrete event time multi-agent market simulation to experiment freely in a reproducible and understandable environment where regimes can be explicitly switched and enforced. We introduce a novel stochastic process to model the fundamental value perceived by market participants: Continuous-Time Markov Switching Trending Ornstein-Uhlenbeck (CTMSTOU), which facilitates the study of trading policies in regime-switching markets. We also define the notion of regime-awareness for a trading agent and illustrate its importance through the study of different order placement strategies in the context of order execution problems.
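The abstract does not give the CTMSTOU equations, so the following is only a rough sketch of what the name suggests: a two-state continuous-time Markov chain selects a regime, and in each regime an Ornstein-Uhlenbeck process reverts toward a mean that trends at a regime-specific slope. The function name, the parameters (`rates`, `trends`, `kappa`, `sigma`), the two-regime restriction, and the Euler discretisation are all assumptions made for illustration.

```python
import math
import random

def simulate_switching_trending_ou(rates, trends, kappa, sigma, x0, horizon, dt, seed=0):
    """Euler sketch of a regime-switching OU process with a trending mean.

    rates[k]  : rate of leaving regime k (exponential holding times), k in {0, 1}
    trends[k] : slope of the mean level while in regime k
    kappa     : mean-reversion speed; sigma: volatility
    Regime switches are applied at the first grid step after the holding time ends.
    """
    rng = random.Random(seed)
    regime = 0
    next_switch = rng.expovariate(rates[regime])
    t, x, mean = 0.0, x0, x0
    path = [(t, regime, x)]
    for _ in range(int(round(horizon / dt))):
        if t >= next_switch:
            regime = 1 - regime
            next_switch = t + rng.expovariate(rates[regime])
        mean += trends[regime] * dt  # the mean drifts at the regime's slope
        x += kappa * (mean - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        path.append((t, regime, x))
    return path

# Usage: regime 0 trends upward, regime 1 trends downward.
path = simulate_switching_trending_ou(
    rates=[2.0, 2.0], trends=[5.0, -5.0], kappa=1.0, sigma=0.5,
    x0=100.0, horizon=1.0, dt=0.01, seed=42,
)
```

Because the regime is recorded at every step, a trading policy evaluated on such a path can be compared with and without access to the regime label, which is one way to read the abstract's notion of regime-awareness.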

preprint, 2022, arXiv

Differentially Private Learning of Hawkes Processes

Hawkes processes have recently gained increasing attention from the machine learning community for their versatility in modeling event sequence data. While they have a rich history going back decades, some of their properties, such as sample complexity for learning the parameters and releasing differentially private versions, are yet to be thoroughly analyzed. In this work, we study standard Hawkes processes with background intensity $μ$ and excitation function $αe^{-βt}$. We provide both non-private and differentially private estimators of $μ$ and $α$, and obtain sample complexity results in both settings to quantify the cost of privacy. Our analysis exploits the strong mixing property of Hawkes processes and classical central limit theorem results for weakly dependent random variables. We validate our theoretical findings on both synthetic and real datasets.
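For readers unfamiliar with the model, a Hawkes process with background intensity $μ$ and excitation function $αe^{-βt}$ has conditional intensity $λ(t) = μ + \sum_{t_i < t} αe^{-β(t - t_i)}$, and the process is stationary when $α/β < 1$. The sketch below evaluates that intensity and draws a sample path by thinning. It illustrates the model only, not the estimators or the privacy mechanism of the paper, and the function names are hypothetical.

```python
import math
import random

def intensity(t, events, mu, alpha, beta):
    """Conditional intensity: mu plus an exponentially decaying kick from each past event."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Thinning simulation of an exponential-kernel Hawkes process on [0, horizon]."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value just after t
        # (counting every event up to and including t) bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)
        if t > horizon:
            break
        # Accept the candidate time with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events

# Usage: alpha / beta = 0.8 < 1, so the process is stationary.
sample = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, horizon=20.0, seed=1)
```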

preprint, 2020, arXiv

Some people aren't worth listening to: periodically retraining classifiers with feedback from a team of end users

Document classification is ubiquitous in a business setting, but often the end users of a classifier are engaged in an ongoing feedback-retrain loop with the team that maintains it. We consider this feedback-retrain loop from a multi-agent point of view, treating the end users as autonomous agents that provide feedback on the labelled data provided by the classifier. This allows us to examine the effect on the classifier's performance of unreliable end users who provide incorrect feedback. We demonstrate a classifier that can learn which users tend to be unreliable, filtering their feedback out of the loop, thus improving performance in subsequent iterations.
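The filtering step can be sketched as follows. The abstract does not say how user reliability is learned, so this example assumes a simple proxy: a user's reliability is the fraction of their labels that agree with the majority label for the same document, and feedback from users below a threshold is dropped before retraining. All names and the threshold value are hypothetical.

```python
from collections import Counter, defaultdict

def majority_labels(feedback):
    """feedback: list of (user, doc_id, label). Returns doc_id -> most common label."""
    votes = defaultdict(Counter)
    for _user, doc, label in feedback:
        votes[doc][label] += 1
    return {doc: counts.most_common(1)[0][0] for doc, counts in votes.items()}

def user_reliability(feedback):
    """Fraction of each user's labels that match the per-document majority (assumed proxy)."""
    consensus = majority_labels(feedback)
    agree, total = Counter(), Counter()
    for user, doc, label in feedback:
        total[user] += 1
        agree[user] += int(label == consensus[doc])
    return {user: agree[user] / total[user] for user in total}

def filter_feedback(feedback, threshold=0.6):
    """Keep only feedback from users whose reliability meets the threshold."""
    reliability = user_reliability(feedback)
    return [item for item in feedback if reliability[item[0]] >= threshold]

# Usage: "mallory" contradicts the other two users on every document.
feedback = [
    ("alice", "d1", "invoice"), ("bob", "d1", "invoice"), ("mallory", "d1", "contract"),
    ("alice", "d2", "contract"), ("bob", "d2", "contract"), ("mallory", "d2", "invoice"),
]
kept = filter_feedback(feedback)
```

A majority-vote proxy fails when unreliable users outnumber reliable ones; an audited gold set would be a sturdier reference where one is available.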

preprint, 2014, arXiv

Inferring Social Structure and Dominance Relationships Between Rhesus macaques using RFID Tracking Data

In this paper we address the problem of inferring social structure and dominance relationships in a group of rhesus macaques (a species of monkey) using only position data captured with RFID tags. Automatic inference of the social structure in an animal group enables a number of important capabilities, including: 1) a verifiable measure of how the social structure is affected by an intervention such as a change in the environment or the introduction of another animal, and 2) a potentially significant reduction in the person-hours normally used for assessing these changes. Social structure in a group is an important indicator of its members' relative level of access to resources and has interesting implications for an individual's health and learning in groups. Two main quantitative criteria are assessed to infer the social structure: time spent close to conspecifics, and displacements. An interaction matrix is used to represent the total duration of events detected as grooming behavior between any two monkeys. This forms an undirected tie-strength (closeness of relationships) graph. A directed graph of hierarchy is constructed by using the well-cited assumption of a linear hierarchy for rhesus macaques. Events that contribute to the adjacency matrix for this graph are withdrawals or displacements where a lower-ranked monkey moves away from a higher-ranked monkey. Displacements are one of the observable behaviors that can act as a strong indication of tie-strength and dominance. To quantify the directedness of interaction during these events, we construct histograms of the dot products of motion orientation and relative position. This gives us a measure of how much time a monkey spends moving towards or away from other group members.
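The directedness measure described at the end of this abstract can be sketched in a few lines. The example assumes 2-D positions and normalises both vectors, so the dot product is the cosine of the angle between a monkey's step direction and the direction to another monkey: values near 1 mean approach, values near -1 mean withdrawal. The function names, the normalisation, and the bin count are assumptions; the paper's exact construction is not given here.

```python
import math

def directedness(prev_pos, curr_pos, other_pos):
    """Dot product of unit motion direction and unit relative position.

    prev_pos, curr_pos: consecutive (x, y) positions of the moving animal
    other_pos: (x, y) position of the other animal
    Returns a value in [-1, 1], or None if either vector has zero length.
    """
    mx, my = curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1]
    rx, ry = other_pos[0] - prev_pos[0], other_pos[1] - prev_pos[1]
    m_len, r_len = math.hypot(mx, my), math.hypot(rx, ry)
    if m_len == 0 or r_len == 0:
        return None
    return (mx * rx + my * ry) / (m_len * r_len)

def histogram(values, bins=4):
    """Count values in equal-width bins over [-1, 1]; the top edge falls in the last bin."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v + 1.0) / 2.0 * bins), bins - 1)
        counts[idx] += 1
    return counts

# Usage: one step toward, one step away from, and one step sideways to an animal at (5, 0).
steps = [((0, 0), (1, 0)), ((0, 0), (-1, 0)), ((0, 0), (0, 1))]
values = [directedness(p, c, (5, 0)) for p, c in steps]
counts = histogram(values, bins=4)
```

A histogram with most of its mass in the lowest bins for a given pair would indicate that one animal mostly moves away from the other, which is the kind of evidence the abstract associates with displacement.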