Researcher profile

Matt Thomson

Matt Thomson contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
7 works
0 followers
8 topics
4 close collaborators

Published work

7 published items

preprint | 2026 | arXiv

Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Social Determinants of Health (SDOH), also known as Health-Related Social Needs (HRSN), play a significant role in patient health outcomes. The Centers for Disease Control and Prevention (CDC) introduced a subset of ICD-10 codes called Z-codes to recognize and measure SDOH. However, Z-codes are infrequently coded in a patient's Electronic Health Record (EHR) and, in many cases, must instead be inferred from clinical notes. Previous research has shown that large language models (LLMs) show promise in extracting unstructured data from EHRs, but it can be difficult to identify a single model that performs best on varied coding tasks. Further, clinical notes contain protected health information, which poses a challenge for the use of closed-source language models from commercial vendors. The identification of open-source LLMs that can be run within health organizations and exhibit high performance on SDOH tasks is an important issue to solve. Here, we introduce an intelligent routing system for SDOH coding that uses a language model router to direct medical record data to open-source LLMs that demonstrate optimal performance on specific SDOH codes. This intelligent routing system exhibits state-of-the-art performance, with 96.4% accuracy averaged across 13 codes, including homelessness and food insecurity, outperforming closed models such as GPT-4o. We leveraged a publicly available, deidentified dataset of medical record notes to run the router, but we also introduce a synthetic data generation and validation paradigm to increase the scale of training data without needing privacy-protected medical records. Together, we demonstrate an architecture for intelligent routing of inputs to task-optimal language models to achieve high performance across a set of medical coding sub-tasks.
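The routing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the paper uses a language model as the router, whereas this sketch swaps in a static lookup table built from per-code validation accuracy. The model names (`model_a`, `model_b`), the accuracy values, and the keyword-based stub classifiers are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Dict

# Hypothetical validation accuracy per model and SDOH code (made-up numbers).
VALIDATION_ACCURACY: Dict[str, Dict[str, float]] = {
    "model_a": {"homelessness": 0.91, "food_insecurity": 0.97},
    "model_b": {"homelessness": 0.95, "food_insecurity": 0.90},
}


def build_routing_table(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each SDOH code to the model with the highest validation accuracy."""
    codes = {code for per_code in scores.values() for code in per_code}
    table: Dict[str, str] = {}
    for code in sorted(codes):
        candidates = [name for name in scores if code in scores[name]]
        table[code] = max(candidates, key=lambda name: scores[name][code])
    return table


# Stub "models": keyword matchers standing in for locally hosted LLMs.
def model_a(note: str) -> bool:
    return "food" in note.lower()


def model_b(note: str) -> bool:
    text = note.lower()
    return "shelter" in text or "homeless" in text


MODELS: Dict[str, Callable[[str], bool]] = {"model_a": model_a, "model_b": model_b}
ROUTING_TABLE = build_routing_table(VALIDATION_ACCURACY)


def route(note: str, code: str) -> bool:
    """Send the note to whichever model is preferred for this SDOH code."""
    model_name = ROUTING_TABLE[code]
    return MODELS[model_name](note)


if __name__ == "__main__":
    print(ROUTING_TABLE)
    print(route("Patient reports sleeping in a shelter.", "homelessness"))
```

A learned router would replace `ROUTING_TABLE` with a model that looks at the note itself before choosing a destination; the lookup table only captures the "best model per code" part of the design.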

preprint | 2022 | arXiv

Active feature selection discovers minimal gene sets for classifying cell types and disease states with single-cell mRNA-seq data

Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical analyses. Targeted single-cell mRNA-sequencing reduces sequencing costs by profiling reduced gene sets that capture biological information with a minimal number of genes. Here, we introduce an active learning method (ActiveSVM) that identifies minimal but highly informative gene sets that enable the identification of cell types, physiological states, and genetic perturbations in single-cell data using a small number of genes. Our active feature selection procedure generates minimal gene sets from single-cell data through an iterative cell-type classification task where misclassified cells are examined at each round of analysis to identify maximally informative genes through an 'active' support vector machine (ActiveSVM) classifier. By focusing computational resources on misclassified cells, ActiveSVM scales to analyze data sets with over a million single cells. We demonstrate that ActiveSVM feature selection identifies gene sets that enable ~90% cell-type classification accuracy across a variety of data sets including cell atlas and disease characterization data sets. The method generalizes to reveal genes that respond to genetic perturbations and to identify region-specific gene expression patterns in spatial transcriptomics data. The discovery of small but highly informative gene sets should enable substantial reductions in the number of measurements necessary for application of single-cell mRNA-seq to clinical tests, therapeutic discovery, and genetic screens.
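The iterative loop described in this abstract (classify, inspect the misclassified cells, add the most informative gene, repeat) can be sketched in a few lines. This is an illustration of the general idea only, not ActiveSVM itself: a nearest-centroid classifier is swapped in for the support vector machine, genes are scored by the gap between class means, and the toy expression matrix is invented. The sketch assumes two classes labeled 0 and 1, both present in the data.

```python
from typing import Dict, Iterable, List, Sequence

Matrix = Sequence[Sequence[float]]


def class_means(X: Matrix, y: Sequence[int], rows: Iterable[int], gene: int) -> Dict[int, float]:
    """Mean expression of one gene per class, restricted to the given cells."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for i in rows:
        sums[y[i]] += X[i][gene]
        counts[y[i]] += 1
    return {c: sums[c] / counts[c] for c in (0, 1) if counts[c] > 0}


def predict(X: Matrix, y: Sequence[int], genes: List[int]) -> List[int]:
    """Nearest-centroid prediction for every cell using only the selected genes."""
    centroids: Dict[int, List[float]] = {0: [], 1: []}
    for g in genes:
        means = class_means(X, y, range(len(X)), g)
        for c in (0, 1):
            centroids[c].append(means[c])
    preds = []
    for row in X:
        dist = {
            c: sum((row[g] - centroids[c][k]) ** 2 for k, g in enumerate(genes))
            for c in (0, 1)
        }
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def select_genes(X: Matrix, y: Sequence[int], budget: int) -> List[int]:
    """Greedily add genes, scoring candidates on the currently misclassified cells."""
    selected: List[int] = []
    focus = list(range(len(X)))  # before any classifier exists, every cell is in focus
    while len(selected) < budget and focus:
        best_gene, best_score = None, -1.0
        for g in range(len(X[0])):
            if g in selected:
                continue
            means = class_means(X, y, focus, g)
            if len(means) < 2:  # focus holds one class only: fall back to all cells
                means = class_means(X, y, range(len(X)), g)
            score = abs(means[0] - means[1])
            if score > best_score:
                best_gene, best_score = g, score
        if best_gene is None:
            break
        selected.append(best_gene)
        preds = predict(X, y, selected)
        focus = [i for i in range(len(X)) if preds[i] != y[i]]
    return selected


# Toy data: 8 cells x 3 genes. Gene 0 separates most cells, gene 1 resolves
# the two cells gene 0 gets wrong, and gene 2 is constant (uninformative).
X_TOY = [
    [0, 8, 5], [0, 8, 5], [0, 8, 5], [9, 0, 5],      # class 0
    [10, 8, 5], [10, 8, 5], [10, 8, 5], [1, 16, 5],  # class 1
]
Y_TOY = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(select_genes(X_TOY, Y_TOY, budget=3))
```

On the toy data the loop stops after two genes because no cell remains misclassified, so the third, constant gene is never requested even though the budget allows it.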

preprint | 2022 | arXiv

Spatiotemporal patterning of extensile active stresses in microtubule-based active fluids

Active stresses, which are collectively generated by the motion of energy-consuming rod-like constituents, generate chaotic autonomous flows. Controlling active stresses in space and time is an essential prerequisite for controlling the intrinsically chaotic dynamics of extensile active fluids. We design single-headed kinesin molecular motors that exhibit optically enhanced clustering, and thus enable precise and repeatable spatial and temporal control of extensile active stresses. Such motors enable rapid, reversible switching between flowing and quiescent states. In turn, spatio-temporal patterning of the active stress controls the evolution of the ubiquitous bend-instability of extensile active fluids and determines its critical length dependence. Combining optically controlled clusters with conventional kinesin motors enables one-time switching from contractile to extensile active stresses. These results open a path towards real-time control of the autonomous flows generated by active fluids.

preprint | 2022 | arXiv

Therapeutic algebra of immunomodulatory drug responses at single-cell resolution

Therapeutic modulation of immune states is central to the treatment of human disease. However, how drugs and drug combinations impact the diverse cell types in the human immune system remains poorly understood at the transcriptome scale. Here, we apply single-cell mRNA-seq to profile the response of human immune cells to 502 immunomodulatory drugs alone and in combination. We develop a unified mathematical model that quantitatively describes the transcriptome-scale response of myeloid and lymphoid cell types to individual drugs and drug combinations through a single inferred regulatory network. The mathematical model reveals how drug combinations generate novel macrophage and T-cell states by recruiting combinations of gene expression programs through both additive and non-additive drug interactions. A simplified drug response algebra allows us to predict the continuous modulation of immune cell populations between activated, resting and hyper-inhibited states through combinatorial drug dose titrations. Our results suggest that transcriptome-scale mathematical models could enable the design of therapeutic strategies for programming the human immune system using combinations of therapeutics.

preprint | 2021 | arXiv

Programming Boundary Deformation Patterns in Active Networks

Active materials take advantage of their internal sources of energy to self-organize in an automated manner. This feature provides a novel opportunity to design micron-scale machines with minimal required control. However, self-organization goes hand in hand with predetermined dynamics that are hardly susceptible to environmental perturbations. Therefore, utilizing this feature of active systems requires harnessing and directing the macroscopic dynamics to achieve specific functions, which in turn necessitates understanding the underlying mechanisms of active forces. Here we devise an optical control protocol to engineer the dynamics of active networks composed of microtubules and light-activatable motor proteins. The protocol enables carving activated networks of different shapes and isolating them from the embedding solution. Studying a large set of shapes, we observe that the active networks contract in a shape-preserving manner that persists over the course of contraction. We formulate a coarse-grained theory and demonstrate that self-similarity of contraction is associated with viscous-like active stresses. These findings help us program the dynamics of the network through manipulating the light intensity in space and time, and maneuver the network into bending in specific directions, as well as temporally alternating directions. Our work improves understanding of the active dynamics in contractile networks, and paves a new path towards engineering the dynamics of a large class of active materials.

preprint | 2020 | arXiv

Geometric algorithms for predicting resilience and recovering damage in neural networks

Biological neural networks have evolved to maintain performance despite significant circuit damage. To survive damage, biological network architectures have both intrinsic resilience to component loss and also activate recovery programs that adjust network weights through plasticity to stabilize performance. Despite the importance of resilience in technology applications, the resilience of artificial neural networks is poorly understood, and autonomous recovery algorithms have yet to be developed. In this paper, we establish a mathematical framework to analyze the resilience of artificial neural networks through the lens of differential geometry. Our geometric language provides natural algorithms that identify local vulnerabilities in trained networks as well as recovery algorithms that dynamically adjust networks to compensate for damage. We reveal striking vulnerabilities in commonly used image analysis networks, like MLPs and CNNs trained on MNIST and CIFAR10, respectively. We also uncover high-performance recovery paths that enable the same networks to dynamically re-adjust their parameters to compensate for damage. Broadly, our work provides procedures that endow artificial systems with resilience and rapid-recovery routines to enhance their integration with IoT devices as well as enable their deployment for critical applications.

preprint | 2020 | arXiv

Self-organization of multi-layer spiking neural networks

Living neural networks in our brains autonomously self-organize into large, complex architectures during early development, resulting in an organized and functional organic computational device. A key mechanism that enables the formation of complex architecture in the developing brain is the emergence of traveling spatio-temporal waves of neuronal activity across the growing brain. Inspired by this strategy, we attempt to efficiently self-organize large neural networks with an arbitrary number of layers into a wide variety of architectures. To achieve this, we propose a modular tool-kit in the form of a dynamical system that can be seamlessly stacked to assemble multi-layer neural networks. The dynamical system encapsulates the dynamics of spiking units, their inter- and intra-layer interactions, and the plasticity rules that control the flow of information between layers. The key features of our tool-kit are (1) autonomous spatio-temporal waves across multiple layers triggered by activity in the preceding layer and (2) spike-timing-dependent plasticity (STDP) learning rules that update the inter-layer connectivity based on wave activity in the connecting layers. Our framework leads to the self-organization of a wide variety of architectures, ranging from multi-layer perceptrons to autoencoders. We also demonstrate that emergent waves can self-organize spiking network architecture to perform unsupervised learning, and networks can be coupled with a linear classifier to perform classification on classic image datasets like MNIST. Broadly, our work shows that a dynamical systems framework for learning can be used to self-organize large computational devices.
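For readers unfamiliar with STDP, the sketch below shows the common pair-based form of the rule: a connection is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise, with an exponential dependence on the time gap. This is the textbook rule, not necessarily the variant used in the paper, and the amplitudes, time constants, and weight bounds are assumed values chosen only for illustration.

```python
import math
from typing import Sequence


def stdp_delta(
    dt_ms: float,
    a_plus: float = 0.01,     # assumed potentiation amplitude
    a_minus: float = 0.012,   # assumed depression amplitude
    tau_plus: float = 20.0,   # assumed potentiation time constant (ms)
    tau_minus: float = 20.0,  # assumed depression time constant (ms)
) -> float:
    """Weight change for one spike pair, where dt_ms = t_post - t_pre."""
    if dt_ms > 0:  # pre fires before post: potentiate
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:  # post fires before pre: depress
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def update_weight(
    w: float,
    pre_spikes_ms: Sequence[float],
    post_spikes_ms: Sequence[float],
    w_min: float = 0.0,
    w_max: float = 1.0,
) -> float:
    """Sum the pairwise changes over all spike pairs, then clip to [w_min, w_max]."""
    for t_pre in pre_spikes_ms:
        for t_post in post_spikes_ms:
            w += stdp_delta(t_post - t_pre)
    return min(w_max, max(w_min, w))


if __name__ == "__main__":
    # Presynaptic spike at 0 ms, postsynaptic spike at 5 ms: the weight grows.
    print(update_weight(0.5, [0.0], [5.0]))
    # Reversed order: the weight shrinks.
    print(update_weight(0.5, [5.0], [0.0]))
```

In a wave-driven setting like the one the abstract describes, the spike times would come from activity waves in adjacent layers, so connections from units that fire just before a downstream unit are the ones reinforced.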