Source author record

William Brown

William Brown appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language Computer Science and Game Theory Computer Vision Databases Machine Learning Performance Social and Information Networks Software Engineering

Catalog footprint

What is connected

4works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

Evaluating large language models (LLMs) for medical applications remains challenging due to benchmark saturation, limited data accessibility, and insufficient coverage of relevant tasks. Existing suites have either saturated, heavily depend on restricted datasets, or lack comprehensive model coverage. We introduce Medmarks, a fully open-source evaluation suite with 30 benchmarks spanning question answering, information extraction, medical calculations, and open-ended clinical reasoning. We perform a systematic evaluation of 61 models across 71 configurations using verifiable metrics and LLM-as-a-Judge. Our results show that frontier reasoning models (Gemini 3 Pro Preview, GPT-5.1, & GPT-5.2) achieve the highest performance across both benchmarks, most frontier proprietary models are significantly more token efficient than open-weight alternatives, medically fine-tuned models outperform their generalist counterparts, and that models are susceptible to answer-order bias (particularly smaller models and Grok 4). A subset of our evals (Medmarks-T) can be directly used as reinforcement learning environments to post-train LLMs for medical reasoning. Code is available at https://github.com/MedARC-AI/Medmarks

preprint2020arXiv

Change Point Detection in Software Performance Testing

We describe our process for automatic detection of performance changes for a software product in the presence of noise. A large collection of tests run periodically as changes to our software product are committed to our source repository, and we would like to identify the commits responsible for performance regressions. Previously, we relied on manual inspection of time series graphs to identify significant changes. That was later replaced with a threshold-based detection system, but neither system was sufficient for finding changes in performance in a timely manner. This work describes our recent implementation of a change point detection system built upon the E-Divisive means algorithm. The algorithm produces a list of change points representing significant changes from a given history of performance results. A human reviews the list of change points for actionable changes, which are then triaged for further inspection. Using change point detection has had a dramatic impact on our ability to detect performance changes. Quantitatively, it has dramatically dropped our false positive rate for performance changes, while qualitatively it has made the entire performance evaluation process easier, more productive (ex. catching smaller regressions), and more timely.

preprint2020arXiv

Targeted Intervention in Random Graphs

We consider a setting where individuals interact in a network, each choosing actions which optimize utility as a function of neighbors' actions. A central authority aiming to maximize social welfare at equilibrium can intervene by paying some cost to shift individual incentives, and the optimal intervention can be computed using the spectral decomposition of the graph, yet this is infeasible in practice if the adjacency matrix is unknown. In this paper, we study the question of designing intervention strategies for graphs where the adjacency matrix is unknown and is drawn from some distribution. For several commonly studied random graph models, we show that there is a single intervention, proportional to the first eigenvector of the expected adjacency matrix, which is near-optimal for almost all generated graphs when the budget is sufficiently large. We also provide several efficient sampling-based approaches for approximately recovering the first eigenvector when we do not know the distribution. On the whole, our analysis compares three categories of interventions: those which use no data about the network, those which use some data (such as distributional knowledge or queries to the graph), and those which are fully optimal. We evaluate these intervention strategies on synthetic and real-world network data, and our results suggest that analysis of random graph models can be useful for determining when certain heuristics may perform well in practice.

preprint2014arXiv

Painting Analysis Using Wavelets and Probabilistic Topic Models

In this paper, computer-based techniques for stylistic analysis of paintings are applied to the five panels of the 14th century Peruzzi Altarpiece by Giotto di Bondone. Features are extracted by combining a dual-tree complex wavelet transform with a hidden Markov tree (HMT) model. Hierarchical clustering is used to identify stylistic keywords in image patches, and keyword frequencies are calculated for sub-images that each contains many patches. A generative hierarchical Bayesian model learns stylistic patterns of keywords; these patterns are then used to characterize the styles of the sub-images; this in turn, permits to discriminate between paintings. Results suggest that such unsupervised probabilistic topic models can be useful to distill characteristic elements of style.