Source author record

Emily Huang

Emily Huang appears in the imported research catalog. Authorship, coauthor, and topic links are available even though the profile has not yet been claimed.

Researcher (unclaimed source record)

Topics: Artificial Intelligence, Computer Vision, Machine Learning, Methodology

Catalog footprint: 2 works, 4 topics, 4 close collaborators


Preprint, 2026, arXiv

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region should imply a confident, calibrated answer. We test this Attention-Confidence Assumption directly. We instrument three open-weight VLM families (LLaVA-1.5, PaliGemma, Qwen2-VL; 3-7B parameters) with a unified mechanistic pipeline, the VLM Reliability Probe (VRP), which compares attention structure, generation dynamics, and hidden-state geometry against a single correctness label. Three results emerge. (i) Attention structure is a near-zero predictor of correctness (R_pb(C_k,y)=0.001, 95% CI [-0.034,0.036]; R_pb(H_s,y)=-0.012, 95% CI [-0.047,0.024]; pooled split, n=3,090), even though attention remains causally necessary for feature extraction (top-30% patch masking drops accuracy by 8.2-11.3 pp, p<0.001). (ii) Reliability becomes legible later in the computation: a single hidden-state linear probe reaches AUROC>0.95 on POPE for two of the three families, and self-consistency at K=10 is the strongest behavioral predictor we measure (R_pb=0.43), though at 10x inference cost. (iii) Causal neuron-level ablations expose a sharp architectural split with direct implications for monitor design: late-fusion LLaVA concentrates reliability in a fragile late bottleneck (-8.3 pp object-identification accuracy after ablating the top-5 probe neurons), whereas early-fusion PaliGemma and Qwen2-VL distribute it widely and absorb destruction of ~50% of their peak-layer hidden dimension with at most 1 pp degradation. The takeaway is narrow but consequential: in 3-7B VLMs, reliability is read more accurately from hidden-state geometry, layer-wise margin formation, and sparse late-layer circuits than from attention-map sharpness.
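The abstract reports predictors as point-biserial correlations R_pb between a continuous score (for example, an attention statistic) and a binary correctness label y. As a minimal sketch of that statistic, the function below uses the standard formula, which equals the Pearson correlation with a 0/1 variable. The self-consistency score is an assumption: here it is taken to be the fraction of K sampled answers that agree with the majority answer, and the paper's exact definition may differ. Both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def point_biserial(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Point-biserial correlation between a continuous score and a 0/1 label.

    Uses r_pb = (M1 - M0) / s_n * sqrt(p * q), where s_n is the population
    standard deviation of all scores; this equals Pearson's r with the label.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    ones = [s for s, y in zip(scores, labels) if y == 1]
    zeros = [s for s, y in zip(scores, labels) if y == 0]
    if not ones or not zeros:
        raise ValueError("both label classes must be present")
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # population SD
    if sd == 0:
        raise ValueError("scores have zero variance")
    p = len(ones) / n
    q = 1.0 - p
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / sd * math.sqrt(p * q)


def self_consistency(answers: Sequence[Hashable]) -> float:
    """Hypothetical agreement score: share of K samples matching the majority answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)
```

For instance, `self_consistency(["cat"] * 7 + ["dog"] * 3)` gives 0.7, and feeding such per-question scores together with correctness labels into `point_biserial` yields an R_pb of the kind quoted above. Computing the K answers requires K forward generations per question, which is the source of the 10x inference cost the abstract mentions.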

Preprint, 2022, arXiv

Combining Accelerometer and Gyroscope Data in Smartphone-Based Activity Recognition using Movelets

Physical activity patterns can be informative about a patient's health status. Traditionally, activity data have been gathered using patient self-report. However, these subjective data can suffer from bias and are difficult to collect over long time periods. Smartphones offer an opportunity to address these challenges. The smartphone has built-in sensors that can be programmed to collect data objectively, unobtrusively, and continuously. Due to their widespread adoption, smartphones are also accessible to most of the population. A main challenge in smartphone-based activity recognition is extracting information optimally from multiple sensors to identify the unique features of different activities. In our study, we analyze data collected by the accelerometer and gyroscope, which measure the phone's acceleration and angular velocity, respectively. We propose an extension to the "movelet method" that jointly incorporates both sensors. We also apply this joint-sensor method to a data set we collected previously. The findings show that combining data from the two sensors can result in more accurate activity recognition than using each sensor alone. For example, the joint-sensor method reduces errors of the gyroscope-only method in differentiating between standing and sitting. It also reduces errors of the accelerometer-only method in classifying vigorous activities.
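As a rough illustration of the movelet idea, the sketch below cuts labeled training signals into overlapping short windows ("movelets") and assigns a new window the activity of its nearest movelet. How the paper actually combines the two sensors is not stated in the abstract; this sketch simply assumes the joint distance is the sum of the accelerometer and gyroscope squared distances over time-aligned windows. All names (`make_movelets`, `build_dictionary`, `classify_window`) are hypothetical, and the sketch classifies one window only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # one tri-axial reading (x, y, z)
Window = List[Sample]


def make_movelets(signal: Sequence[Sample], length: int) -> List[Window]:
    """Cut a training signal into all overlapping windows of `length` samples."""
    return [list(signal[i:i + length]) for i in range(len(signal) - length + 1)]


def sq_dist(a: Window, b: Window) -> float:
    """Sum of squared differences across time points and axes."""
    return sum((u - v) ** 2 for sa, sb in zip(a, b) for u, v in zip(sa, sb))


def build_dictionary(
    training: Dict[str, Tuple[Sequence[Sample], Sequence[Sample]]], length: int
) -> Dict[str, List[Tuple[Window, Window]]]:
    """Map each activity to paired (accelerometer, gyroscope) movelets.

    Assumes the two sensor signals for an activity are time-aligned.
    """
    return {
        label: list(zip(make_movelets(acc, length), make_movelets(gyr, length)))
        for label, (acc, gyr) in training.items()
    }


def classify_window(
    acc_win: Window, gyr_win: Window, dictionary: Dict[str, List[Tuple[Window, Window]]]
) -> Optional[str]:
    """Nearest-movelet label using an assumed joint distance (accel + gyro)."""
    best_label: Optional[str] = None
    best = float("inf")
    for label, movelets in dictionary.items():
        for acc_m, gyr_m in movelets:
            d = sq_dist(acc_win, acc_m) + sq_dist(gyr_win, gyr_m)
            if d < best:
                best_label, best = label, d
    return best_label
```

Dropping either term of the summed distance recovers a single-sensor (accelerometer-only or gyroscope-only) classifier, which is the kind of baseline the abstract compares the joint-sensor method against.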