Researcher profile

Emily Huang

Emily Huang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region should imply a confident, calibrated answer. We test this Attention-Confidence Assumption directly. We instrument three open-weight VLM families (LLaVA-1.5, PaliGemma, Qwen2-VL; 3-7B parameters) with a unified mechanistic pipeline -- the VLM Reliability Probe (VRP) -- that compares attention structure, generation dynamics, and hidden-state geometry against a single correctness label. Three results emerge. (i) Attention structure is a near-zero predictor of correctness (R_pb(C_k,y)=0.001, 95% CI [-0.034,0.036]; R_pb(H_s,y)=-0.012, [-0.047,0.024] on a pooled n=3,090 split), even though attention remains causally necessary for feature extraction (top-30% patch masking drops accuracy by 8.2-11.3 pp, p<0.001). (ii) Reliability becomes legible later in the computation: a single hidden-state linear probe reaches AUROC>0.95 on POPE for two of three families, and self-consistency at K=10 is the strongest behavioral predictor we measure at 10x inference cost (R_pb=0.43). (iii) Causal neuron-level ablations expose a sharp architectural split with direct monitor-design implications: late-fusion LLaVA concentrates reliability in a fragile late bottleneck (-8.3 pp object-identification accuracy after top-5 probe-neuron ablation), whereas early-fusion PaliGemma and Qwen2-VL distribute it widely and absorb destruction of ~50% of their peak-layer hidden dimension with <=1 pp degradation. The takeaway is narrow but consequential: in 3-7B VLMs, reliability is read more reliably off hidden-state geometry, layer-wise margin formation, and sparse late-layer circuits than off attention-map sharpness.
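The R_pb values in the abstract above are point-biserial correlations between a continuous signal and the binary correctness label. As a minimal sketch (not the authors' code), the statistic and an approximate 95% interval could be computed as below. The function names `point_biserial` and `fisher_ci` are hypothetical, and the Fisher z interval is an assumption: the abstract does not say how its intervals were obtained, although at n=3,090 this method gives a half-width of roughly 0.035, in line with the quoted ranges.

```python
import math

def point_biserial(scores, labels):
    """Point-biserial correlation between a continuous score and a 0/1 label.

    Numerically identical to the Pearson correlation when one variable is binary.
    """
    n = len(scores)
    if n != len(labels) or n < 2:
        raise ValueError("scores and labels must have equal length >= 2")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both label classes must be present")
    mean_all = sum(scores) / n
    # Population standard deviation of the scores (divide by n, not n - 1).
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores have zero variance")
    p = len(pos) / n  # fraction of examples labelled correct
    mean_gap = sum(pos) / len(pos) - sum(neg) / len(neg)
    return mean_gap / sd * math.sqrt(p * (1 - p))

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transform (assumed method)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Toy data, invented for illustration only.
r = point_biserial([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
low, high = fisher_ci(0.001, 3090)
```

A correlation whose interval straddles zero this tightly, as with the attention-structure signals in result (i), gives no usable information about correctness at that sample size.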

Preprint, 2022, arXiv

Combining Accelerometer and Gyroscope Data in Smartphone-Based Activity Recognition using Movelets

Physical activity patterns can be informative about a patient's health status. Traditionally, activity data have been gathered using patient self-report. However, these subjective data can suffer from bias and are difficult to collect over long time periods. Smartphones offer an opportunity to address these challenges. The smartphone has built-in sensors that can be programmed to collect data objectively, unobtrusively, and continuously. Due to their widespread adoption, smartphones are also accessible to most of the population. A main challenge in smartphone-based activity recognition is extracting information optimally from multiple sensors to identify the unique features of different activities. In our study, we analyze data collected by the accelerometer and gyroscope, which measure the phone's acceleration and angular velocity, respectively. We propose an extension to the "movelet method" that jointly incorporates both sensors. We also apply this joint-sensor method to a data set we collected previously. The findings show that combining data from the two sensors can result in more accurate activity recognition than using each sensor alone. For example, the joint-sensor method reduces errors of the gyroscope-only method in differentiating between standing and sitting. It also reduces errors of the accelerometer-only method in classifying vigorous activities.