Researcher profile

Michael Zhang

Michael Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
11 works
0 followers
9 topics
4 close collaborators


Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.




Published work

11 published items

Preprint · 2026 · arXiv

Democratizing the medieval English legal tradition

The record of the beginning of the most widespread legal system in the world is contained in millions of pages of handwritten text. Most of the records of the first centuries of the Anglo-American legal system are handwritten in a highly abbreviated form of medieval Latin, which only a few dozen scholars in the world are trained to read. In this interdisciplinary project, we construct a dataset of 4029 lines of text across 193 medieval criminal and civil cases. We then use the dataset to train an open-source end-to-end pipeline for transcribing these manuscripts. We first train standard neural network architectures for line segmentation and handwriting recognition (R-Blla and CNN+LSTM with CTC decoding, respectively) and show that they can already achieve 79% word accuracy, despite the relatively small training set and the challenge of expanding abbreviations. We then demonstrate that simple post-processing significantly boosts accuracy: adding an n-gram language model to the CTC decoder improves word accuracy to 82%, while asking Gemini Pro 3 to correct mistakes boosts accuracy to 88%. Finally, we compare the CNN+LSTM architecture with TrOCR, a transformer-based OCR architecture, demonstrating that TrOCR shows comparable word accuracy but worse character accuracy due to its over-willingness to guess, making it harder for humans to infer the correct reading. We incorporated our pipeline into a web portal (glyphmachina.com), opening up the English legal tradition to legal scholars, medievalists, and students.
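The abstract mentions CNN+LSTM recognition with CTC decoding. As a rough illustration of what a CTC decoder does, the simplest variant (best-path, or greedy, decoding) picks the most likely class per frame, merges consecutive repeats, and then drops the blank symbol. This is a minimal sketch under our own assumptions (class 0 is the blank, class k maps to `alphabet[k - 1]`, scores are plain Python lists); it is not the authors' pipeline and omits the n-gram language model they add.

```python
BLANK = 0  # assumption: class 0 is the CTC blank; class k >= 1 is alphabet[k - 1]

def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding.

    frame_scores: list of T rows, each holding one score per class
    (len(alphabet) + 1 classes in total, including the blank).
    """
    decoded = []
    prev = None
    for row in frame_scores:
        # Most likely class for this frame.
        k = max(range(len(row)), key=row.__getitem__)
        # Collapse consecutive repeats and drop blanks. A blank between two
        # identical labels keeps them as two separate characters.
        if k != prev and k != BLANK:
            decoded.append(alphabet[k - 1])
        prev = k
    return "".join(decoded)
```

For example, frames whose argmax classes spell `c c <blank> a a <blank> <blank> t` decode to `"cat"`, while `l <blank> l` decodes to `"ll"`. Adding a language model, as the abstract describes, would typically replace the per-frame argmax with a beam search that combines CTC and language-model scores.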

Preprint · 2026 · arXiv

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs

Steering large language models (LLMs) is usually done by either instruction prompting or activation steering. Prompting often gives strong control, but caches guidance tokens at every layer and can clutter long interactions; activation steering is compact but typically weaker and does not support large structured reminders. We introduce memory inception (MI), a training-free method that steers in latent attention space by inserting text-derived key-value (KV) banks only at selected layers. Rather than materializing reminder content throughout the prompt cache, MI treats steering as selective KV allocation, injecting latent slots only where the model routes to them. On matched personality-steering tasks, MI gives the best overall control--drift trade-off, remaining competitive with prompting while consistently outperforming CAA. On updatable guidance, MI supports mid-conversation behavior shifts without rewriting the visible transcript, achieving the highest post-shift alignment on Qwen3. On structured reasoning, evaluated on HARDMath and PHYSICS as proxies for structured reasoning in verifiable domains, MI outperforms visible prompting in 10 of 12 subject$\times$mode cells while cutting content-matched KV storage by up to 118$\times$. These results position MI as a powerful steering method when guidance is persistent, structured, or expensive to keep in the visible transcript.
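A rough sketch of the mechanism the abstract describes: at a selected layer, extra key/value rows (a "bank") are concatenated to the keys and values computed from the visible tokens, so a query can attend to them without any tokens being added to the transcript. This is a single-head, pure-Python illustration under our own assumptions (toy dimensions, scaled dot-product attention, bank vectors supplied directly); `attention_with_kv_bank` is a hypothetical helper, and how MI derives banks from text and chooses layers is not shown.

```python
import math

def attention(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def attention_with_kv_bank(q, keys, values, bank_keys, bank_values, layer, inject_layers):
    """Hypothetical helper: add extra KV slots only at the selected layers.

    The visible keys/values are not modified; the bank entries are
    concatenated so the query can attend to them at those layers only.
    """
    if layer in inject_layers:
        keys = bank_keys + keys
        values = bank_values + values
    return attention(q, keys, values)
```

At a layer outside `inject_layers` the output is ordinary attention over the visible tokens; at an injected layer, a bank key well aligned with the query draws most of the attention weight. Storing the bank only at selected layers, rather than caching reminder tokens at every layer, is the intuition behind the KV-storage savings the abstract reports.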

Preprint · 2022 · arXiv

Contrastive Adapters for Foundation Model Group Robustness

While large pretrained foundation models (FMs) have shown remarkable zero-shot classification robustness to dataset-level distribution shifts, their robustness to subpopulation or group shifts is relatively underexplored. We study this problem, and find that FMs such as CLIP may not be robust to various group shifts. Across 9 robustness benchmarks, zero-shot classification with their embeddings results in gaps of up to 80.7 percentage points (pp) between average and worst-group accuracy. Unfortunately, existing methods to improve robustness require retraining, which can be prohibitively expensive on large foundation models. We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks with FM embeddings as inputs) do not consistently improve and can sometimes hurt group robustness compared to zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA). We thus develop an adapter training strategy to effectively and efficiently improve FM group robustness. Our motivating observation is that while poor robustness results from groups in the same class being embedded far apart in the foundation model "embedding space," standard adapter training may not bring these points closer together. We thus propose contrastive adapting, which trains adapters with contrastive learning to bring sample embeddings close to both their ground-truth class embeddings and other sample embeddings in the same class. Across the 9 benchmarks, our approach consistently improves group robustness, raising worst-group accuracy by 8.5 to 56.0 pp over zero-shot. Our approach is also efficient, doing so without any FM finetuning and only a fixed set of frozen FM embeddings. On benchmarks such as Waterbirds and CelebA, this leads to worst-group accuracy comparable to state-of-the-art methods that retrain entire models, while only training $\leq$1% of the model parameters.
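The contrastive adapting objective is described only in words: bring each sample embedding close to its ground-truth class embedding and to other samples of the same class. A minimal sketch of such a loss, assuming cosine similarity, a temperature `tau = 0.1`, and equal weighting of the two terms (all our assumptions, not the paper's exact formulation), might look like this. The adapter itself, a small network mapping frozen FM embeddings to the vectors `z`, is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def log_softmax_at(scores, idx):
    """log(softmax(scores)[idx]), computed stably via log-sum-exp."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[idx] - lse

def contrastive_adapter_loss(z, labels, class_emb, tau=0.1):
    """Average loss over adapted embeddings z (one vector per sample).

    Term 1 pulls z[i] toward the embedding of its ground-truth class.
    Term 2 pulls z[i] toward the other samples that share its label.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        class_scores = [cosine(z[i], c) / tau for c in class_emb]
        total -= log_softmax_at(class_scores, labels[i])
        others = [j for j in range(n) if j != i]
        sample_scores = [cosine(z[i], z[j]) / tau for j in others]
        positives = [k for k, j in enumerate(others) if labels[j] == labels[i]]
        if positives:
            total -= sum(log_softmax_at(sample_scores, k) for k in positives) / len(positives)
    return total / n
```

Embeddings that sit near their class embedding and near same-class samples yield a lower loss than embeddings pointing toward the wrong class, which is the behavior an adapter trained on this objective would be pushed toward.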

Preprint · 2022 · arXiv

Non-detection of He I in the atmosphere of GJ1214b with Keck/NIRSPEC, at a time of minimal telluric contamination

Observations of helium in exoplanet atmospheres may reveal the presence of large gaseous envelopes, and indicate ongoing atmospheric escape. Orell-Miquel et al. (2022) used CARMENES to report a tentative detection of helium for the sub-Neptune GJ 1214b, with a peak excess absorption reaching over 2% in transit depth at 10830 Angstroms. However, several non-detections of helium had previously been reported for GJ 1214b. One explanation for the discrepancy was contamination of the planetary signal by overlapping telluric absorption and emission lines. We used Keck/NIRSPEC to observe another transit of GJ 1214b at 10830 Angstroms, at a time of minimal contamination by telluric lines, and did not observe planetary helium absorption. Accounting for correlated noise in our measurement, we place an upper limit on the excess absorption size of 1.22% (95% confidence). We find that the discrepancy between the CARMENES and NIRSPEC observations is unlikely to be caused by using different instruments or stellar activity. It is currently unclear whether the difference is due to correlated noise in the observations, or variability in the planetary atmosphere.

Preprint · 2022 · arXiv

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Preprint · 2022 · arXiv

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly-weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.

Preprint · 2022 · arXiv

Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudolabels. The challenge is building a combination that best exploits the signal available in both foundation models and weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning per-part source accuracies. Second, we improve source coverage by extending source votes in embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and adapters by 11.8 points, and kNN and adapters supervised by traditional hand labels by 7.2 points.
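One of Liger's two ingredients is extending source votes in embedding space to improve coverage. A minimal sketch of that idea for a single weak source, assuming votes in {+1, -1} with 0 meaning "abstain", Euclidean distance, and a fixed radius (all assumptions for illustration; the paper's exact extension rule and its per-partition accuracy estimation are not shown), could be:

```python
import math

def extend_votes(embeddings, votes, radius):
    """Extend one weak source's votes to nearby points where it abstained.

    votes[i] is +1 or -1, or 0 when the source abstains on point i.
    An abstained point copies the vote of its nearest voted neighbor,
    but only if that neighbor lies within `radius` (Euclidean distance).
    """
    extended = list(votes)
    voted = [j for j, v in enumerate(votes) if v != 0]
    if not voted:
        return extended
    for i, v in enumerate(votes):
        if v != 0:
            continue
        nearest = min(voted, key=lambda j: math.dist(embeddings[i], embeddings[j]))
        if math.dist(embeddings[i], embeddings[nearest]) <= radius:
            extended[i] = votes[nearest]
    return extended
```

The radius controls the trade-off: a larger radius raises coverage, but extended votes are only trustworthy when labels vary smoothly in embedding space, which matches the abstract's statement that lift scales with that smoothness.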

Preprint · 2022 · arXiv

Triangle and Four Cycle Counting with Predictions in Graph Streams

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.
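The abstract does not spell out the streaming algorithms themselves. For reference, the two quantities being estimated can be computed exactly by materializing the whole edge stream, which is precisely the memory cost that one-pass streaming algorithms avoid. This offline baseline is our own illustration, not the paper's method; the heavy-edge oracle and any sampling are not shown.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edge_stream):
    """Store the entire stream as adjacency sets (simple undirected graph)."""
    adj = defaultdict(set)
    for u, v in edge_stream:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def count_triangles(adj):
    """Exact triangle count; each triangle is counted once via u < v < w."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count

def count_four_cycles(adj):
    """Exact 4-cycle count.

    A vertex pair with c common neighbors closes C(c, 2) four-cycles, and
    every 4-cycle a-b-c-d is seen from both diagonal pairs {a, c} and {b, d},
    so the sum over pairs is halved.
    """
    total = 0
    for u, v in combinations(sorted(adj), 2):
        c = len(adj[u] & adj[v])
        total += c * (c - 1) // 2
    return total // 2
```

For the complete graph on four vertices, `count_triangles` returns 4 and `count_four_cycles` returns 3; for a plain square the counts are 0 and 1.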

Preprint · 2021 · arXiv

Detection of Ongoing Mass Loss from HD 63433c, a Young Mini Neptune

We detect Lyman $\alpha$ absorption from the escaping atmosphere of HD 63433c, a $R=2.67 R_\oplus$, $P=20.5$ d mini Neptune orbiting a young (440 Myr) solar analogue in the Ursa Major Moving Group. Using HST/STIS, we measure a transit depth of $11.1 \pm 1.5$% in the blue wing and $8 \pm 3$% in the red. This signal is unlikely to be due to stellar variability, but should be confirmed by an upcoming second visit with HST. We do not detect Lyman $\alpha$ absorption from the inner planet, a smaller $R=2.15 R_\oplus$ mini Neptune on a 7.1 d orbit. We use Keck/NIRSPEC to place an upper limit of 0.5% on helium absorption for both planets. We measure the host star's X-ray spectrum and FUV flux with XMM-Newton, and model the outflow from both planets using a 3D hydrodynamic code. This model provides a reasonable match to the light curve in the blue wing of the Lyman $\alpha$ line and the helium non-detection for planet c, although it does not explain the tentative red wing absorption or reproduce the excess absorption spectrum in detail. Its predictions of strong Lyman $\alpha$ and helium absorption from b are ruled out by the observations. This model predicts a much shorter mass loss timescale for planet b, suggesting that b and c are fundamentally different: while the latter still retains its hydrogen/helium envelope, the former has likely lost its primordial atmosphere.

Preprint · 2021 · arXiv

Escaping Helium from TOI 560.01, a Young Mini Neptune

We report helium absorption from the escaping atmosphere of TOI 560.01 (HD 73583b), a $R=2.8 R_\oplus$, $P=6.4$ d mini Neptune orbiting a young ($\sim$600 Myr) K dwarf. Using Keck/NIRSPEC, we detect a signal with an average depth of $0.68 \pm 0.08$% in the line core. The absorption signal repeats during a partial transit obtained a month later, but is marginally stronger and bluer, perhaps reflecting changes in the stellar wind environment. Ingress occurs on time, and egress occurs within 12 minutes of the white light egress, although absorption rises more gradually than it declines. This suggests that the outflow is slightly asymmetric and confined to regions close to the planet. The absorption signal also exhibits a slight 4 km/s redshift rather than the expected blueshift; this might be explained if the planet has a modest orbital eccentricity, although the radial velocity data disfavors such an explanation. We use XMM-Newton observations to reconstruct the high energy stellar spectrum and model the planet's outflow with 1D and 3D hydrodynamic simulations. We find that our models generally overpredict the measured magnitude of the absorption during transit, the size of the blueshift, or both. Increasing the metallicity to 100$\times$ solar suppresses the signal, but the dependence of the predicted signal strength on metallicity is non-monotonic. Decreasing the assumed stellar EUV flux by a factor of 3 likewise suppresses the signal substantially.

Preprint · 2020 · arXiv

PLATON II: New Capabilities And A Comprehensive Retrieval on HD 189733b Transit and Eclipse Data

Recently, we introduced PLanetary Atmospheric Tool for Observer Noobs (PLATON), a Python package that calculates model transmission spectra for exoplanets and retrieves atmospheric characteristics based on observed spectra. We now expand its capabilities to include the ability to compute secondary eclipse depths. We have also added the option to calculate models using the correlated-$k$ method for radiative transfer, which improves accuracy without sacrificing speed. Additionally, we update the opacities in PLATON--many of which were generated using old or proprietary line lists--using the most recent and complete public line lists. These opacities are made available at R=1000 and R=10,000 over the 0.3--30 μm range, and at R=375,000 in select near IR bands, making it possible to utilize PLATON for ground-based high resolution cross correlation studies. To demonstrate PLATON's new capabilities, we perform a retrieval on published HST and Spitzer transmission and emission spectra of the archetypal hot Jupiter HD 189733b. This is the first joint transit and secondary eclipse retrieval for this planet in the literature, as well as the most comprehensive set of both transit and eclipse data assembled for a retrieval to date. We find that these high signal-to-noise data are well-matched by atmosphere models with a C/O ratio of $0.66_{-0.09}^{+0.05}$ and a metallicity of $12_{-5}^{+8}$ times solar where the terminator is dominated by extended nanometer-sized haze particles at optical wavelengths. These are among the smallest uncertainties reported to date for an exoplanet, demonstrating both the power and the limitations of HST and Spitzer exoplanet observations.