Researcher profile

Vincent Wang

Vincent Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.

preprint2026arXiv

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious or incorrect rewards. We investigate this phenomenon and identify a "Perplexity Paradox": spurious RLVR triggers a divergence where answer-token perplexity drops while prompt-side coherence degrades, suggesting the model is bypassing reasoning in favor of memorization. Using Path Patching, Logit Lens, JSD analysis, and Neural Differential Equations, we uncover a hidden Anchor-Adapter circuit that facilitates this shortcut. We localize a Functional Anchor in the middle layers (L18-20) that triggers the retrieval of memorized solutions, followed by Structural Adapters in later layers (L21+) that transform representations to accommodate the shortcut signal. Finally, we demonstrate that scaling specific MLP keys within this circuit allows for bidirectional causal steering-artificially amplifying or suppressing contamination-driven performance. Our results provide a mechanistic roadmap for identifying and mitigating data contamination in RLVR-tuned models. Code is available at https://github.com/idwts/How-RLVR-Activates-Memorization-Shortcuts.

preprint2021arXiv

The Safari of Update Structures: Visiting the Lens and Quantum Enclosures

We build upon our recently introduced concept of an update structure to show that it is a generalisation of very-well-behaved lenses, that is, there is a bijection between a strict subset of update structures and vwb lenses in cartesian categories. We show that update structures are also sufficiently general to capture quantum observables, pinpointing the additional assumptions required to make the two coincide. In doing so, we shift the focus from special commutative dagger-Frobenius algebras to interacting (co)magma (co)module pairs, showing that the algebraic properties of the (co)multiplication arise from the module-comodule interaction, rather than direct assumptions about the magma-comagma pair. We then begin to investigate the zoo of possible update structures, introducing the notions of classical security-flagged databases, and databases of quantum systems. This work is of foundational interest as update structures place previously distinct areas of research in a general class of operationally motivated structures, we expect the taming of this class to illuminate novel relationships between separately studied topics in computer science, physics and mathematics.

preprint2020arXiv

Towards a Scientific Method for Dynamical Systems

The premise of this paper is the following value proposition: Models are good when they describe a system of phenomena, and they are better when they can predict the effect of interventions upon the system. We introduce a formalism by which dynamical system models can be obtained by specifying basic constituent processes in the form of petri nets, mediated by the Law of Mass Action. We prove the universality of this procedure with respect to ordinary first order differential equations in rational functions. In addition, we introduce a simple graphical procedure for calculating dynamical systems from petri nets in our formalism, which we use to recover several well-known dynamical systems. For proof-of-concept, we use our method to obtain a differential equation model for the HES1 gene transcription network that predicts post-intervention behaviours of the network.