Source author record

Vincent Wang

Vincent Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language math.CT Machine Learning quant-ph Artificial Intelligence Distributed, Parallel, and Cluster Computing Logic in Computer Science Molecular Networks Quantitative Methods

Catalog footprint

What is connected

5works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.

preprint2026arXiv

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious or incorrect rewards. We investigate this phenomenon and identify a "Perplexity Paradox": spurious RLVR triggers a divergence where answer-token perplexity drops while prompt-side coherence degrades, suggesting the model is bypassing reasoning in favor of memorization. Using Path Patching, Logit Lens, JSD analysis, and Neural Differential Equations, we uncover a hidden Anchor-Adapter circuit that facilitates this shortcut. We localize a Functional Anchor in the middle layers (L18-20) that triggers the retrieval of memorized solutions, followed by Structural Adapters in later layers (L21+) that transform representations to accommodate the shortcut signal. Finally, we demonstrate that scaling specific MLP keys within this circuit allows for bidirectional causal steering-artificially amplifying or suppressing contamination-driven performance. Our results provide a mechanistic roadmap for identifying and mitigating data contamination in RLVR-tuned models. Code is available at https://github.com/idwts/How-RLVR-Activates-Memorization-Shortcuts.

preprint2021arXiv

The Safari of Update Structures: Visiting the Lens and Quantum Enclosures

We build upon our recently introduced concept of an update structure to show that it is a generalisation of very-well-behaved lenses, that is, there is a bijection between a strict subset of update structures and vwb lenses in cartesian categories. We show that update structures are also sufficiently general to capture quantum observables, pinpointing the additional assumptions required to make the two coincide. In doing so, we shift the focus from special commutative dagger-Frobenius algebras to interacting (co)magma (co)module pairs, showing that the algebraic properties of the (co)multiplication arise from the module-comodule interaction, rather than direct assumptions about the magma-comagma pair. We then begin to investigate the zoo of possible update structures, introducing the notions of classical security-flagged databases, and databases of quantum systems. This work is of foundational interest as update structures place previously distinct areas of research in a general class of operationally motivated structures, we expect the taming of this class to illuminate novel relationships between separately studied topics in computer science, physics and mathematics.

preprint2020arXiv

Categories of Semantic Concepts

Modelling concept representation is a foundational problem in the study of cognition and linguistics. This work builds on the confluence of conceptual tools from Gärdenfors semantic spaces, categorical compositional linguistics, and applied category theory to present a domain-independent and categorical formalism of 'concept'.

preprint2020arXiv

Towards a Scientific Method for Dynamical Systems

The premise of this paper is the following value proposition: Models are good when they describe a system of phenomena, and they are better when they can predict the effect of interventions upon the system. We introduce a formalism by which dynamical system models can be obtained by specifying basic constituent processes in the form of petri nets, mediated by the Law of Mass Action. We prove the universality of this procedure with respect to ordinary first order differential equations in rational functions. In addition, we introduce a simple graphical procedure for calculating dynamical systems from petri nets in our formalism, which we use to recover several well-known dynamical systems. For proof-of-concept, we use our method to obtain a differential equation model for the HES1 gene transcription network that predicts post-intervention behaviours of the network.

Vincent Wang

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

The Safari of Update Structures: Visiting the Lens and Quantum Enclosures

Categories of Semantic Concepts

Towards a Scientific Method for Dynamical Systems