Source author record

Andrew Chen

Andrew Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Artificial Intelligence; astro-ph.IM; Biomolecules; Computation and Language; Distributed, Parallel, and Cluster Computing; Populations and Evolution

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

preprint, 2026, arXiv

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.
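The claim that a rank-1 adapter can be under 1% of base-model size follows from simple parameter counting. The sketch below is illustrative only and is not taken from MinT: it assumes the standard LoRA factorization, in which a weight matrix of shape d_out x d_in is adapted by two factors of shapes d_out x r and r x d_in, and it uses a hypothetical 4096 x 4096 layer. The function names are made up for this example.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters as a fraction of the dense weight matrix they adapt."""
    return lora_param_count(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    d = 4096  # hypothetical layer width, not a figure from the abstract
    for rank in (1, 8, 64):
        count = lora_param_count(d, d, rank)
        frac = lora_fraction(d, d, rank)
        print(f"rank={rank:>2}  adapter params={count:>7}  fraction={frac:.4%}")
```

Under these assumptions the adapter-to-layer ratio is r(d_out + d_in) / (d_out * d_in), so it shrinks as the layer gets wider and grows linearly with rank. A real deployment adapts only some layers, which would lower the whole-model fraction further.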

preprint, 2022, arXiv

Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.

preprint, 2022, arXiv

Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery

We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that leverages self-attention over both node and edge attributes as well as encoding the spatial structure -- this capability is of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network with stability. Exploration is driven by the stochasticity of the action space design and the innovation reward bonuses learned and proposed by random network distillation. In experiments, our framework achieved outstanding results compared to state-of-the-art algorithms, while reducing the complexity of paths to chemical synthesis.

preprint, 2020, arXiv

gSeaGen: the KM3NeT GENIE-based code for neutrino telescopes

The gSeaGen code is a GENIE-based application developed to efficiently generate high statistics samples of events, induced by neutrino interactions, detectable in a neutrino telescope. The gSeaGen code is able to generate events induced by all neutrino flavours, considering topological differences between track-type and shower-like events. Neutrino interactions are simulated taking into account the density and the composition of the media surrounding the detector. The main features of gSeaGen are presented together with some examples of its application within the KM3NeT project.

preprint, 2013, arXiv

Dynamics of a producer-parasite ecosystem on the brink of collapse

Ecosystems can undergo sudden shifts to undesirable states, but recent studies with simple single-species ecosystems have demonstrated that advance warning can be provided by the slowing down of population dynamics near a tipping point. However, it is not clear how this effect of critical slowing down will manifest in ecosystems with strong interactions between their components. Here we probe the dynamics of an experimental producer-parasite ecosystem as it approaches a catastrophic collapse. Surprisingly, the producer population grows in size as the environment deteriorates, highlighting that population size can be a misleading measure of ecosystem stability. By analyzing the oscillatory producer-parasite dynamics over roughly 100 generations in multiple environmental conditions, we found that the collective ecosystem dynamics slow down as the tipping point is approached. Analysis of the coupled dynamics of interacting populations may therefore be necessary to provide advance warning of collapse in complex communities.
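Critical slowing down is commonly quantified by rising lag-1 autocorrelation in a time series. The sketch below illustrates that general indicator on a toy model and is not the authors' analysis: it assumes a first-order autoregressive process x[t+1] = phi * x[t] + noise as a stand-in for fluctuations around an equilibrium, where phi closer to 1 means slower recovery from perturbations. The function names and parameter values are hypothetical.

```python
import random


def simulate_ar1(phi: float, n_steps: int, seed: int = 0) -> list[float]:
    """Toy AR(1) series; phi near 1 mimics slow recovery near a tipping point."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorrelation(series: list[float]) -> float:
    """Lag-1 autocorrelation: covariance of neighbouring points over the variance."""
    mean = sum(series) / len(series)
    dev = [value - mean for value in series]
    numerator = sum(a * b for a, b in zip(dev, dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


if __name__ == "__main__":
    for phi in (0.2, 0.5, 0.9):  # assumed recovery parameters, for illustration only
        rho = lag1_autocorrelation(simulate_ar1(phi, 5000))
        print(f"phi={phi:.1f}  lag-1 autocorrelation={rho:.3f}")
```

In this toy setting the estimated autocorrelation tracks phi, so a drift toward 1 signals slowing recovery. The abstract's point is that in a strongly interacting, oscillatory system such a signal may have to be read from the coupled dynamics rather than from one population's size alone.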