Researcher profile

Ray Li

Ray Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
10 works
0 followers
13 topics
4 close collaborators

Published work

10 published items

preprint · 2026 · arXiv

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.
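The statement that an exported adapter can be under 1% of base-model size in rank-1 settings is consistent with standard LoRA parameter counting: a rank-r adapter on a weight matrix of shape d_out x d_in stores two factors, of shapes d_out x r and r x d_in. The following is a minimal sketch of that arithmetic only. The function names and the 4096 x 4096 layer shape are hypothetical illustrations and are not taken from MinT.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a rank-`rank` LoRA adapter for one d_out x d_in matrix.

    The adapter stores B (d_out x rank) and A (rank x d_in).
    """
    return rank * (d_out + d_in)


def adapter_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter size as a fraction of the dense matrix it adapts."""
    return lora_params(d_out, d_in, rank) / (d_out * d_in)


# Hypothetical layer shape, chosen only for illustration.
frac = adapter_fraction(4096, 4096, rank=1)
print(f"rank-1 adapter fraction: {frac:.6f}")
```

The fraction here covers a single adapted matrix; the overall ratio for a real model depends on which layers carry adapters, which this sketch does not model.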

preprint · 2022 · arXiv

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level between any pair of seen speakers. We do this by activating distinct parts of the network for different tasks. We train our model using a novel approach to two-stage training. In Stage I, the model learns speaker-independent word-level prosody representations from speech, which it uses for many-to-many fine-grained prosody transfer. In Stage II, we learn to predict these prosody representations using the contextual information available in text, thereby enabling multi-speaker TTS with contextually appropriate prosody. We compare CC2 to two strong baselines, one in TTS with contextually appropriate prosody, and one in fine-grained prosody transfer. CC2 reduces the gap in naturalness between our baseline and copy-synthesised speech by $22.79\%$. In fine-grained prosody transfer evaluations, it obtains a relative improvement of $33.15\%$ in target speaker similarity.

preprint · 2022 · arXiv

Efficient Near-Optimal Codes for General Repeat Channels

Given a probability distribution $\mathcal{D}$ over the non-negative integers, a $\mathcal{D}$-repeat channel acts on an input symbol by repeating it a number of times distributed as $\mathcal{D}$. For example, the binary deletion channel ($\mathcal{D}=Bernoulli$) and the Poisson repeat channel ($\mathcal{D}=Poisson$) are special cases. We say a $\mathcal{D}$-repeat channel is square-integrable if $\mathcal{D}$ has finite first and second moments. In this paper, we construct explicit codes for all square-integrable $\mathcal{D}$-repeat channels with rate arbitrarily close to the capacity, that are encodable and decodable in linear and quasi-linear time, respectively. We also consider possible extensions to the repeat channel model, and illustrate how our construction can be extended to an even broader class of channels capturing insertions, deletions, and substitutions. Our work offers an alternative, simplified, and more general construction to the recent work of Rubinstein (arXiv:2111.00261), who attains similar results to ours in the cases of the deletion channel and the Poisson repeat channel. It also slightly improves the runtime and decoding failure probability of the polar codes constructions of Tal et al. (ISIT 2019) and of Pfister and Tal (arXiv:2102.02155) for the deletion channel and certain insertion/deletion/substitution channels. Our techniques follow closely the approaches of Guruswami and Li (IEEEToIT 2019) and Con and Shpilka (IEEEToIT 2020); what sets apart our work is that we show that a capacity-achieving code can be assumed to have an "approximate balance" in the frequency of zeros and ones of all sufficiently long substrings of all codewords. This allows us to attain near-capacity-achieving codes in a general setting. We consider this "approximate balance" result to be of independent interest, as it can be cast in much greater generality than repeat channels.
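The channel model defined in this abstract can be simulated in a few lines. The sketch below shows only the channel itself, not the paper's code construction: each input symbol is replaced by a number of copies drawn from a caller-supplied sampler. The names `repeat_channel` and `bernoulli_count` are hypothetical, and the Bernoulli sampler assumes the convention that a symbol is deleted (zero copies) with probability q.

```python
import random
from typing import Callable


def repeat_channel(x: str, sample_count: Callable[[random.Random], int],
                   rng: random.Random) -> str:
    """Replace each symbol of x by sample_count(rng) copies of itself."""
    return "".join(sym * sample_count(rng) for sym in x)


def bernoulli_count(q: float) -> Callable[[random.Random], int]:
    """Sampler for the deletion channel: 0 copies w.p. q, else 1 copy."""
    return lambda rng: 0 if rng.random() < q else 1


rng = random.Random(0)
# A degenerate distribution that always returns 2 duplicates every symbol.
doubled = repeat_channel("0110", lambda _rng: 2, rng)
# One output of a binary deletion channel with deletion probability 0.3.
trace = repeat_channel("0110100111", bernoulli_count(0.3), rng)
print(doubled, trace)
```

Passing the sampler as a function keeps the channel generic: any distribution over the non-negative integers can be plugged in without changing `repeat_channel`.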

preprint · 2020 · arXiv

Coded trace reconstruction in a constant number of traces

The coded trace reconstruction problem asks to construct a code $C\subset \{0,1\}^n$ such that any $x\in C$ is recoverable from independent outputs ("traces") of $x$ from a binary deletion channel (BDC). We present binary codes of rate $1-\varepsilon$ that are efficiently recoverable from ${\exp(O_q(\log^{1/3}(\frac{1}{\varepsilon})))}$ (a constant independent of $n$) traces of a $\operatorname{BDC}_q$ for any constant deletion probability $q\in(0,1)$. We also show that, for rate $1-\varepsilon$ binary codes, $\tilde{\Omega}(\log^{5/2}(1/\varepsilon))$ traces are required. The results follow from a pair of black-box reductions that show that average-case trace reconstruction is essentially equivalent to coded trace reconstruction. We also show that there exist codes of rate $1-\varepsilon$ over an $O_{\varepsilon}(1)$-sized alphabet that are recoverable from $O(\log(1/\varepsilon))$ traces, and that this is tight.

preprint · 2020 · arXiv

Effective bounds on multiplicatively dependent orbits of integer polynomials modulo S-integers

We obtain effective bounds on the heights of algebraic integers whose orbits contain multiplicatively dependent values modulo S-integers. Our method is based on a new upper bound on the so-called S-height of polynomial values over the ring of integers of $\mathbb{K}$. Our results provide an effective variant of a recent result of A. Bérczes, A. Ostafe, I. E. Shparlinski and J. H. Silverman (arXiv:1811.04971) on multiplicative dependence modulo a finitely generated subgroup by eliminating the use of non-effective results by K. F. Roth and G. Faltings.

preprint · 2020 · arXiv

Hat Guessing Numbers of Degenerate Graphs

Recently, Farnik asked whether the hat guessing number $\text{HG}(G)$ of a graph $G$ could be bounded as a function of its degeneracy $d$, and Bosek, Dudek, Farnik, Grytczuk and Mazur showed that $\text{HG}(G)\ge 2^d$ is possible. We show that for all $d\ge 1$ there exists a $d$-degenerate graph $G$ for which $\text{HG}(G) \ge 2^{2^{d-1}}$. We also give a new general method for obtaining upper bounds on $\text{HG}(G)$. The question of whether $\text{HG}(G)$ is bounded as a function of $d$ remains open.

preprint · 2020 · arXiv

Lifted multiplicity codes and the disjoint repair group property

Lifted Reed-Solomon codes (Guo, Kopparty, Sudan 2013) were introduced in the context of locally correctable and testable codes. They are multivariate polynomial codes whose restriction to any line is a codeword of a Reed-Solomon code. We consider a generalization of their construction, which we call lifted multiplicity codes. These are multivariate polynomial codes whose restriction to any line is a codeword of a multiplicity code (Kopparty, Saraf, Yekhanin 2014). We show that lifted multiplicity codes have a better trade-off between redundancy and a notion of locality called the $t$-disjoint-repair-group property than previously known constructions. More precisely, we show that lifted multiplicity codes with length $N$ and redundancy $O(t^{0.585} \sqrt{N})$ have the property that any symbol of a codeword can be reconstructed in $t$ different ways, each using a disjoint subset of the other coordinates. This gives the best known trade-off for this problem for any super-constant $t < \sqrt{N}$. We also give an alternative analysis of lifted Reed-Solomon codes using dual codes, which may be of independent interest.

preprint · 2020 · arXiv

Lower bounds for Max-Cut in $H$-free graphs via semidefinite programming

For a graph $G$, let $f(G)$ denote the size of the maximum cut in $G$. The problem of estimating $f(G)$ as a function of the number of vertices and edges of $G$ has a long history and has been extensively studied over the last fifty years. In this paper we propose an approach, based on semidefinite programming (SDP), to prove lower bounds on $f(G)$. We use this approach to find large cuts in graphs with few triangles and in $K_r$-free graphs.
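To make the quantity $f(G)$ concrete, the sketch below computes it by exhaustive search on small graphs. This is not the paper's SDP approach; it is a brute-force baseline, exponential in the number of vertices, and the function name and vertex labelling (0 to n-1) are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def max_cut(n: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Size of a maximum cut of a graph on vertices 0..n-1 (brute force)."""
    edge_list = list(edges)
    best = 0
    # Vertex n-1 is fixed on side 0: swapping the two sides gives the same cut.
    for mask in range(1 << max(n - 1, 0)):
        cut = sum(
            1 for u, v in edge_list
            if ((mask >> u) & 1) != ((mask >> v) & 1)
        )
        best = max(best, cut)
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(max_cut(3, triangle), max_cut(4, four_cycle))
```

On any graph with $m$ edges the result is at least $m/2$, the classical baseline that lower bounds of this kind improve upon.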

preprint · 2020 · arXiv

Max-Cut in Degenerate $H$-Free Graphs

We obtain several lower bounds on the $\textsf{Max-Cut}$ of $d$-degenerate $H$-free graphs. Let $f(m,d,H)$ denote the smallest $\textsf{Max-Cut}$ of an $H$-free $d$-degenerate graph on $m$ edges. We show that $f(m,d,K_r)\ge \left(\frac{1}{2} + d^{-1+\Omega(r^{-1})}\right)m$, generalizing a recent work of Carlson, Kolla, and Trevisan. We also give bounds on $f(m,d,H)$ when $H$ is a cycle, odd wheel, or a complete bipartite graph with at most 4 vertices on one side. We also show stronger bounds on $f(m,d,K_r)$ assuming a conjecture of Alon, Bollobás, Krivelevich, and Sudakov (2003). We conjecture that $f(m,d,K_r)= \left( \frac{1}{2} + \Theta_r(d^{-1/2}) \right)m$ for every $r\ge 3$, and show that this conjecture implies the ABKS conjecture.

preprint · 2019 · arXiv

On Ramsey numbers of hedgehogs

The hedgehog $H_t$ is a 3-uniform hypergraph on vertices $1,\dots,t+\binom{t}{2}$ such that, for any pair $(i,j)$ with $1\le i<j\le t$, there exists a unique vertex $k>t$ such that $\{i,j,k\}$ is an edge. Conlon, Fox, and Rödl proved that the two-color Ramsey number of the hedgehog grows polynomially in the number of its vertices, while the four-color Ramsey number grows exponentially in the number of its vertices. They asked whether the two-color Ramsey number of the hedgehog $H_t$ is nearly linear in the number of its vertices. We answer this question affirmatively, proving that $r(H_t) = O(t^2\ln t)$.
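The definition of the hedgehog $H_t$ translates directly into a short construction. The sketch below follows the definition in the abstract; the function name and the particular order in which pairs are assigned to the vertices $k>t$ are arbitrary choices made here.

```python
from itertools import combinations
from typing import List, Tuple


def hedgehog_edges(t: int) -> List[Tuple[int, int, int]]:
    """Edges of the 3-uniform hedgehog H_t on vertices 1..t + C(t, 2).

    Each pair (i, j) with 1 <= i < j <= t gets its own vertex k > t,
    and {i, j, k} is an edge.
    """
    pairs = combinations(range(1, t + 1), 2)
    return [(i, j, t + 1 + idx) for idx, (i, j) in enumerate(pairs)]


edges = hedgehog_edges(4)
vertices = {v for e in edges for v in e}
print(len(edges), len(vertices))
```

For $t=4$ this gives $\binom{4}{2}=6$ edges on $4+6=10$ vertices, matching the vertex count $t+\binom{t}{2}$ in the definition.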