Source author record

Ray Li

Ray Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.CO; Information Theory; math.IT; Data Structures and Algorithms; Computational Complexity; Discrete Mathematics; math.NT; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; eess.AS; Machine Learning; math.DS; Sound

Catalog footprint

What is connected

14 works

13 topics

4 close collaborators


preprint, 2026, arXiv

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.

preprint, 2022, arXiv

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level between any pair of seen speakers. We do this by activating distinct parts of the network for different tasks. We train our model using a novel approach to two-stage training. In Stage I, the model learns speaker-independent word-level prosody representations from speech, which it uses for many-to-many fine-grained prosody transfer. In Stage II, we learn to predict these prosody representations using the contextual information available in text, thereby enabling multi-speaker TTS with contextually appropriate prosody. We compare CC2 to two strong baselines, one in TTS with contextually appropriate prosody, and one in fine-grained prosody transfer. CC2 reduces the gap in naturalness between our baseline and copy-synthesised speech by $22.79\%$. In fine-grained prosody transfer evaluations, it obtains a relative improvement of $33.15\%$ in target speaker similarity.

preprint, 2022, arXiv

Efficient Near-Optimal Codes for General Repeat Channels

Given a probability distribution $\mathcal{D}$ over the non-negative integers, a $\mathcal{D}$-repeat channel acts on an input symbol by repeating it a number of times distributed as $\mathcal{D}$. For example, the binary deletion channel ($\mathcal{D}=\mathrm{Bernoulli}$) and the Poisson repeat channel ($\mathcal{D}=\mathrm{Poisson}$) are special cases. We say a $\mathcal{D}$-repeat channel is square-integrable if $\mathcal{D}$ has finite first and second moments. In this paper, we construct explicit codes for all square-integrable $\mathcal{D}$-repeat channels with rate arbitrarily close to the capacity, that are encodable and decodable in linear and quasi-linear time, respectively. We also consider possible extensions to the repeat channel model, and illustrate how our construction can be extended to an even broader class of channels capturing insertions, deletions, and substitutions. Our work offers an alternative, simplified, and more general construction to the recent work of Rubinstein (arXiv:2111.00261), who attains similar results to ours in the cases of the deletion channel and the Poisson repeat channel. It also slightly improves the runtime and decoding failure probability of the polar code constructions of Tal et al. (ISIT 2019) and of Pfister and Tal (arXiv:2102.02155) for the deletion channel and certain insertion/deletion/substitution channels. Our techniques follow closely the approaches of Guruswami and Li (IEEEToIT 2019) and Con and Shpilka (IEEEToIT 2020); what sets apart our work is that we show that a capacity-achieving code can be assumed to have an "approximate balance" in the frequency of zeros and ones of all sufficiently long substrings of all codewords. This allows us to attain near-capacity-achieving codes in a general setting. We consider this "approximate balance" result to be of independent interest, as it can be cast in much greater generality than repeat channels.
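The channel definition in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's code construction: the names `repeat_channel` and `bernoulli_count` are hypothetical, and the deletion probability `q` is a parameter assumed here for illustration.

```python
import random
from typing import Callable, List, Sequence


def repeat_channel(
    symbols: Sequence[int],
    draw_count: Callable[[random.Random], int],
    rng: random.Random,
) -> List[int]:
    """Pass symbols through a D-repeat channel.

    Each input symbol is replaced by `draw_count(rng)` copies of itself,
    where `draw_count` samples a non-negative integer from D.
    """
    output: List[int] = []
    for symbol in symbols:
        output.extend([symbol] * draw_count(rng))
    return output


def bernoulli_count(q: float) -> Callable[[random.Random], int]:
    """Repeat count for a binary deletion channel (assumed parameter q).

    The symbol is dropped (0 copies) with probability q and kept
    (1 copy) with probability 1 - q.
    """
    def draw(rng: random.Random) -> int:
        return 0 if rng.random() < q else 1
    return draw


if __name__ == "__main__":
    rng = random.Random(2022)
    print(repeat_channel([0, 1, 1, 0, 1], bernoulli_count(0.3), rng))
```

Swapping `bernoulli_count` for a sampler of another distribution over the non-negative integers gives the corresponding repeat channel under the same definition.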

preprint, 2020, arXiv

Coded trace reconstruction in a constant number of traces

The coded trace reconstruction problem asks to construct a code $C\subset \{0,1\}^n$ such that any $x\in C$ is recoverable from independent outputs ("traces") of $x$ from a binary deletion channel (BDC). We present binary codes of rate $1-\varepsilon$ that are efficiently recoverable from $\exp(O_q(\log^{1/3}(\frac{1}{\varepsilon})))$ (a constant independent of $n$) traces of a $\operatorname{BDC}_q$ for any constant deletion probability $q\in(0,1)$. We also show that, for rate $1-\varepsilon$ binary codes, $\tilde{\Omega}(\log^{5/2}(1/\varepsilon))$ traces are required. The results follow from a pair of black-box reductions that show that average-case trace reconstruction is essentially equivalent to coded trace reconstruction. We also show that there exist codes of rate $1-\varepsilon$ over an $O_{\varepsilon}(1)$-sized alphabet that are recoverable from $O(\log(1/\varepsilon))$ traces, and that this is tight.

preprint, 2020, arXiv

Effective bounds on multiplicatively dependent orbits of integer polynomials modulo S-integers

We obtain effective bounds on the heights of algebraic integers whose orbits contain multiplicatively dependent values modulo S-integers. Our method is based on a new upper bound on the so-called S-height of polynomial values over the ring of integers of $\mathbb{K}$. Our results provide an effective variant of a recent result of A. Bérczes, A. Ostafe, I. E. Shparlinski and J. H. Silverman (arXiv:1811.04971) on multiplicative dependence modulo a finitely generated subgroup by eliminating the use of non-effective results by K. F. Roth and G. Faltings.

preprint, 2020, arXiv

Hat Guessing Numbers of Degenerate Graphs

Recently, Farnik asked whether the hat guessing number $\text{HG}(G)$ of a graph $G$ could be bounded as a function of its degeneracy $d$, and Bosek, Dudek, Farnik, Grytczuk and Mazur showed that $\text{HG}(G)\ge 2^d$ is possible. We show that for all $d\ge 1$ there exists a $d$-degenerate graph $G$ for which $\text{HG}(G) \ge 2^{2^{d-1}}$. We also give a new general method for obtaining upper bounds on $\text{HG}(G)$. The question of whether $\text{HG}(G)$ is bounded as a function of $d$ remains open.

preprint, 2020, arXiv

Lifted multiplicity codes and the disjoint repair group property

Lifted Reed-Solomon codes (Guo, Kopparty, Sudan 2013) were introduced in the context of locally correctable and testable codes. They consist of multivariate polynomials whose restriction to any line is a codeword of a Reed-Solomon code. We consider a generalization of their construction, which we call lifted multiplicity codes. These are multivariate polynomial codes whose restriction to any line is a codeword of a multiplicity code (Kopparty, Saraf, Yekhanin 2014). We show that lifted multiplicity codes have a better trade-off between redundancy and a notion of locality called the $t$-disjoint-repair-group property than previously known constructions. More precisely, we show that lifted multiplicity codes with length $N$ and redundancy $O(t^{0.585} \sqrt{N})$ have the property that any symbol of a codeword can be reconstructed in $t$ different ways, each using a disjoint subset of the other coordinates. This gives the best known trade-off for this problem for any super-constant $t < \sqrt{N}$. We also give an alternative analysis of lifted Reed-Solomon codes using dual codes, which may be of independent interest.

preprint, 2020, arXiv

Lower bounds for Max-Cut in $H$-free graphs via semidefinite programming

For a graph $G$, let $f(G)$ denote the size of the maximum cut in $G$. The problem of estimating $f(G)$ as a function of the number of vertices and edges of $G$ has a long history and was extensively studied in the last fifty years. In this paper we propose an approach, based on semidefinite programming (SDP), to prove lower bounds on $f(G)$. We use this approach to find large cuts in graphs with few triangles and in $K_r$-free graphs.
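For very small graphs, $f(G)$ as defined above can be computed by exhaustive search, which is useful for sanity-checking bounds. The sketch below is a brute-force illustration only and is not the SDP approach of the paper; the function name `max_cut` and the 0-based vertex labelling are assumptions made here.

```python
from typing import Iterable, Tuple


def max_cut(num_vertices: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Return the size of a maximum cut by trying every 2-colouring.

    Vertices are labelled 0..num_vertices-1. Bit v of `mask` says which
    side of the cut vertex v is on. Runs in O(2^n * m) time, so it is
    only suitable for tiny graphs.
    """
    edge_list = list(edges)
    best = 0
    for mask in range(1 << num_vertices):
        # An edge is cut when its endpoints lie on different sides.
        cut = sum(
            1 for u, v in edge_list
            if ((mask >> u) & 1) != ((mask >> v) & 1)
        )
        best = max(best, cut)
    return best


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    five_cycle = [(i, (i + 1) % 5) for i in range(5)]
    print(max_cut(3, triangle), max_cut(5, five_cycle))
```

Both example graphs contain an odd cycle, so no cut can contain every edge.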

preprint, 2020, arXiv

Max-Cut in Degenerate $H$-Free Graphs

We obtain several lower bounds on the $\textsf{Max-Cut}$ of $d$-degenerate $H$-free graphs. Let $f(m,d,H)$ denote the smallest $\textsf{Max-Cut}$ of an $H$-free $d$-degenerate graph on $m$ edges. We show that $f(m,d,K_r)\ge \left(\frac{1}{2} + d^{-1+\Omega(r^{-1})}\right)m$, generalizing a recent work of Carlson, Kolla, and Trevisan. We also give bounds on $f(m,d,H)$ when $H$ is a cycle, odd wheel, or a complete bipartite graph with at most 4 vertices on one side. We also show stronger bounds on $f(m,d,K_r)$ assuming a conjecture of Alon, Bollobás, Krivelevich, and Sudakov (2003). We conjecture that $f(m,d,K_r)= \left( \frac{1}{2} + \Theta_r(d^{-1/2}) \right)m$ for every $r\ge 3$, and show that this conjecture implies the ABKS conjecture.

preprint, 2019, arXiv

On Ramsey numbers of hedgehogs

The hedgehog $H_t$ is a 3-uniform hypergraph on vertices $1,\dots,t+\binom{t}{2}$ such that, for any pair $(i,j)$ with $1\le i<j\le t$, there exists a unique vertex $k>t$ such that $\{i,j,k\}$ is an edge. Conlon, Fox, and Rödl proved that the two-color Ramsey number of the hedgehog grows polynomially in the number of its vertices, while the four-color Ramsey number grows exponentially in the number of its vertices. They asked whether the two-color Ramsey number of the hedgehog $H_t$ is nearly linear in the number of its vertices. We answer this question affirmatively, proving that $r(H_t) = O(t^2\ln t)$.
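The definition of $H_t$ can be made concrete by listing its edges. This is a minimal sketch under one assumed labelling: pairs $(i,j)$ are taken in lexicographic order and assigned the vertices $t+1, t+2, \dots$ in turn. The function name `hedgehog_edges` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def hedgehog_edges(t: int) -> List[Tuple[int, int, int]]:
    """List the edges of the hedgehog H_t on vertices 1..t + C(t, 2).

    Every pair (i, j) with 1 <= i < j <= t receives its own vertex
    k > t, and {i, j, k} is an edge. The order in which the extra
    vertices are handed out is an arbitrary choice made here.
    """
    edges = []
    next_vertex = t
    for i, j in combinations(range(1, t + 1), 2):
        next_vertex += 1
        edges.append((i, j, next_vertex))
    return edges


if __name__ == "__main__":
    for edge in hedgehog_edges(3):
        print(edge)
```

For $t=3$ this yields three edges on $3+\binom{3}{2}=6$ vertices.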

preprint, 2016, arXiv

Central Limit Theorems for Gaps of Generalized Zeckendorf Decompositions

Zeckendorf proved that every positive integer can be written uniquely as a sum of non-adjacent Fibonacci numbers $\{1,2,3,5,\dots\}$. This has been extended to many other recurrence relations $\{G_n\}$ (with their own notion of a legal decomposition) and to proving that the distribution of the number of summands of an $M \in [G_n, G_{n+1})$ converges to a Gaussian as $n\to\infty$. We prove that for any non-negative integer $g$ the average number of gaps of size $g$ in many generalized Zeckendorf decompositions is $C_\mu n + d_\mu + o(1)$ for constants $C_\mu > 0$ and $d_\mu$ depending on $g$ and the recurrence, the variance of the number of gaps of size $g$ is similarly $C_\sigma n + d_\sigma + o(1)$ with $C_\sigma > 0$, and the number of gaps of size $g$ of an $M\in[G_n,G_{n+1})$ converges to a Gaussian as $n\to\infty$. The proof is by analysis of an associated two-dimensional recurrence; we prove a general result on when such behavior converges to a Gaussian, and additionally re-derive other results in the literature.

preprint, 2016, arXiv

Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes

This work constructs codes that are efficiently decodable from a constant fraction of worst-case insertion and deletion errors in three parameter settings: (i) Binary codes with rate approaching 1; (ii) Codes with constant rate for error fraction approaching 1 over fixed alphabet size; and (iii) Constant rate codes over an alphabet of size $k$ for error fraction approaching $(k-1)/(k+1)$. When errors are constrained to deletions alone, efficiently decodable codes in each of these regimes were constructed recently. We complete the picture by constructing similar codes that are efficiently decodable in the insertion/deletion regime.

preprint, 2014, arXiv

An Elementary Proof of the Cayley Formula Using Random Maps

Cayley's formula states that the number of labelled trees on $n$ vertices is $n^{n-2}$, and many of the current proofs involve complex structures or rigorous computation. We present a bijective proof of the formula by providing an elementary calculation of the probability that a cycle occurs in a random map from an $n$-element set to an $(n+1)$-element set.
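The formula itself can be checked numerically for small $n$. The sketch below is a brute-force count, not the random-map argument of the paper: it enumerates every set of $n-1$ edges on $n$ labelled vertices and keeps those that are acyclic, which for $n-1$ edges is equivalent to being a spanning tree. The function name `count_labelled_trees` is hypothetical.

```python
from itertools import combinations


def count_labelled_trees(n: int) -> int:
    """Count labelled trees on n vertices by exhaustive enumeration."""
    if n == 1:
        return 1  # the single-vertex tree
    all_edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(all_edges, n - 1):
        parent = list(range(n))  # union-find forest, one set per vertex

        def find(x: int) -> int:
            # Follow parent pointers to the root, halving the path as we go.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            root_u, root_v = find(u), find(v)
            if root_u == root_v:
                acyclic = False  # this edge would close a cycle
                break
            parent[root_u] = root_v
        if acyclic:
            count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labelled_trees(n), n ** (n - 2))
```

The enumeration grows very quickly with $n$, so this is only practical for single-digit $n$.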

preprint, 2013, arXiv

A Simple Proof of the Cayley Formula using Random Graphs

We present a nice result on the probability of a cycle occurring in a randomly generated graph. We then provide some extensions and applications, including the proof of the famous Cayley formula, which states that the number of labeled trees on $n$ vertices is $n^{n-2}$.
