Source author record

Sungyeon Yang

Sungyeon Yang appears in the imported research catalog. Authorship, coauthor, and topic links are available even though the profile has not yet been claimed.

Researcher · Unclaimed source record

Artificial Intelligence, astro-ph.CO, Computation and Language, gr-qc, hep-ph, hep-th

Catalog footprint

What is connected

2 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

ZAYA1-8B Technical Report

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.

Preprint, 2022, arXiv

de Sitter Microstates from $T\bar T+\Lambda_2$ and the Hawking-Page Transition

We obtain microstates accounting for the Gibbons-Hawking entropy in $dS_3$, along with a subleading logarithmic correction, from the solvable $T\bar T+\Lambda_2$ deformation of a seed CFT with sparse light spectrum. The microstates arise as the dressed CFT states near dimension $\Delta=c/6$, associated with the Hawking-Page transition; they dominate the real spectrum of the deformed theory. We exhibit an analogue of the Hawking-Page transition in de Sitter. Appropriate generalizations of the $T\bar T+\Lambda_2$ deformation are required to treat model-dependent local bulk physics (subleading at large central charge) and higher dimensions. These results add considerably to the already strong motivation for the continued pursuit of such generalizations along with a more complete characterization of $T\bar T$ type theories, building from existing results in these directions.