Source author record

Ryan Lee

Ryan Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Machine Learning, astro-ph.IM, cond-mat.mtrl-sci, cond-mat.str-el

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths

Designing a unified neural network to efficiently and inherently process sequential data with arbitrary lengths is a central and challenging problem in sequence modeling. The design choices of the Transformer, including quadratic complexity and weak length extrapolation, have limited its ability to scale to long sequences. In this work, we propose Gecko, a neural architecture that inherits the design of Mega and Megalodon (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability to capture long-range dependencies, including timestep decay normalization, a sliding chunk attention mechanism, and adaptive working memory. In a controlled pretraining comparison with Llama2 and Megalodon at the scale of 7 billion parameters and 2 trillion training tokens, Gecko achieves better efficiency and long-context scalability. Gecko reaches a training loss of 1.68, significantly outperforming Llama2-7B (1.75) and Megalodon-7B (1.70), and landing close to Llama2-13B (1.67). Notably, without relying on any context-extension techniques, Gecko exhibits inherent long-context processing and retrieval capabilities, stably handling sequences of up to 4 million tokens and retrieving information from contexts up to $4\times$ longer than its attention window. Code: https://github.com/XuezheMax/gecko-llm

preprint · 2026 · arXiv

Sparse Layers are Critical to Scaling Looped Language Models

Looped language models repeat a set of transformer layers through depth, reducing memory costs and providing natural early-exit points at loop boundaries. However, looped models do not scale as favorably as standard transformers with unique layers. We compare standard and Mixture-of-Experts (MoE) transformers, with and without looping, and find two main results. First, we find Looped-MoE models scale better than the standard baseline while dense looped models do not. We trace this to routing divergence between loops: in Looped-MoE models, different experts are activated on each pass through the same shared layers, recovering expressivity without additional parameters. Our second finding is that looped models have better compute-quality trade-offs with early exits than standard models. Because each loop ends with the same layers that produce the final output, loop boundaries are superior exit points, as confirmed by earlier output convergence at these points. In sum, we provide a clear direction for scaling looped models: a Looped-MoE model with early exits can not only beat standard transformers at scale, but also enable significant memory and inference savings with minimal degradation in quality.
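The looping idea in the abstract above can be illustrated with a toy sketch. This is not the paper's implementation: the function name `run_looped`, the scalar "layers", and the convergence-based exit rule with tolerance `tol` are all assumptions made for illustration. It shows only the control flow: the same shared layers are applied on every pass, and an exit is considered only at loop boundaries.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_looped(
    x: Vector,
    shared_layers: Sequence[Layer],
    num_loops: int,
    tol: float = 1e-3,
) -> Tuple[Vector, int]:
    """Apply the same layers up to num_loops times (hypothetical helper).

    The exit test runs only at loop boundaries: if the state changed by
    less than tol over a full pass, stop and report how many passes ran.
    The convergence criterion is an assumption, not the paper's rule.
    """
    state = list(x)
    for loop in range(1, num_loops + 1):
        previous = state
        for layer in shared_layers:  # the same weights are reused every pass
            state = layer(state)
        change = max((abs(a - b) for a, b in zip(state, previous)), default=0.0)
        if change < tol:
            return state, loop  # early exit at a loop boundary
    return state, num_loops


def toy_layer(v: Vector) -> Vector:
    """Toy stand-in for a transformer layer: a contraction with fixed point 2.0."""
    return [0.5 * a + 1.0 for a in v]
```

Calling `run_looped([0.0], [toy_layer], num_loops=20)` stops before the full 20 passes because the toy state converges. A real looped model would replace `toy_layer` with transformer blocks and would typically compare output distributions rather than raw state values.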

preprint · 2020 · arXiv

First SETI Observations with China's Five-hundred-meter Aperture Spherical radio Telescope (FAST)

The Search for Extraterrestrial Intelligence (SETI) attempts to address the possibility of the presence of technological civilizations beyond the Earth. Benefiting from the high sensitivity, large sky coverage, and innovative feed cabin of China's Five-hundred-meter Aperture Spherical radio Telescope (FAST), we performed the first SETI observations with FAST's newly commissioned 19-beam receiver; we report preliminary results in this paper. Using the data stream produced by the SERENDIP VI realtime multibeam SETI spectrometer installed at FAST, as well as its off-line data processing pipelines, we identify and remove four kinds of radio frequency interference (RFI): zone, broadband, multi-beam, and drifting, utilizing the Nebula SETI software pipeline combined with machine learning algorithms. After RFI mitigation, the Nebula pipeline identifies and ranks interesting narrow band candidate ET signals, scoring candidates by the number of times candidate signals have been seen at roughly the same sky position and same frequency, signal strength, proximity to a nearby star or object of interest, along with several other scoring criteria. We show four example candidate groups that demonstrate this RFI mitigation and candidate selection. This preliminary testing on FAST data helps to validate our SETI instrumentation techniques as well as our data processing pipeline.

preprint · 2019 · arXiv

Visualizing Exotic Orbital Texture in the Single-Layer Mott Insulator 1T-TaSe2

Mott insulating behavior is induced by strong electron correlation and can lead to exotic states of matter such as unconventional superconductivity and quantum spin liquids. Recent advances in van der Waals material synthesis enable the exploration of novel Mott systems in the two-dimensional limit. Here we report characterization of the local electronic properties of single- and few-layer 1T-TaSe2 via spatial- and momentum-resolved spectroscopy involving scanning tunneling microscopy and angle-resolved photoemission. Our combined experimental and theoretical study indicates that electron correlation induces a robust Mott insulator state in single-layer 1T-TaSe2 that is accompanied by novel orbital texture. Inclusion of interlayer coupling weakens the insulating phase in 1T-TaSe2, as seen by strong reduction of its energy gap and quenching of its correlation-driven orbital texture in bilayer and trilayer 1T-TaSe2. Our results establish single-layer 1T-TaSe2 as a useful new platform for investigating strong correlation physics in two dimensions.