Source author record

Yufeng Du

Yufeng Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Artificial Intelligence, astro-ph.CO, Computation and Language, hep-ph

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

Physics beyond the Standard Model with the DSA-2000

The upcoming Deep Synoptic Array 2000 (DSA-2000) will map the radio sky at $0.7$–$2$ GHz ($2.9$–$8.3\,\mu$eV) with unprecedented sensitivity. This will enable searches for dark matter and other physics beyond the Standard Model, of which we study four cases: axions, dark photons, dark matter subhalos and neutrino masses. We forecast DSA-2000's potential to detect axions through two mechanisms in neutron star magnetospheres: photon conversion of axion dark matter and radio emission from axion clouds, developing the first analytical treatment of the latter. We also forecast DSA-2000's potential to discover kinetically mixed dark photons from black hole superradiance, constrain dark matter substructure and fifth forces through pulsar timing, and improve cosmological neutrino mass inference through fast radio burst dispersion measurements. Our analysis indicates that in its planned five-year run the DSA-2000 could reach sensitivity to QCD axion parameters, improve current limits on compact dark matter by an order of magnitude, and enhance cosmological weak lensing neutrino mass constraints by a factor of three.
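The energy range quoted in parentheses follows from the photon energy relation E = hf applied to the band edges. The sketch below is only a unit-conversion check, not part of the paper's analysis; the function name `photon_energy_micro_ev` is made up for illustration.

```python
# Planck constant in eV*s (CODATA value, exact under the 2019 SI definition).
PLANCK_EV_S = 4.135667696e-15


def photon_energy_micro_ev(freq_ghz: float) -> float:
    """Return the photon energy in micro-eV for a frequency given in GHz."""
    freq_hz = freq_ghz * 1e9
    energy_ev = PLANCK_EV_S * freq_hz
    return energy_ev * 1e6


# Band edges quoted in the abstract: 0.7 GHz and 2 GHz.
low = photon_energy_micro_ev(0.7)
high = photon_energy_micro_ev(2.0)
print(f"{low:.1f} to {high:.1f} micro-eV")
```

Rounded to one decimal place, the two band edges reproduce the 2.9 and 8.3 micro-eV figures given in the abstract.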

Preprint, 2026, arXiv

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

We identify intrinsic limitations of Rotary Positional Embeddings (RoPE) in Transformer-based long-context language models. Our theoretical analysis abstracts away from the specific content of the context and depends only on its length. We prove that as context length increases, RoPE-based attention becomes unpredictable and loses two properties that are central to its effectiveness. First, it loses its locality bias: RoPE is no more likely to favor nearer positions than substantially farther ones. Second, it loses consistency in token relevance: a key vector that receives a higher attention score than an alternative at one position may receive a lower score at another. In both cases, the probability of failure approaches 0.5, no better than random guessing. We further prove that the attention score can remain unchanged when a key token is moved to a different position, or even replaced by a different token, indicating a failure to distinguish positions or tokens. Adjusting the RoPE base trades off distinguishing positions against distinguishing tokens but cannot preserve both at the same time. Increasing the RoPE base hyperparameter, a common practice in today's long-context models, helps distinguish different tokens, but inevitably sacrifices the ability to distinguish positions. Our empirical analysis shows that multi-head, multi-layer architectures are insufficient to overcome these limitations. Our findings suggest that fundamentally new mechanisms for encoding position and token order may be needed in future Transformer long-context language models.
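For readers unfamiliar with the mechanism, the following is a minimal sketch of a RoPE attention score in its standard formulation: consecutive coordinate pairs are rotated by an angle proportional to the token position, with per-pair frequencies set by the base. It does not reproduce the paper's proofs or experiments. The function names, the example vectors and the default base of 10000 are assumptions chosen for illustration.

```python
import cmath
from typing import List


def rope_rotate(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive coordinate pair of vec by pos * theta_j."""
    d = len(vec)  # assumed even
    out: List[float] = []
    for j in range(d // 2):
        theta = base ** (-2.0 * j / d)  # frequency of the j-th pair
        z = complex(vec[2 * j], vec[2 * j + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out


def rope_score(q: List[float], k: List[float], q_pos: int, k_pos: int,
               base: float = 10000.0) -> float:
    """Unnormalized attention score between a rotated query and a rotated key."""
    q_rot = rope_rotate(q, q_pos, base)
    k_rot = rope_rotate(k, k_pos, base)
    return sum(a * b for a, b in zip(q_rot, k_rot))


# Arbitrary example vectors (hypothetical values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# The score depends only on the relative offset q_pos - k_pos.
s_near_start = rope_score(q, k, 10, 7)
s_far_along = rope_score(q, k, 103, 100)
print(abs(s_near_start - s_far_along) < 1e-9)

# Scores of the same key at increasing distances from the query.
for distance in (1, 10, 100, 1000):
    print(distance, rope_score(q, k, distance, 0))
```

Because every pair contributes a term that oscillates with the relative offset, the score as a function of distance is a sum of sinusoids at different frequencies, which is the object the abstract's locality and consistency claims concern.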

Preprint, 2022, arXiv

Evaluating Modules in Graph Contrastive Learning

The recent emergence of contrastive learning approaches has facilitated their application to graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works performed only model-level evaluation and did not explore the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive and negative data samples (nodes or graphs); (2) an encoder and a readout function to get sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust, and an MLP encoder paired with Sum readout can achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension. OpenGCL is available at https://github.com/thunlp/OpenGCL.
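The four-module decomposition described in the abstract can be pictured with a toy sketch. This is not the OpenGCL API: every name below (`ContrastivePipeline`, `make_sampler`, the toy graph and features) is hypothetical, the encoder is a fixed feature lookup rather than a trained network, and the readout is omitted because the example works at node level.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Triple = Tuple[int, int, int]  # (anchor, positive, negative) node ids

# Toy graph and node features, invented for illustration only.
ADJ: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
FEATS: Dict[int, Vector] = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0],
                            3: [0.0, 1.0], 4: [0.0, 1.0]}


def make_sampler(seed: int) -> Callable[[], List[Triple]]:
    """Sampler: a neighbor is the positive, a random non-neighbor the negative."""
    rng = random.Random(seed)

    def sample() -> List[Triple]:
        triples = []
        for anchor, nbrs in ADJ.items():
            others = [v for v in ADJ if v != anchor and v not in nbrs]
            triples.append((anchor, rng.choice(nbrs), rng.choice(others)))
        return triples

    return sample


def lookup_encoder(node: int) -> Vector:
    """Encoder: a fixed feature lookup standing in for a trained model."""
    return FEATS[node]


def dot_discriminator(a: Vector, b: Vector) -> float:
    """Discriminator: inner-product score for one sample pair."""
    return sum(x * y for x, y in zip(a, b))


def logistic_estimator(pos: float, neg: float) -> float:
    """Estimator: binary cross-entropy on the positive and negative scores."""
    def softplus(x: float) -> float:
        return math.log1p(math.exp(x))
    return softplus(-pos) + softplus(neg)


@dataclass
class ContrastivePipeline:
    """Composes the four modules; any one can be swapped independently."""
    sampler: Callable[[], List[Triple]]
    encoder: Callable[[int], Vector]
    discriminator: Callable[[Vector, Vector], float]
    estimator: Callable[[float, float], float]

    def loss(self) -> float:
        triples = self.sampler()
        total = 0.0
        for anchor, pos, neg in triples:
            za, zp, zn = self.encoder(anchor), self.encoder(pos), self.encoder(neg)
            total += self.estimator(self.discriminator(za, zp),
                                    self.discriminator(za, zn))
        return total / len(triples)


pipeline = ContrastivePipeline(make_sampler(0), lookup_encoder,
                               dot_discriminator, logistic_estimator)
loss_value = pipeline.loss()
print(f"mean contrastive loss: {loss_value:.4f}")
```

Keeping each module behind a plain callable is one way to make module-level comparisons possible: replacing `dot_discriminator` or `logistic_estimator` changes a single argument while the rest of the pipeline stays fixed.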