Source author record

Yukun Li

Yukun Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

math.NA, Artificial Intelligence, Computation and Language, cs.CY, Machine Learning, Numerical Analysis

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


preprint, 2026, arXiv

AcademiClaw: When Students Set Challenges for AI Agents

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows -- homework, research projects, competitions, and personal projects -- that they found current AI agents unable to solve effectively. Curated from 230 student-submitted candidates through rigorous expert review, the final task set spans 25+ professional domains, ranging from olympiad-level mathematics and linguistics problems to GPU-intensive reinforcement learning and full-stack system debugging, with 16 tasks requiring CUDA GPU execution. Each task executes in an isolated Docker sandbox and is scored on task completion by multi-dimensional rubrics combining six complementary techniques, with an independent five-category safety audit providing additional behavioral analysis. Experiments on six frontier models show that even the best achieves only a 55% pass rate. Further analysis uncovers sharp capability boundaries across task domains, divergent behavioral strategies among models, and a disconnect between token consumption and output quality, providing fine-grained diagnostic signals beyond what aggregate metrics reveal. We hope that AcademiClaw and its open-sourced data and code can serve as a useful resource for the OpenClaw community, driving progress toward agents that are more capable and versatile across the full breadth of real-world academic demands. All data and code are available at https://github.com/GAIR-NLP/AcademiClaw.

preprint, 2026, arXiv

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic $N$-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone's early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0). Finally, Engram establishes infrastructure-aware efficiency: its deterministic addressing enables runtime prefetching from host memory, incurring negligible overhead. We envision conditional memory as an indispensable modeling primitive for next-generation sparse models.
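To make the idea of an $N$-gram embedding with O(1) lookup concrete, the following is a minimal toy sketch, not the Engram module itself. The class name `HashedNgramTable`, the polynomial hash, the table size, and the random initialization are all assumptions made for illustration; the abstract does not specify any of them. The sketch only shows that each position costs one hash and one table read, and that the slot address depends on the token ids alone, which is what makes prefetching possible in principle.

```python
import random


class HashedNgramTable:
    """Toy hashed N-gram embedding table (hypothetical; not the paper's Engram)."""

    def __init__(self, n, num_slots, dim, seed=0):
        rng = random.Random(seed)
        self.n = n                      # maximum N-gram length
        self.num_slots = num_slots      # number of rows in the static table
        # Random rows stand in for learned embeddings.
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for _ in range(num_slots)]

    def slot(self, ngram):
        """Deterministic address: depends only on the token ids in the N-gram."""
        h = 0
        for tok in ngram:
            h = (h * 1000003 + tok + 1) % self.num_slots
        return h

    def lookup(self, token_ids):
        """Return one table row per position, keyed by the trailing N-gram."""
        rows = []
        for i in range(len(token_ids)):
            start = max(0, i - self.n + 1)
            ngram = tuple(token_ids[start:i + 1])
            # One hash plus one index: cost does not grow with num_slots.
            rows.append(self.table[self.slot(ngram)])
        return rows


table = HashedNgramTable(n=2, num_slots=1024, dim=4)
rows = table.lookup([5, 7, 5, 7])
```

In this toy run, positions 1 and 3 both end in the bigram (5, 7), so they read the same row. Hash collisions between different N-grams are possible and are not handled here.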

preprint, 2020, arXiv

Analysis of adaptive two-grid finite element algorithms for linear and nonlinear problems

This paper proposes some efficient and accurate adaptive two-grid (ATG) finite element algorithms for linear and nonlinear partial differential equations (PDEs). The main idea of these algorithms is to utilize the solutions on the $k$-th level adaptive meshes to find the solutions on the $(k+1)$-th level adaptive meshes which are constructed by performing adaptive element bisections on the $k$-th level adaptive meshes. These algorithms transform non-symmetric positive definite (non-SPD) PDEs (resp., nonlinear PDEs) into symmetric positive definite (SPD) PDEs (resp., linear PDEs). The proposed algorithms are both accurate and efficient due to the following advantages: they do not need to solve the non-symmetric or nonlinear systems; the degrees of freedom (d.o.f.) are very small; they are easily implemented; the interpolation errors are very small. Next, this paper constructs residue-type a posteriori error estimators, which are shown to be reliable and efficient. The key ingredient in proving the efficiency is to establish an upper bound of the oscillation terms, which may not be higher-order terms (h.o.t.) due to the low regularity of the numerical solution. Furthermore, the convergence of the algorithms is proved when bisection is used for the mesh refinements. Finally, numerical experiments are provided to verify the accuracy and efficiency of the ATG finite element algorithms, compared to regular adaptive finite element algorithms and two-grid finite element algorithms [27].

preprint, 2020, arXiv

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).

preprint, 2015, arXiv

Analysis of mixed interior penalty discontinuous Galerkin methods for the Cahn-Hilliard equation and the Hele-Shaw flow

This paper proposes and analyzes two fully discrete mixed interior penalty discontinuous Galerkin (DG) methods for the fourth order nonlinear Cahn-Hilliard equation. Both methods use the backward Euler method for time discretization and interior penalty discontinuous Galerkin methods for spatial discretization. They differ in how the nonlinear term is treated: one is based on fully implicit time-stepping and the other uses energy-splitting time-stepping. The primary goal of the paper is to prove the convergence of the numerical interfaces of the DG methods to the interface of the Hele-Shaw flow. This is achieved by establishing error estimates that depend on $ε^{-1}$ only in some low polynomial orders, instead of exponential orders. Similar to [14], the crux is to prove a discrete spectrum estimate in the discontinuous Galerkin finite element space. However, the validity of such a result is not obvious because the DG space is not a subspace of the (energy) space $H^1$ and it is larger than the finite element space. This difficulty is overcome by a delicate perturbation argument which relies on the discrete spectrum estimate in the finite element space proved in \cite{Feng_Prohl04}. Numerical experiment results are also presented to gauge the theoretical results and the performance of the proposed fully discrete mixed DG methods.

preprint, 2015, arXiv

Finite Element Methods for the Stochastic Allen-Cahn Equation with Gradient-type Multiplicative Noises

This paper studies finite element approximations of the stochastic Allen-Cahn equation with gradient-type multiplicative noises that are white in time and correlated in space. The sharp interface limit as the parameter $ε\rightarrow 0$ of the stochastic equation formally approximates a stochastic mean curvature flow which is described by a stochastically perturbed geometric law of the deterministic mean curvature flow. Both the stochastic Allen-Cahn equation and the stochastic mean curvature flow arise from materials science, fluid mechanics and cell biology applications. Two fully discrete finite element methods which are based on different time-stepping strategies for the nonlinear term are proposed. Strong convergence with sharp rates for both fully discrete finite element methods is proved. This is done with the crucial help of the Hölder continuity in time with respect to the spatial $L^2$-norm and $H^1$-seminorm for the strong solution of the stochastic Allen-Cahn equation, which are key technical lemmas proved in the paper. It also relies on the fact that high moments of the strong solution are bounded in various spatial and temporal norms. Numerical experiments are provided to gauge the performance of the proposed fully discrete finite element methods and to study the interplay of the geometric evolution and gradient-type noises.

preprint, 2014, arXiv

Analysis of interior penalty discontinuous Galerkin methods for the Allen-Cahn equation and the mean curvature flow

This paper develops and analyzes two fully discrete interior penalty discontinuous Galerkin (IP-DG) methods for the Allen-Cahn equation, which is a nonlinear singular perturbation of the heat equation and originally arises from phase transition of binary alloys in materials science, and its sharp interface limit (the mean curvature flow) as the perturbation parameter tends to zero. Both fully implicit and energy-splitting time-stepping schemes are proposed. The primary goal of the paper is to derive sharp error bounds which depend on the reciprocal of the perturbation parameter $ε$ (also called "interaction length") only in some lower polynomial order, instead of exponential order, for the proposed IP-DG methods. The derivation is based on a refinement of the nonstandard error analysis technique first introduced in [12]. The centerpiece of this new technique is to establish a spectrum estimate result in totally discontinuous DG finite element spaces with the help of a similar spectrum estimate result in the conforming finite element spaces which was established in [12]. As a nontrivial application of the sharp error estimates, they are used to establish convergence and the rates of convergence of the zero level sets of the fully discrete IP-DG solutions to the classical and generalized mean curvature flow. Numerical experiment results are also presented to gauge the theoretical results and the performance of the proposed fully discrete IP-DG methods.
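For orientation, the two time-stepping variants named in this abstract can be written out in a commonly used form. This is a sketch under assumptions: the scaling of the equation, the double-well potential $F$, the time step $k$, and the specific splitting are the usual textbook choices and are not stated in the abstract, and the spatial IP-DG discretization is omitted entirely.

```latex
% Allen-Cahn equation (assumed scaling), with F(u) = \tfrac14 (u^2 - 1)^2
\begin{align*}
  u_t - \Delta u + \frac{1}{\varepsilon^2} f(u) &= 0,
  & f(u) &= F'(u) = u^3 - u .
\intertext{Backward Euler in time with step size $k$:}
  \frac{u^{n+1} - u^n}{k} - \Delta u^{n+1}
    + \frac{1}{\varepsilon^2} f^{n+1} &= 0,
\intertext{where the nonlinear term is treated in one of two ways:}
  f^{n+1} &= (u^{n+1})^3 - u^{n+1}
  && \text{(fully implicit)}, \\
  f^{n+1} &= (u^{n+1})^3 - u^{n}
  && \text{(energy splitting)} .
\end{align*}
```

The energy-splitting form treats the convex part of $F$ implicitly and the concave part explicitly, which is the standard route to a discrete energy decay property without a time-step restriction.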

preprint, 2014, arXiv

Multiphysics Finite Element Methods for a Poroelasticity Model

This paper is concerned with finite element approximations of a quasi-static poroelasticity model in displacement-pressure formulation which describes the dynamics of poro-elastic materials under an applied mechanical force on the boundary. To better describe the multiphysics process of deformation and diffusion for poro-elastic materials, we first present a reformulation of the original model by introducing two pseudo-pressures, one of which is shown to satisfy a diffusion equation. We then propose a time-stepping algorithm which decouples (or couples) the reformulated PDE problem at each time step into two sub-problems, one of which is a generalized Stokes problem for the displacement vector field (of the solid network of the poro-elastic material) along with one pseudo-pressure field and the other is a diffusion problem for the other pseudo-pressure field (of the solvent of the material). In the paper, the Taylor-Hood mixed finite element method combined with the $P_1$-conforming finite element method is used as an example to demonstrate the viability of the proposed multiphysics approach. It is proved that the solutions of the fully discrete finite element methods fulfill a discrete energy law which mimics the differential energy law satisfied by the PDE solution and converge optimally in the energy norm. Moreover, it is shown that the proposed formulation also has a built-in mechanism to overcome the so-called "locking phenomenon" associated with the numerical approximations of the poroelasticity model. Numerical experiments are presented to show the performance of the proposed approach and methods and to demonstrate the absence of the "locking phenomenon" in our numerical experiments.

preprint, 2013, arXiv

Finite element approximations of the stochastic mean curvature flow of planar curves of graphs

This paper develops and analyzes a semi-discrete and a fully discrete finite element method for a one-dimensional quasilinear parabolic stochastic partial differential equation (SPDE) which describes the stochastic mean curvature flow for planar curves of graphs. To circumvent the difficulty caused by the low spatial regularity of the SPDE solution, a regularization procedure is first proposed to approximate the SPDE, and an error estimate for the regularized problem is derived. A semi-discrete finite element method and a space-time fully discrete method are then proposed to approximate the solution of the regularized SPDE problem. Strong convergence with rates is established for both the semi-discrete and the fully discrete methods. Computational experiments are provided to study the interplay of the geometric evolution and gradient-type noises.
