Source author record

Yihan Li

Yihan Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computation and Language, cond-mat.mes-hall, Information Theory, math.IT

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

Programmatic video generation through code offers geometric precision and temporal coherence beyond pixel-level diffusion models, yet rigorously evaluating whether language models can produce spatially correct animated outputs remains an open problem. We introduce PRISM, a large-scale benchmark of 10,372 human-calibrated instruction-code pairs (20 times larger than prior programmatic video generation benchmarks), grounded in real-world knowledge visualization scenarios across English and Chinese and spanning 437 subject categories. We further propose a funnel-style evaluation framework with four complementary metrics: Code-Level Reliability for executability, Spatial Reasoning for layout correctness over full animation sequences, and Prompt-Aware Dynamic Visual Complexity (PADVC) and Temporal Density (TD) for diagnosing dynamic expression and temporal activity. Systematic evaluation of seven mainstream LLMs reveals a striking Execution-Spatial Gap: the average drop from execution success rate to spatial pass rate is approximately 41%, showing that runnable code does not necessarily yield spatially coherent visual output. These findings show that programmatic video generation evaluation should go beyond executability. PRISM provides a principled benchmark for advancing spatially coherent code generation.

Preprint, 2026, arXiv

PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor

To develop a reliable AI for psychological assessment, we introduce PsychEval, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges. 1) Can we train a highly realistic AI counselor? Realistic counseling is a longitudinal task requiring sustained memory and dynamic goal tracking. We propose a multi-session benchmark (spanning 6-10 sessions across three distinct stages) that demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills. 2) How to train a multi-therapy AI counselor? While existing models often focus on a single therapy, complex cases frequently require flexible strategies among various therapies. We construct a diverse dataset covering five therapeutic modalities (Psychodynamic, Behaviorism, CBT, Humanistic Existentialist, and Postmodernist) alongside an integrative therapy with a unified three-stage clinical framework across six core psychological topics. 3) How to systematically evaluate an AI counselor? We establish a holistic evaluation framework with 18 therapy-specific and therapy-shared metrics across Client-Level and Counselor-Level dimensions. To support this, we also construct over 2,000 diverse client profiles. Extensive experimental analysis fully validates the superior quality and clinical fidelity of our dataset. Crucially, PsychEval transcends static benchmarking to serve as a high-fidelity reinforcement learning environment that enables the self-evolutionary training of clinically responsible and adaptive AI counselors.

Preprint, 2022, arXiv

Learning Invariable Semantical Representation from Language for Extensible Policy Generalization

Recently, incorporating natural language instructions into reinforcement learning (RL) to learn semantically meaningful representations and foster generalization has attracted considerable attention. However, the semantic information in language instructions is usually entangled with task-specific state information, which hampers the learning of semantically invariant and reusable representations. In this paper, we propose a method to learn such representations called element randomization, which extracts task-relevant but environment-agnostic semantics from instructions using a set of environments with randomized elements, e.g., topological structures or textures, yet the same language instruction. We theoretically prove the feasibility of learning semantically invariant representations through randomization. In practice, we accordingly develop a hierarchy of policies, where a high-level policy is designed to modulate the behavior of a goal-conditioned low-level policy by proposing subgoals as semantically invariant representations. Experiments on challenging long-horizon tasks show that (1) our low-level policy reliably generalizes to tasks under environment changes; (2) our hierarchical policy exhibits extensible generalization in unseen new tasks that can be decomposed into several solvable sub-tasks; and (3) by storing and replaying language trajectories as succinct policy representations, the agent can complete tasks in a one-shot fashion, i.e., once one successful trajectory has been attained.

Preprint, 2011, arXiv

Downlink Power Allocation for Stored Variable-Bit-Rate Videos

In this paper, we study the problem of power allocation for streaming multiple variable-bit-rate (VBR) videos in the downlink of a cellular network. We consider a deterministic model for VBR video traffic and finite playout buffers at the mobile users. The objective is to derive the optimal downlink power allocation for the VBR video sessions, such that the video data can be delivered in a timely fashion without causing playout buffer overflow or underflow. The formulated problem is a nonlinear nonconvex optimization problem. We analyze the convexity conditions for the formulated problem and propose a two-step greedy approach to solve the problem. We also develop a distributed algorithm based on the dual decomposition technique. The performance of the proposed algorithms is validated with simulations using VBR video traces under realistic scenarios.

Preprint, 2011, arXiv

Switching of +/-360deg domain wall states in a nanoring by an azimuthal Oersted field

We demonstrate magnetic switching between two 360° domain wall vortex states in cobalt nanorings, which are candidate magnetic states for robust and low power MRAM devices. These 360° domain wall (DW) or "twisted onion" states can have clockwise or counterclockwise circulation, the two states for data storage. Reliable switching between the states is necessary for any realistic device. We accomplish this switching by applying a circular Oersted field created by passing current through a metal atomic force microscope tip placed at the center of the ring. After initializing in an onion state, we rotate the DWs to one side of the ring by passing a current through the center, and can switch between the two twisted states by reversing the current, causing the DWs to split and meet again on the opposite side of the ring. A larger current will annihilate the DWs and create a perfect vortex state in the rings.
