Researcher profile

Yunfan Li

Yunfan Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents

Recent advances in large language models (LLMs) have enabled agents to autonomously execute complex, long-horizon tasks, yet planning remains a primary bottleneck for reliable task execution. Existing methods typically fall into two paradigms: step-wise planning, which is reactive but often short-sighted; and one-shot planning, which generates a complete plan upfront yet is brittle to execution errors. Crucially, both paradigms suffer from entangled contexts, where the agent must reason over a monolithic history spanning multiple sub-tasks. This entanglement increases cognitive load and lets local errors propagate across otherwise independent decisions, making recovery computationally expensive. To address this, we propose Task-Decoupled Planning (TDP), a training-free framework that replaces entangled reasoning with task decoupling. TDP decomposes tasks into a directed acyclic graph (DAG) of sub-goals via a Supervisor. Using a Planner and Executor with scoped contexts, TDP confines reasoning and replanning to the active sub-task. This isolation prevents error propagation and corrects deviations locally without disrupting the workflow. Results on TravelPlanner, ScienceWorld, and HotpotQA show that TDP outperforms strong baselines while reducing token consumption by up to 82%, demonstrating that sub-task decoupling improves both robustness and efficiency for long-horizon agents.

preprint2026arXiv

Multi-Personality Generation of LLMs at Decoding-time

Multi-personality generation for LLMs, enabling simultaneous embodiment of multiple personalization attributes, is a fundamental challenge. Existing retraining-based approaches are costly and poorly scalable, while decoding-time methods often rely on external models or heuristics, limiting flexibility and robustness. In this paper, we propose a novel Multi-Personality Generation (MPG) framework under the decoding-time combination paradigm. It flexibly controls multi-personality without relying on scarce multi-dimensional models or extra training, leveraging implicit density ratios in single-dimensional models as a "free lunch" to reformulate the task as sampling from a target strategy aggregating these ratios. To implement MPG efficiently, we design Speculative Chunk-level based Rejection sampling (SCR), which generates responses in chunks and parallelly validates them via estimated thresholds within a sliding window. This significantly reduces computational overhead while maintaining high-quality generation. Experiments on MBTI personality and Role-Playing demonstrate the effectiveness of MPG, showing improvements up to 16%-18%. Code and data are available at https://github.com/Libra117/MPG .

preprint2024arXiv

On the Model-Misspecification in Reinforcement Learning

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth and optimal function approximators), it is shown that policy-based approaches can be robust even when the policy function approximation is under a large locally-bounded misspecification error, with which the function class may exhibit a $Ω(1)$ approximation error in specific states and actions, but remains small on average within a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through meticulous algorithm design and sophisticated analysis, value-based and model-based methods employing general function approximation can achieve robustness under local misspecification error bounds. In particular, they can attain a regret bound of $\widetilde{O}\left(\text{poly}(d H)(\sqrt{K} + Kζ) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $ζ$ denotes the local bound for misspecification error. Furthermore, we propose an algorithmic framework that can achieve the same order of regret bound without prior knowledge of $ζ$, thereby enhancing its practical applicability.

preprint2022arXiv

Surgical Phase Recognition in Laparoscopic Cholecystectomy

Automatic recognition of surgical phases in surgical videos is a fundamental task in surgical workflow analysis. In this report, we propose a Transformer-based method that utilizes calibrated confidence scores for a 2-stage inference pipeline, which dynamically switches between a baseline model and a separately trained transition model depending on the calibrated confidence level. Our method outperforms the baseline model on the Cholec80 dataset, and can be applied to a variety of action segmentation methods.

preprint2020arXiv

Contrastive Clustering

In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19\% (39\%) performance improvement compared with the best baseline.

preprint2020arXiv

Large Scale Many-Objective Optimization Driven by Distributional Adversarial Networks

Estimation of distribution algorithms (EDA) as one of the EAs is a stochastic optimization problem which establishes a probability model to describe the distribution of solutions and randomly samples the probability model to create offspring and optimize model and population. Reference Vector Guided Evolutionary (RVEA) based on the EDA framework, having a better performance to solve MaOPs. Besides, using the generative adversarial networks to generate offspring solutions is also a state-of-art thought in EAs instead of crossover and mutation. In this paper, we will propose a novel algorithm based on RVEA[1] framework and using Distributional Adversarial Networks (DAN) [2]to generate new offspring. DAN uses a new distributional framework for adversarial training of neural networks and operates on genuine samples rather than a single point because the framework also leads to more stable training and extraordinarily better mode coverage compared to single-point-sample methods. Thereby, DAN can quickly generate offspring with high convergence regarding the same distribution of data. In addition, we also use Large-Scale Multi-Objective Optimization Based on A Competitive Swarm Optimizer (LMOCSO)[3] to adopts a new two-stage strategy to update the position in order to significantly increase the search efficiency to find optimal solutions in huge decision space. The propose new algorithm will be tested on 9 benchmark problems in Large scale multi-objective problems (LSMOP). To measure the performance, we will compare our proposal algorithm with some state-of-art EAs e.g., RM-MEDA[4], MO-CMA[10] and NSGA-II.

preprint2020arXiv

Many-Objective Estimation of Distribution Optimization Algorithm Based on WGAN-GP

Estimation of distribution algorithms (EDA) are stochastic optimization algorithms. EDA establishes a probability model to describe the distribution of solution from the perspective of population macroscopically by statistical learning method, and then randomly samples the probability model to generate a new population. EDA can better solve multi-objective optimal problems (MOPs). However, the performance of EDA decreases in solving many-objective optimal problems (MaOPs), which contains more than three objectives. Reference Vector Guided Evolutionary Algorithm (RVEA), based on the EDA framework, can better solve MaOPs. In our paper, we use the framework of RVEA. However, we generate the new population by Wasserstein Generative Adversarial Networks-Gradient Penalty (WGAN-GP) instead of using crossover and mutation. WGAN-GP have advantages of fast convergence, good stability and high sample quality. WGAN-GP learn the mapping relationship from standard normal distribution to given data set distribution based on a given data set subject to the same distribution. It can quickly generate populations with high diversity and good convergence. To measure the performance, RM-MEDA, MOPSO and NSGA-II are selected to perform comparison experiments over DTLZ and LSMOP test suites with 3-, 5-, 8-, 10- and 15-objective.

preprint2010arXiv

Structured Error Recovery for Codeword-Stabilized Quantum Codes

Codeword stabilized (CWS) codes are, in general, non-additive quantum codes that can correct errors by an exhaustive search of different error patterns, similar to the way that we decode classical non-linear codes. For an n-qubit quantum code correcting errors on up to t qubits, this brute-force approach consecutively tests different errors of weight t or less, and employs a separate n-qubit measurement in each test. In this paper, we suggest an error grouping technique that allows to simultaneously test large groups of errors in a single measurement. This structured error recovery technique exponentially reduces the number of measurements by about 3^t times. While it still leaves exponentially many measurements for a generic CWS code, the technique is equivalent to syndrome-based recovery for the special case of additive CWS codes.