Researcher profile

Yiqun Chen

Yiqun Chen contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
8 topics
4 close collaborators

Published work

7 published items

Preprint, 2026, arXiv

Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search

Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. However, prevailing systems rely on monolithic agents that suffer from structural bottlenecks, including unconstrained reasoning outputs that inflate trajectories, sparse outcome-level rewards that complicate credit assignment, and stochastic search noise that destabilizes learning. To address these challenges, we propose M-ASK (Multi-Agent Search and Knowledge), a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context. This decomposition allows each agent to focus on a well-defined subtask and reduces interference between search and context construction. Furthermore, to enable stable coordination, M-ASK employs turn-level rewards to provide granular supervision for both search decisions and knowledge updates. Experiments on multi-hop QA benchmarks demonstrate that M-ASK outperforms strong baselines, achieving not only superior answer accuracy but also significantly more stable training dynamics. The source code for M-ASK is available at https://github.com/chenyiqun/M-ASK.

Preprint, 2026, arXiv

Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast

The iterative denoising paradigm of Diffusion Large Language Models (DLMs) endows them with a distinct advantage in global context modeling. However, current decoding strategies fail to leverage this capability, typically exhibiting a local preference that overlooks the heterogeneous information density within the context, ultimately degrading generation quality. To address this limitation, we systematically investigate high-information-density (HD) tokens and present two key findings: (1) explicitly conditioning on HD tokens substantially improves output quality; and (2) HD tokens exhibit an early-decoding tendency, converging earlier than surrounding tokens. Motivated by these findings, we propose Focus on the Core (FoCore), a training-free decoding strategy that utilizes HD tokens in a self-contrast manner, wherein HD tokens are temporarily remasked as negative samples, to guide generation. We further introduce FoCore_Accelerate (FoCore_A), an efficient variant that, upon detecting HD token convergence, performs parallel decoding over stable candidates within a local context window, substantially accelerating generation. Extensive experiments on math, code and logical reasoning benchmarks demonstrate that FoCore consistently improves generation quality and efficiency across both LLaDA and Dream backbones. For instance, on HumanEval, FoCore improves pass@1 from 39.02 to 42.68 over standard Classifier-Free Guidance, while FoCore_A reduces the number of decoding steps by 2.07x and per-sample latency from 20.76s to 8.64s (-58.4%).
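
The HumanEval figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-derives numbers from the values stated above; the variable and function names are illustrative and do not come from the paper or its code.

```python
# Values copied from the abstract; names are hypothetical, for illustration only.
cfg_pass_at_1 = 39.02        # pass@1 with standard Classifier-Free Guidance
focore_pass_at_1 = 42.68     # pass@1 with FoCore
baseline_latency_s = 20.76   # per-sample latency before FoCore_A
focore_a_latency_s = 8.64    # per-sample latency with FoCore_A


def relative_change_percent(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


pass_gain_points = focore_pass_at_1 - cfg_pass_at_1
latency_change = relative_change_percent(baseline_latency_s, focore_a_latency_s)
latency_speedup = baseline_latency_s / focore_a_latency_s

print(f"pass@1 gain: {pass_gain_points:.2f} points")
print(f"latency change: {latency_change:.1f}%")
print(f"latency speedup: {latency_speedup:.2f}x")
```

The latency change agrees with the reported -58.4%. The latency ratio computed here is a separate quantity from the 2.07x figure, which the abstract attributes to the number of decoding steps.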

Preprint, 2023, arXiv

Transformer in Transformer as Backbone for Deep Reinforcement Learning

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization skills, making these methods inconvenient to use in practice. In this paper, we propose to design pure Transformer-based networks for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings consistently.

Preprint, 2022, arXiv

Boundedness of Calderón--Zygmund operators on ball Campanato-type function spaces

Let $X$ be a ball quasi-Banach function space on ${\mathbb R}^n$ satisfying some mild assumptions. In this article, the authors first find a reasonable version $\widetilde{T}$ of the Calderón--Zygmund operator $T$ on the ball Campanato-type function space $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$ with $q\in[1,\infty)$, $s\in\mathbb{Z}_+^n$, and $d\in(0,\infty)$. Then the authors prove that $\widetilde{T}$ is bounded on $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$ if and only if, for any $\gamma\in\mathbb{Z}^n_+$ with $|\gamma|\leq s$, $T^*(x^\gamma)=0$, which is hence sharp. Moreover, $\widetilde{T}$ is proved to be the adjoint operator of $T$, which further strengthens the rationality of the definition of $\widetilde{T}$. All these results have a wide range of applications. In particular, even when they are applied, respectively, to weighted Lebesgue spaces, variable Lebesgue spaces, Orlicz spaces, Orlicz-slice spaces, Morrey spaces, mixed-norm Lebesgue spaces, local generalized Herz spaces, and mixed-norm Herz spaces, all the obtained results are new. The proofs of these results strongly depend on the properties of the kernel of $T$ under consideration and also on the dual theorem on $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$.

Preprint, 2022, arXiv

Boundedness of Fractional Integrals on Ball Campanato-Type Function Spaces

Let $X$ be a ball quasi-Banach function space on ${\mathbb R}^n$ satisfying some mild assumptions and let $\alpha\in(0,n)$ and $\beta\in(1,\infty)$. In this article, when $\alpha\in(0,1)$, the authors first find a reasonable version $\widetilde{I}_\alpha$ of the fractional integral $I_\alpha$ on the ball Campanato-type function space $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$ with $q\in[1,\infty)$, $s\in\mathbb{Z}_+^n$, and $d\in(0,\infty)$. Then the authors prove that $\widetilde{I}_\alpha$ is bounded from $\mathcal{L}_{X^\beta,q,s,d}(\mathbb{R}^n)$ to $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$ if and only if there exists a positive constant $C$ such that, for any ball $B\subset \mathbb{R}^n$, $|B|^{\frac{\alpha}{n}}\leq C \|\mathbf{1}_B\|_X^{\frac{\beta-1}{\beta}}$, where $X^\beta$ denotes the $\beta$-convexification of $X$. Furthermore, the authors extend the range $\alpha\in(0,1)$ in $\widetilde{I}_\alpha$ to the range $\alpha\in(0,n)$ and also obtain the corresponding boundedness in this case. Moreover, $\widetilde{I}_\alpha$ is proved to be the adjoint operator of $I_\alpha$. All these results have a wide range of applications. Particularly, even when they are applied, respectively, to Morrey spaces, mixed-norm Lebesgue spaces, local generalized Herz spaces, and mixed-norm Herz spaces, all the obtained results are new. The proofs of these results strongly depend on the dual theorem on $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$ and also on the special atomic decomposition of molecules of $H_X(\mathbb{R}^n)$ (the Hardy-type space associated with $X$), which proves to be the predual space of $\mathcal{L}_{X,q,s,d}(\mathbb{R}^n)$.

Preprint, 2022, arXiv

Boundedness of Fractional Integrals on Hardy Spaces Associated with Ball Quasi-Banach Function Spaces

Let $X$ be a ball quasi-Banach function space on ${\mathbb R}^n$ and $H_X({\mathbb R}^n)$ the Hardy space associated with $X$, and let $\alpha\in(0,n)$ and $\beta\in(1,\infty)$. In this article, assuming that the (powered) Hardy--Littlewood maximal operator satisfies the Fefferman--Stein vector-valued maximal inequality on $X$ and is bounded on the associate space of $X$, the authors prove that the fractional integral $I_\alpha$ can be extended to a bounded linear operator from $H_X({\mathbb R}^n)$ to $H_{X^\beta}({\mathbb R}^n)$ if and only if there exists a positive constant $C$ such that, for any ball $B\subset \mathbb{R}^n$, $|B|^{\frac{\alpha}{n}}\leq C \|\mathbf{1}_B\|_X^{\frac{\beta-1}{\beta}}$, where $X^\beta$ denotes the $\beta$-convexification of $X$. Moreover, under some different reasonable assumptions on both $X$ and another ball quasi-Banach function space $Y$, the authors also consider the mapping property of $I_\alpha$ from $H_X({\mathbb R}^n)$ to $H_Y({\mathbb R}^n)$ by using the extrapolation theorem. All these results have a wide range of applications. Particularly, when these are applied, respectively, to Morrey spaces, mixed-norm Lebesgue spaces, local generalized Herz spaces, and mixed-norm Herz spaces, all these results are new. The proofs of these theorems strongly depend on atomic and molecular characterizations of $H_X({\mathbb R}^n)$.
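
As a sanity check on the ball condition stated in this abstract, it may help to specialize to the simplest case $X = L^p(\mathbb{R}^n)$. This choice of $X$ is an illustration only and is not taken from the abstract. It uses the standard facts that $\|\mathbf{1}_B\|_{L^p} = |B|^{1/p}$ and that the $\beta$-convexification of $L^p$ is $L^{p\beta}$.

```latex
\begin{align*}
\|\mathbf{1}_B\|_{L^p} &= |B|^{\frac{1}{p}}, \\
|B|^{\frac{\alpha}{n}} \le C\,\|\mathbf{1}_B\|_{L^p}^{\frac{\beta-1}{\beta}}
  = C\,|B|^{\frac{1}{p}-\frac{1}{p\beta}} \ \text{for all balls } B
  &\iff \frac{\alpha}{n} = \frac{1}{p}-\frac{1}{p\beta}, \\
X^\beta = L^{p\beta}, \quad q := p\beta
  &\implies \frac{1}{q} = \frac{1}{p}-\frac{\alpha}{n}.
\end{align*}
```

The equivalence in the second line holds because $|B|$ ranges over all of $(0,\infty)$, so the inequality can hold for every ball only when the two exponents coincide. In this special case the condition therefore reduces to the classical Hardy--Littlewood--Sobolev exponent relation for $I_\alpha$ from $H^p$ to $H^q$.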

Preprint, 2022, arXiv

Improving Fine-tuning of Self-supervised Models with Contrastive Initialization

Self-supervised learning (SSL) has achieved remarkable performance in pretraining models that can be further used in downstream tasks via fine-tuning. However, these self-supervised models may not capture meaningful semantic information, since images belonging to the same class are always regarded as negative pairs in the contrastive loss. Consequently, images of the same class are often located far away from each other in the learned feature space, which would inevitably hamper the fine-tuning process. To address this issue, we seek to provide a better initialization for the self-supervised models by enhancing the semantic information. To this end, we propose a Contrastive Initialization (COIN) method that breaks the standard fine-tuning pipeline by introducing an extra initialization stage before fine-tuning. Extensive experiments show that, with the enriched semantics, our COIN significantly outperforms existing methods without introducing extra training cost and sets a new state of the art on multiple downstream tasks.