Source author record

Zhao Yang

Zhao Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

17 works

20 topics

4 close collaborators


preprint, 2026, arXiv

A Heterogeneous Temporal Memory Governance Framework for Long-Term LLM Persona Consistency

Large language models often suffer from fact loss, timeline confusion, persona drift, and reduced stability during long-range interaction, especially under high-noise knowledge bases, context clearing, and cross-model transfer. To address these issues, we introduce ARPM, an external temporal memory governance framework for long-term dialogue. ARPM separates static knowledge memory from dynamic dialogue experience memory and combines vector retrieval, BM25, RRF fusion, dual-temporal reranking, chronological evidence reading, and a controlled analysis protocol for evidence verification and answer binding. Unlike approaches that encode persona consistency into model weights or rely only on long context, ARPM treats continuity as a traceable, auditable, and transferable governance problem. Using engineering logs, we conduct three experiments. First, in a 50-round question-answering setting, we compare signal-to-noise ratios of 1:5 and 1:200+, and distinguish CSV auto-judgment from manual review. Under 1:5, CSV recall accuracy is 54.0%, while manual review raises it to 100.0%. Under 1:200+, the values are 44.0% and 80.0%. These results show that automatic rules can underestimate recall after supporting evidence enters the prompt. Second, ablation results show that dialogue history retrieval is necessary for recent continuity: disabling it reduces strict accuracy from 100% to 66.7%, and disabling BM25 reduces it to 80.0%, indicating that pure semantic retrieval is insufficient for correction and tracing. Third, under a 5.1-million-character noise substrate, periodic context clearing, and multi-model handoff, ARPM maintains semantic continuity, boundary continuity, and persona consistency, while exposing limits caused by weak protocol compliance. These findings show that long-term persona consistency can be decomposed into governable components and evaluated in a white-box manner.

preprint, 2026, arXiv

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation

While recent work in Reinforcement Learning with Verifiable Rewards (RLVR) has shown that a small subset of critical tokens disproportionately drives reasoning gains, an analogous token-level understanding of On-Policy Distillation (OPD) remains largely unexplored. In this work, we investigate high-loss tokens, a token type that, as the most direct signal of student-teacher mismatch under OPD's per-token KL objective, should progressively diminish as training converges according to existing studies; however, our empirical analysis shows otherwise. Even after OPD training reaches apparent saturation, a substantial subset of tokens continues to exhibit persistently high loss; these tokens, which we term Rock Tokens, can account for up to 18% of the tokens in generated outputs. Our investigation reveals two startling paradoxes. First, despite their high occurrence frequency providing a disproportionately large share of total gradient norms, Rock Tokens themselves remain stagnant throughout training, resisting teacher-driven corrections. Second, through causal intervention, we find that these tokens provide negligible functional contribution to the model's actual reasoning performance. These findings suggest that a vast amount of optimization bandwidth is spent on structural and discourse residuals that the student model cannot or need not internalize. By deconstructing these dynamics, we demonstrate that strategically bypassing these "stumbling blocks" can significantly streamline the alignment process, challenging the necessity of uniform token weighting and offering a more efficient paradigm for large-scale model distillation.

preprint, 2026, arXiv

DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

Open-Vocabulary Object Detection (OVD) models break the limitations of closed-set detection, enabling the identification of unseen categories through natural language prompts. However, they exhibit notable limitations in fine-grained detection tasks involving attributes like color, material, and texture. We attribute this performance bottleneck in OVD models to a core issue: when category signals dominate, OVD models tend to marginalize attribute information during inference. This leads to incorrect binding between attributes and target objects. To address this, we propose the Dual-Stage Attribute Activation (DSAA) framework, which enhances fine-grained detection capabilities by strengthening attribute semantics at two critical stages. In the text embedding stage, we employ Attribute Prefix Adapter (APA) module to generate attribute prefixes that inject explicit attribute priors. To further amplify the influence of these attributes, our Key/Value (K/V) Modulator module then intervenes during the BERT encoding phase, selectively enhancing the Key and Value vectors of the corresponding attribute tokens. In addition, we introduce an attribute-aware contrastive loss to improve discrimination among same-category instances with different attributes during training. Experimental results on the FG-OVD benchmark demonstrate the effectiveness of our method across various mainstream open-vocabulary models.

preprint, 2026, arXiv

Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving

Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a primary countermeasure, adversarial training improves policy robustness by training the AD agent in the presence of an adversary that deliberately introduces perturbations. Existing approaches typically model the interaction as a zero-sum game with continuous attacks. However, such designs overlook the inherent asymmetry between the agent and the adversary and then fail to reflect the sparsity of safety-critical risks, rendering the achieved robustness inadequate for practical AD scenarios. To address these limitations, we introduce criticality-aware robust RL (CARRL), a novel adversarial training approach for handling sparse, safety-critical risks in autonomous driving. CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA). We model the interaction between the REA and RTRA as a general-sum game, allowing the REA to focus on exposing safety-critical failures (e.g., collisions) while the RTRA learns to balance safety with driving efficiency. The REA employs a decoupled optimization mechanism to better identify and exploit sparse safety-critical moments under a constrained budget. However, such focused attacks inevitably result in a scarcity of adversarial data. The RTRA copes with this scarcity by jointly leveraging benign and adversarial experiences via a dual replay buffer and enforces policy consistency under perturbations to stabilize behavior. Experimental results demonstrate that our approach reduces the collision rate by at least 22.66% across all cases compared to state-of-the-art baseline methods.

preprint, 2023, arXiv

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, that the Go-Explore paper did not show. We study the isolated potential of post-exploration, by turning it on and off within the same algorithm under both tabular and deep RL settings on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and Mujoco environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider to use post-exploration in IMGEP when possible since it is effective, method-agnostic and easy to implement.

preprint, 2022, arXiv

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire the single cognitive ability from the single molecular modality. Since the hierarchy of molecular knowledge is profound, even humans learn from different modalities including both intuitive diagrams and professional texts to assist their understanding. Inspired by this, we propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data (crawled from published Scientific Citation Index papers) via contrastive learning. This AI model represents a critical attempt that directly bridges molecular graphs and natural language. Importantly, through capturing the specific and complementary information of the two modalities, our proposed model can better grasp molecular expertise. Experimental results show that our model not only exhibits promising performance in cross-modal tasks such as cross-modal retrieval and molecule caption, but also enhances molecular property prediction and possesses capability to generate meaningful molecular graphs from natural language descriptions. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine, among others.

preprint, 2022, arXiv

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Noise robustness in keyword spotting remains a challenge as many models fail to overcome the heavy influence of noises, causing the deterioration of the quality of feature embeddings. We proposed a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noises. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen out-of-domain noises and unseen variant noise SNRs. This indicates the significance of our work with the overall refinement in noise robustness.

preprint, 2022, arXiv

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image. A paradigm for tackling this problem is to leverage a powerful vision-language ("cross-modal") decoder to fuse features independently extracted from a vision encoder and a language encoder. Recent methods have made remarkable advancements in this paradigm by exploiting Transformers as cross-modal decoders, concurrent to the Transformer's overwhelming success in many other vision-language tasks. Adopting a different approach in this work, we show that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network. By conducting cross-modal feature fusion in the visual feature encoding stage, we can leverage the well-proven correlation modeling power of a Transformer encoder for excavating helpful multi-modal context. This way, accurate segmentation results are readily harvested with a light-weight mask predictor. Without bells and whistles, our method surpasses the previous state-of-the-art methods on RefCOCO, RefCOCO+, and G-Ref by large margins.

preprint, 2022, arXiv

On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect the use of specific language characteristics. For Mandarin Chinese ASR tasks, there exist mutual promotion relationship between Pinyin and Character where Chinese characters can be romanized by Pinyin. Based on the above intuition, we first investigate types of end-to-end encoder-decoder based models in the single-input dual-output (SIDO) multi-task framework, after which a novel asynchronous decoding with fuzzy Pinyin sampling method is proposed according to the one-to-one correspondence characteristics between Pinyin and Character. Furthermore, we proposed a two-stage training strategy to make training more stable and converge faster. The results on the test sets of AISHELL-1 dataset show that the proposed enhanced dual-decoder model without a language model is improved by a big margin compared to strong baseline models.

preprint, 2022, arXiv

When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present a systematic study of post-exploration, answering open questions that the Go-Explore paper did not answer yet. First, we study the isolated potential of post-exploration, by turning it on and off within the same algorithm. Subsequently, we introduce new methodology to adaptively decide when to post-explore and for how long to post-explore. Experiments on a range of MiniGrid environments show that post-exploration indeed boosts performance (with a bigger impact than tuning regular exploration parameters), and this effect is further enhanced by adaptively deciding when and for how long to post-explore. In short, our work identifies adaptive post-exploration as a promising direction for RL exploration research.

preprint, 2021, arXiv

Explore User Neighborhood for Real-time E-commerce Recommendation

Recommender systems play a vital role in modern online services, such as Amazon and Taobao. Traditional personalized methods, which focus on user-item (UI) relations, have been widely applied in industrial settings, owing to their efficiency and effectiveness. Despite their success, we argue that these approaches ignore local information hidden in similar users. To tackle this problem, user-based methods exploit similar user relations to make recommendations in a local perspective. Nevertheless, traditional user-based methods, like userKNN and matrix factorization, are intractable to be deployed in the real-time applications since such transductive models have to be recomputed or retrained with any new interaction. To overcome this challenge, we propose a framework called self-complementary collaborative filtering (SCCF) which can make recommendations with both global and local information in real time. On the one hand, it utilizes UI relations and user neighborhood to capture both global and local information. On the other hand, it can identify similar users for each user in real time by inferring user representations on the fly with an inductive model. The proposed framework can be seamlessly incorporated into existing inductive UI approach and benefit from user neighborhood with little additional computation. It is also the first attempt to apply user-based methods in real-time settings. The effectiveness and efficiency of SCCF are demonstrated through extensive offline experiments on four public datasets, as well as a large scale online A/B test in Taobao.

preprint, 2020, arXiv

Stability of strong detonation waves for Majda's model with general ignition functions

For strong detonation waves of the inviscid Majda model, spectral stability was established by Jung and Yao for waves with step-type ignition functions, by a proof based largely on explicit knowledge of wave profiles. In the present work, we extend their stability results to strong detonation waves with more general ignition functions where explicit profiles are unknown. Our proof is based on reduction to a generalized Sturm-Liouville problem, similar to that used by Sukhtayev, Yang, and Zumbrun to study spectral stability of hydraulic shock profiles of the Saint-Venant equations.

preprint, 2020, arXiv

Two-Grid Deflated Krylov Methods for Linear Equations

An approach is given for solving large linear systems that combines Krylov methods with use of two different grid levels. Eigenvectors are computed on the coarse grid and used to deflate eigenvalues on the fine grid. GMRES-type methods are first used on both the coarse and fine grids. Then another approach is given that has a restarted BiCGStab (or IDR) method on the fine grid. While BiCGStab is generally considered to be a non-restarted method, it works well in this context with deflating and restarting. Tests show this new approach can be very efficient for difficult linear equations problems.

preprint, 2016, arXiv

Holographic duality from random tensor networks

Tensor networks provide a natural framework for exploring holographic duality because they obey entanglement area laws. They have been used to construct explicit toy models realizing many interesting structural features of the AdS/CFT correspondence, including the non-uniqueness of bulk operator reconstruction in the boundary theory. In this article, we explore the holographic properties of networks of random tensors. We find that our models naturally incorporate many features that are analogous to those of the AdS/CFT correspondence. When the bond dimension of the tensors is large, we show that the entanglement entropy of boundary regions, whether connected or not, obey the Ryu-Takayanagi entropy formula, a fact closely related to known properties of the multipartite entanglement of assistance. Moreover, we find that each boundary region faithfully encodes the physics of the entire bulk entanglement wedge. Our method is to interpret the average over random tensors as the partition function of a classical ferromagnetic Ising model, so that the minimal surfaces of Ryu-Takayanagi appear as domain walls. Upon including the analog of a bulk field, we find that our model reproduces the expected corrections to the Ryu-Takayanagi formula: the minimal surface is displaced and the entropy is augmented by the entanglement of the bulk field. Increasing the entanglement of the bulk field ultimately changes the minimal surface topologically in a way similar to creation of a black hole. Extrapolating bulk correlation functions to the boundary permits the calculation of the scaling dimensions of boundary operators, which exhibit a large gap between a small number of low-dimension operators and the rest. While we are primarily motivated by AdS/CFT duality, our main results define a more general form of bulk-boundary correspondence which could be useful for extending holography to other spacetimes.

preprint, 2015, arXiv

Bidirectional holographic codes and sub-AdS locality

Tensor networks implementing quantum error correcting codes have recently been used to construct toy models of holographic duality explicitly realizing some of the more puzzling features of the AdS/CFT correspondence. These models reproduce the Ryu-Takayanagi entropy formula for boundary intervals, and allow bulk operators to be mapped to the boundary in a redundant fashion. These exactly solvable, explicit models have provided valuable insight but nonetheless suffer from many deficiencies, some of which we attempt to address in this article. We propose a new class of tensor network models that subsume the earlier advances and, in addition, incorporate additional features of holographic duality, including: (1) a holographic interpretation of all boundary states, not just those in a "code" subspace, (2) a set of bulk states playing the role of "classical geometries" which reproduce the Ryu-Takayanagi formula for boundary intervals, (3) a bulk gauge symmetry analogous to diffeomorphism invariance in gravitational theories, (4) emergent bulk locality for sufficiently sparse excitations, and (5) the ability to describe geometry at sub-AdS resolutions or even flat space.

preprint, 2015, arXiv

Multi-level Resistive Switching Characteristics of W/Co:TiO2/FTO Structures

In the present work, multi-level resistive switching (RS) in W/Co:TiO2/FTO structures induced by a multi-mixed mechanism was studied. It was found that the devices could be reproducibly programmed into three nonvolatile resistance states. And the directly switching between any resistance states was realized. This increases the operation speed and lowers the complexity of control circuit of multi-state nonvolatile memory.

preprint, 2011, arXiv

A two-component geodesic equation on a space of constant positive curvature

We propose a new two-component geodesic equation with the unusual property that the underlying space has constant positive curvature. In the special case of one space dimension, the equation reduces to the two-component Hunter-Saxton equation.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.


Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.09253:author:5:zhao-yang

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.14802:author:1:zhao-yang

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.18023:author:6:zhao-yang

Imported May 20, 2026. Synced May 20, 2026.

2 works

Aske Plaat

Researcher

Aske Plaat contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Dianwen Ng

Researcher

Dianwen Ng contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Mike Preuss

Researcher

Mike Preuss contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Patrick Hayden

Researcher

Patrick Hayden contributes to research discovery and scholarly infrastructure.

Open to collaborate
