Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
22 works
0 followers
14 topics
4 close collaborators

Published work

22 published items

preprint - 2026 - arXiv

Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of SOTA models deteriorate sharply as resolution degrades and, surprisingly, that this deterioration persists even when the text remains legible. We attribute this to "Cognitive Overload", hypothesizing that the effort required to decipher degraded inputs diverts attentional resources from safety auditing. This phenomenon is consistent across various visual perturbations, including noise and geometric distortion. To address this, we propose a simple "Structured Cognitive Offloading" strategy that mitigates these risks by enforcing a serialized pipeline to decouple visual transcription from safety assessment. Our work exposes a significant risk in vision-based compression and provides critical insights for the secure design of future MLLMs.

preprint - 2026 - arXiv

Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

Autoregressive video diffusion models support real-time synthesis but suffer from error accumulation and context loss over long horizons. We discover that attention heads in AR video diffusion transformers serve functionally distinct roles as local heads for detail refinement, anchor heads for structural stabilization, and memory heads for long-range context aggregation, yet existing methods treat them uniformly, leading to suboptimal KV cache allocation. We propose Head Forcing, a training-free framework that assigns each head type a tailored KV cache strategy: local and anchor heads retain only essential tokens, while memory heads employ a hierarchical memory system with dynamic episodic updates for long-range consistency. A head-wise RoPE re-encoding scheme further ensures positional encodings remain within the pretrained range. Without additional training, Head Forcing extends generation from 5 seconds to minute-level duration, supports multi-prompt interactive synthesis, and consistently outperforms existing baselines. Project Page: https://jiahaotian-sjtu.github.io/headforcing.github.io/.

preprint - 2026 - arXiv

How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

In-context learning (ICL) excels at new tasks from minimal examples, yet we still lack a mechanistic explanation of how few-shot prompts shape a model's function vector (FV), a causal activation direction that drives task behavior on the ICL query. Across tasks and models, an $n$-shot FV is well-approximated by a linear combination of example-level sub-FVs, suggesting additive and composable contributions from individual demonstrations. Beyond additivity, we show that models contextualize individual examples' representations based on prior examples to adaptively reweight which demonstrations dominate the FV: attention shifts toward examples that are more informative and less ambiguous under the context. Finally, a causal decomposition separates Query-Key routing from Value updates, finding that contextualization's most consistent contributions to FV quality arise from Query-Key alignment, particularly in ambiguous settings, while Value-mediated effects are more heterogeneous. Together, these results unify additive superposition with context-dependent attention reweighting into a mechanistic, testable account of how few-shot prompts implement tasks.

preprint - 2026 - arXiv

Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning

On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks exposes a critical flaw: as the student's generated prefix inevitably diverges from the teacher's thought process, the teacher's dense reward loses local exploitability. Continuing to generate and evaluate tokens on these "drifted" trajectories not only degrades reward quality but also incurs massive computational waste. To address this, we introduce Prune-OPD, a framework that dynamically aligns training budgets with supervision quality. By continuously monitoring the local compatibility between student and teacher predictions (e.g., via top-$k$ overlap), Prune-OPD detects prefix-drift events in real time. Upon detecting severe drift, it monotonically down-weights subsequent unreliable rewards and triggers dynamic rollout truncation. This allows the training process to halt futile generation and reallocate compute strictly to reliable teacher supervision. Across diverse teacher-student combinations, Prune-OPD consistently aligns computation with supervision reliability. When prefix drift makes dense teacher rewards unreliable, it reduces training time by 37.6% to 68.0% while preserving, and often improving, performance on challenging benchmarks (AMC, AIME, HMMT). When student-teacher compatibility remains high, it automatically preserves long-context supervision by expanding the training window. These results suggest that Prune-OPD improves OPD not by blindly shortening rollouts, but by reallocating computation toward locally exploitable teacher rewards.
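
The drift signal mentioned in this abstract can be illustrated with a small sketch. It assumes that top-k overlap means the fraction of token ids shared by the student's and the teacher's k highest-scoring tokens, and that a rollout is cut at the first step where this overlap falls below a threshold. The function names, the threshold value, and the hard truncation rule are hypothetical choices for illustration; the abstract does not specify them, and the paper's monotone down-weighting is not modelled here.

```python
from typing import Sequence


def top_k_ids(scores: Sequence[float], k: int) -> set:
    """Return the indices of the k largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def top_k_overlap(student: Sequence[float], teacher: Sequence[float], k: int) -> float:
    """Fraction of the teacher's top-k tokens that are also in the student's top-k."""
    return len(top_k_ids(student, k) & top_k_ids(teacher, k)) / k


def truncation_point(student_steps, teacher_steps, k=2, threshold=0.5):
    """Index of the first step whose overlap drops below `threshold` (assumed rule).

    Returns the rollout length when no drift is detected.
    """
    for t, (s, q) in enumerate(zip(student_steps, teacher_steps)):
        if top_k_overlap(s, q, k) < threshold:
            return t
    return len(student_steps)
```

In a training loop, a value such as `truncation_point(student_probs, teacher_probs)` would mark where generation could stop, so that later tokens are neither sampled nor scored.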

preprint - 2026 - arXiv

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual tokens via visual similarity heuristics, or augment compression with KV-cache-level retrieval. However, compression decisions rarely incorporate semantic signals, and retrieval is often added after compression is finalized, making the two stages hard to coordinate. We present SAVEMem, a training-free dual-stage framework that brings semantic awareness into memory generation and lets the retrieval scope adapt per query. In Stage 1, SAVEMem builds a three-tier streaming memory online under a constant memory budget. A fixed pseudo-question bank provides a lightweight semantic prior, so that long-term retention is shaped by semantic salience rather than visual similarity alone. In Stage 2, SAVEMem performs query-aware retrieval over this memory. An anchor-conditioned recency gate adapts the retrieval scope from short-term to mid- and long-term memory based on whether the query targets the present or the distant past. Within this scope, late interaction between query and memory tokens selects candidate frames for answering. Applied to Qwen2.5-VL without training, SAVEMem improves the OVO-Bench overall score from 52.27 to 62.69 and yields consistent gains on StreamingBench and ODV-Bench, while reducing peak GPU memory by 48% at 128 frames over the backbone.

preprint - 2026 - arXiv

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

Decompositional jailbreaks pose a critical threat to large language models (LLMs) by allowing adversaries to fragment a malicious objective into a sequence of individually benign queries that collectively reconstruct prohibited content. In real-world deployments, LLMs face a continuous, untraceable stream of fully anonymized and arbitrarily interleaved requests, infiltrated by covertly distributed adversarial queries. Under this rigorous threat model, state-of-the-art defensive strategies exhibit fundamental limitations. In the absence of trustworthy user metadata, they are incapable of tracking global historical contexts, while their deployment of generative models for real-time monitoring introduces computationally prohibitive overhead. To address this, we present TwinGate, a stateful dual-encoder defense framework. TwinGate employs Asymmetric Contrastive Learning (ACL) to cluster semantically disparate but intent-matched malicious fragments in a shared latent space, while a parallel frozen encoder suppresses false positives arising from benign topical overlap. Each request requires only a single lightweight forward pass, enabling the defense to execute in parallel with the target model's prefill phase at negligible latency overhead. To evaluate our approach and advance future research, we construct a comprehensive dataset of over 3.62 million instructions spanning 8,600 distinct malicious intents. Evaluated on this large-scale corpus under a strictly causal protocol, TwinGate achieves high malicious intent recall at a remarkably low false positive rate while remaining highly robust against adaptive attacks. Furthermore, our proposal substantially outperforms stateful and stateless baselines, delivering superior throughput and reduced latency.

preprint - 2025 - arXiv

EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation

Sound effects build an essential layer of multimodal storytelling, shaping the emotional atmosphere and the narrative semantics of videos. Despite recent advances in video-text-to-audio (VT2A), the current formulation faces three key limitations: first, an imbalance between visual and textual conditioning that leads to visual dominance; second, the absence of a concrete definition for fine-grained controllable generation; third, weak instruction understanding and following, as existing datasets rely on brief categorical tags. To address these limitations, we introduce EchoFoley, a new task designed for video-grounded sound generation with both event-level local control and hierarchical semantic control. Our symbolic representation for sounding events specifies when, what, and how each sound is produced within a video or instruction, enabling fine-grained controls like sound generation, insertion, and editing. To support this task, we construct EchoFoley-6k, a large-scale, expert-curated benchmark containing over 6,000 video-instruction-annotation triplets. Building upon this foundation, we propose EchoVidia, a sounding-event-centric agentic generation framework with a slow-fast thinking strategy. Experiments show that EchoVidia surpasses recent VT2A models by 40.7% in controllability and 12.5% in perceptual quality.

preprint - 2022 - arXiv

Dangling-Aware Entity Alignment with Mixed High-Order Proximities

We study dangling-aware entity alignment in knowledge graphs (KGs), which is an underexplored but important problem. As different KGs are naturally constructed by different sets of entities, a KG commonly contains some dangling entities that cannot find counterparts in other KGs. Therefore, dangling-aware entity alignment is more realistic than the conventional entity alignment where prior studies simply ignore dangling entities. We propose a framework using mixed high-order proximities on dangling-aware entity alignment. Our framework utilizes both the local high-order proximity in a nearest neighbor subgraph and the global high-order proximity in an embedding space for both dangling detection and entity alignment. Extensive experiments with two evaluation settings show that our framework more precisely detects dangling entities and better aligns matchable entities. Further investigations demonstrate that our framework can mitigate the hubness problem on dangling-aware entity alignment.

preprint - 2022 - arXiv

GRAPHCACHE: Message Passing as Caching for Sentence-Level Relation Extraction

Entity types and textual context are essential properties for sentence-level relation extraction (RE). Existing work only encodes these properties within individual instances, which limits the performance of RE given the insufficient features in a single sentence. In contrast, we model these properties from the whole dataset and use the dataset-level information to enrich the semantics of every instance. We propose the GRAPHCACHE (Graph Neural Network as Caching) module, which propagates the features across sentences to learn better representations for RE. GRAPHCACHE aggregates the features from sentences in the whole dataset to learn global representations of properties, and uses them to augment the local features within individual sentences. The global property features act as dataset-level prior knowledge for RE, and as a complement to the sentence-level features. Inspired by the classical caching technique in computer systems, we develop GRAPHCACHE to update the property representations in an online manner. Overall, GRAPHCACHE yields significant effectiveness gains on RE and enables efficient message passing across all sentences in the dataset.

preprint - 2022 - arXiv

Investigation of spin rotators in CEPC at the Z-pole

Longitudinal polarization is an important design aspect of the future 100 km-scale Circular Electron Positron Collider (CEPC). Spin rotators are needed in the CEPC collider rings to align the beam polarization along the longitudinal direction at the interaction points (IPs). This paper focuses on the design of spin rotators for CEPC at the Z-pole (45.6 GeV). The design is based on solenoid magnets and horizontal bending magnet sections. The coupling of transverse motion introduced by the solenoids is compensated with quadrupole lenses. Adjustments have been made to the layout to implement the spin rotators into the collider rings. Longitudinally polarized beams can be achieved at the IPs with the spin rotators. A high degree of polarization is attainable, while the effect of the spin rotators on orbital motion is acceptable. The detailed simulation results will be presented. A solenoid-based spin rotator configuration is designed and integrated into the CEPC collider ring lattice. According to the simulation results, the polarization requirements can be satisfied.

preprint - 2022 - arXiv

LSCALE: Latent Space Clustering-Based Active Learning for Node Classification

Node classification on graphs is an important task in many practical domains. It usually requires labels for training, which can be difficult or expensive to obtain in practice. Given a budget for labelling, active learning aims to improve performance by carefully choosing which nodes to label. Previous graph active learning methods learn representations using labelled nodes and select some unlabelled nodes for label acquisition. However, they do not fully utilize the representation power present in unlabelled nodes. We argue that the representation power in unlabelled nodes can be useful for active learning and for further improving performance of active learning for node classification. In this paper, we propose a latent space clustering-based active learning framework for node classification (LSCALE), where we fully utilize the representation power in both labelled and unlabelled nodes. Specifically, to select nodes for labelling, our framework uses the K-Medoids clustering algorithm on a latent space based on a dynamic combination of both unsupervised features and supervised features. In addition, we design an incremental clustering module to avoid redundancy between nodes selected at different steps. Extensive experiments on five datasets show that our proposed framework LSCALE consistently and significantly outperforms the state-of-the-art approaches by a large margin.

preprint - 2022 - arXiv

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.
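
The stability contrast and the inner-outer structure described in this abstract can be sketched on a scalar stiff test problem. The example assumes the ODE y' = -lam * y, which is the gradient flow of F(z) = lam * z^2 / 2, so that one backward Euler step equals the minimizer of 0.5 * (z - y_n)^2 + h * F(z). Plain gradient descent stands in for the paper's fast inner optimizer; the test problem, function names, and step-size choice are assumptions made for illustration, not the authors' implementation.

```python
def explicit_euler(y0, lam, h, steps):
    """Forward Euler on y' = -lam * y; unstable when h * lam > 2."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def proximal_implicit_euler(y0, lam, h, steps, inner_iters=50):
    """Backward Euler on y' = -lam * y, each step solved as a proximal problem.

    Each outer step minimizes 0.5 * (z - y)**2 + h * lam * z**2 / 2 over z
    by gradient descent (an assumed stand-in for a faster inner solver).
    """
    y = y0
    lipschitz = 1.0 + h * lam  # curvature of the quadratic inner objective
    for _ in range(steps):  # outer iterations: advance in time
        z = y  # warm start from the previous state
        for _ in range(inner_iters):  # inner iterations: solve the implicit update
            grad = (z - y) + h * lam * z
            z -= grad / lipschitz
        y = z
    return y


if __name__ == "__main__":
    # Step size far above the explicit stability limit 2 / lam.
    print(explicit_euler(1.0, lam=50.0, h=0.1, steps=20))
    print(proximal_implicit_euler(1.0, lam=50.0, h=0.1, steps=20))
```

With h * lam = 5, the explicit iterate is multiplied by -4 at every step and grows without bound, while the implicit iterate is divided by 6 and decays, as the exact solution does.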

preprint - 2022 - arXiv

Should We Rely on Entity Mentions for Relation Extraction? Debiasing Relation Extraction with Counterfactual Analysis

Recent literature focuses on utilizing the entity information in sentence-level relation extraction (RE), but this risks leaking superficial and spurious clues of relations. As a result, RE still suffers from unintended entity bias, i.e., the spurious correlation between entity mentions (names) and relations. Entity bias can mislead RE models into extracting relations that do not exist in the text. To combat this issue, some previous work masks the entity mentions to prevent RE models from overfitting to them. However, this strategy degrades RE performance because it loses the semantic information of entities. In this paper, we propose the CORE (Counterfactual Analysis based Relation Extraction) debiasing method, which guides RE models to focus on the main effects of textual context without losing the entity information. We first construct a causal graph for RE, which models the dependencies between variables in RE models. Then, we propose to conduct counterfactual analysis on our causal graph to distill and mitigate the entity bias, which captures the causal effects of specific entity mentions in each instance. Note that our CORE method is model-agnostic and debiases existing RE systems during inference without changing their training processes. Extensive experimental results demonstrate that CORE yields significant gains in both effectiveness and generalization for RE. The source code is provided at: https://github.com/vanoracai/CoRE.

preprint - 2021 - arXiv

On a Reversible Gray-Scott Type System from Energetic Variational Approach and Its Irreversible Limit

Most of the previous studies on the well-known Gray-Scott model view it as an irreversible chemical reaction system. In this paper, we derive a four-species reaction-diffusion system using the energetic variational approach based on the law of mass action. This is a reversible Gray-Scott type model, which has a natural entropy structure. We establish the local well-posedness of this system, and justify the limit to the corresponding irreversible Gray-Scott type system as some backward coefficients tend to zero. Furthermore, under some smallness assumption on the initial data, we obtain the global-in-time existence of classical solutions of the reversible system.

preprint - 2020 - arXiv

A Variational Lagrangian Scheme for a Phase Field Model: A Discrete Energetic Variational Approach

In this paper, we propose a variational Lagrangian scheme for a modified phase-field model, which can compute the equilibrium states for the original Allen-Cahn type model. Our discretization is based on a prescribed energy-dissipation law in terms of the flow map. By employing a discrete energetic variational approach, this scheme preserves the variational structure of the original energy-dissipation law and is energy stable. Plentiful numerical tests show that, by choosing the initial value properly, our methods can produce the desired equilibrium and capture the thin diffuse interface with a small number of mesh points.

preprint - 2020 - arXiv

Dynamic tuning of the director field in liquid crystal shells using block copolymers

When a nematic liquid crystal (LC) is confined on a self-closing spherical shell, topological constraints arise with intriguing consequences that depend critically on how the LC is aligned in the shell. We demonstrate reversible dynamic tuning of the alignment, and thereby the topology, of nematic LC shells stabilized by the nonionic amphiphilic block copolymer Pluronic F127. Deep in the nematic phase, the director is tangential to the interface, but upon approaching the temperature $T_{NI}$ of the nematic-isotropic transition, the director realigns to normal. We link this to a delicate interplay between an interfacial tension that is nearly independent of director orientation, and the configuration-dependent elastic deformation energy of an LC confined in a shell. The process is primarily triggered by the heating-induced reduction of the nematic order parameter, hence realignment temperatures differ by several tens of degrees between LCs with high and low $T_{NI}$, respectively. The temperature of realignment is always lower on the positive-curved shell outside than at the negative-curved inside, yielding a complex topological reconfiguration on heating. Complementing experimental investigations with mathematical modeling and computer simulations, we identify and investigate three different trajectories, distinguished by their configurations of topological defects in the initial tangential-aligned shell. Our results uncover a new aspect of the complex response of LCs to curved confinement, demonstrating that the order of the LC itself can influence the alignment and thereby the topology of the system. They also reveal the potential of amphiphilic block copolymer stabilizers for enabling continuous tunability of LC shell configuration, opening doors for in-depth studies of topological dynamics as well as novel applications in, e.g., sensing and programmed soft actuators.

preprint - 2020 - arXiv

Field Theory of Reaction-Diffusion: Mass Action with an Energetic Variational Approach

We extend the energetic variational approach so it can be applied to a chemical reaction system with general mass action kinetics. Our approach starts with an energy-dissipation law. We show that the chemical equilibrium is determined by the choice of the free energy and the dynamics of the chemical reaction is determined by the choice of the dissipation. This approach enables us to couple chemical reactions with other effects, such as diffusion and drift in an electric field. As an illustration, we apply our approach to a non-equilibrium reaction-diffusion system in a specific but canonical set-up. We show by numerical simulation that the input-output relation of such a system depends on the choice of the dissipation.

preprint - 2020 - arXiv

GraphCrop: Subgraph Cropping for Graph Classification

We present a new method to regularize graph neural networks (GNNs) for better generalization in graph classification. Observing that the omission of sub-structures does not necessarily change the class label of the whole graph, we develop the GraphCrop (Subgraph Cropping) data augmentation method to simulate the real-world noise of sub-structure omission. In principle, GraphCrop utilizes a node-centric strategy to crop a contiguous subgraph from the original graph while maintaining its connectivity. By preserving the valid structure contexts for graph classification, we encourage GNNs to understand the content of graph structures in a global sense, rather than rely on a few key nodes or edges, which may not always be present. GraphCrop is parameter learning free and easy to implement within existing GNN-based graph classifiers. Qualitatively, GraphCrop expands the existing training set by generating novel and informative augmented graphs, which retain the original graph labels in most cases. Quantitatively, GraphCrop yields significant and consistent gains on multiple standard datasets, and thus enhances the popular GNNs to outperform the baseline methods.
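
The idea of cropping a contiguous, connected subgraph around a centre node can be sketched as follows. The abstract does not state how nodes around the centre are selected, so this example substitutes a plain breadth-first search from a randomly chosen centre; the function name, the adjacency-dict representation, and the `keep` and `seed` parameters are hypothetical and do not reproduce the authors' selection rule.

```python
import random
from collections import deque


def crop_connected_subgraph(adj, keep, seed=0):
    """Crop an induced, connected subgraph with at most `keep` nodes.

    adj: dict mapping each node to the set of its neighbours (undirected).
    A random centre is chosen and nodes are collected in BFS order, so every
    kept node is adjacent to an earlier kept node (BFS is an assumed stand-in).
    """
    rng = random.Random(seed)
    center = rng.choice(sorted(adj))
    visited = [center]
    seen = {center}
    queue = deque([center])
    while queue and len(visited) < keep:
        node = queue.popleft()
        for nb in sorted(adj[node]):
            if nb not in seen:
                seen.add(nb)
                visited.append(nb)
                queue.append(nb)
                if len(visited) == keep:
                    break
    kept = set(visited)
    # Induced subgraph: keep only edges whose endpoints both survive the crop.
    return {u: adj[u] & kept for u in kept}
```

As an augmentation, the cropped graph would be added to the training set with the label of the original graph, which matches the abstract's observation that omitting sub-structures often leaves the class unchanged.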

preprint - 2020 - arXiv

On Lagrangian schemes for porous medium type generalized diffusion equations: a discrete energetic variational approach

In this paper, we present a systematic framework to derive a Lagrangian scheme for porous medium type generalized diffusion equations by employing a discrete energetic variational approach. Such discrete energetic variational approaches are analogous to energetic variational approaches at a semi-discrete level, which provide a basis for deriving the "semi-discrete equations" and can be applied to a large class of partial differential equations with energy-dissipation laws and kinematic relations. The numerical schemes derived by this framework can inherit various properties from the continuous energy-dissipation law, such as conservation of mass and the dissipation of the discrete energy. As an illustration, we develop two numerical schemes for the multidimensional porous medium equations (PME), based on two different energy-dissipation laws. We focus on the numerical scheme based on the energy-dissipation law with $\frac{1}{2} \int_\Omega |\mathbf{u}|^2 \, \mathrm{d}\mathbf{x}$ as the dissipation. Several numerical experiments demonstrate the accuracy of this scheme as well as its ability to capture the free boundary and estimate the waiting time for the PME in both 1D and 2D.

preprint - 2020 - arXiv

Revisiting Convolutional Neural Networks for Citywide Crowd Flow Analytics

Citywide crowd flow analytics is of great importance to smart city efforts. It aims to model the crowd flow (e.g., inflow and outflow) of each region in a city based on historical observations. Nowadays, Convolutional Neural Networks (CNNs) have been widely adopted in raster-based crowd flow analytics by virtue of their capability in capturing spatial dependencies. After revisiting CNN-based methods for different analytics tasks, we expose two common critical drawbacks in the existing uses: 1) inefficiency in learning global spatial dependencies, and 2) overlooking latent region functions. To tackle these challenges, in this paper we present a novel framework entitled DeepLGR that can be easily generalized to address various citywide crowd flow analytics problems. This framework consists of three parts: 1) a local feature extraction module to learn representations for each region; 2) a global context module to extract global contextual priors and upsample them to generate the global features; and 3) a region-specific predictor based on tensor decomposition to provide customized predictions for each region, which is very parameter-efficient compared to previous methods. Extensive experiments on two typical crowd flow analytics tasks demonstrate the effectiveness, stability, and generality of our framework.

preprint - 2019 - arXiv

Tailored Morphologies in 2D Ferronematic Wells

We focus on a dilute uniform suspension of magnetic nanoparticles in a nematic-filled micron-sized shallow well with tangent boundary conditions, as a paradigm system with two coupled order parameters. This system exhibits spontaneous magnetization without magnetic fields. We numerically obtain the stable nematic and associated magnetization morphologies, induced purely by the geometry, boundary conditions and the coupling between the magnetic nanoparticles and the host nematic medium. Our most striking observations pertain to domain walls in the magnetization profile whose location can be manipulated by the coupling and material properties, and stable interior and boundary nematic defects, whose location and multiplicity can be tailored by the coupling too. These novel morphologies are not accessible in uncoupled systems and can be used for new multistable systems with singularities and stable interfaces.

preprint - 2019 - arXiv

The Well Order Reconstruction Solution for Three-Dimensional Wells, in the Landau-de Gennes theory

We study nematic equilibria on three-dimensional square wells, with emphasis on Well Order Reconstruction Solutions (WORS) as a function of the well size, characterized by $\lambda$, and the well height denoted by $\varepsilon$. The WORS are distinctive equilibria reported in [10] for square domains, without taking the third dimension into account, which have two mutually perpendicular defect lines running along the square diagonals, intersecting at the square centre. We prove the existence of WORS on three-dimensional wells for arbitrary well heights, with (i) natural boundary conditions and (ii) realistic surface energies on the top and bottom well surfaces, along with Dirichlet conditions on the lateral surfaces. Moreover, the WORS is globally stable for $\lambda$ small enough in both cases and unstable as $\lambda$ increases. We numerically compute novel mixed 3D solutions for large $\lambda$ and $\varepsilon$ followed by a numerical investigation of the effects of surface anchoring on the WORS, exemplifying the relevance of the WORS in a 3D context.