Source author record

Yu Zhou

Yu Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computer Vision, Artificial Intelligence, eess.SY, gr-qc, hep-th, Information Retrieval, Machine Learning, math.OC, quant-ph, Systems and Control

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Discrete Homogeneity and Quantizer Design for Nonlinear Homogeneous Control Systems

This paper proposes a framework for the analysis of generalized homogeneous control systems under state quantization. In particular, it addresses the challenge of maintaining finite/fixed-time stability of nonlinear systems in the presence of quantized measurements. To analyze the behavior of a quantized control system, we introduce a new type of discrete homogeneity, where the dilation is defined by a discrete group. The converse Lyapunov function theorem is established for homogeneous systems with respect to discrete dilations. By extending the notion of sector-boundedness to a homogeneous vector space, we derive a generalized homogeneous sector-boundedness condition that guarantees finite/fixed-time stability of a nonlinear control system under quantized measurements. A geometry-aware homogeneous static vector quantizer is then designed using generalized homogeneous coordinates, enabling an efficient quantization scheme. The resulting homogeneous control system with the proposed quantizer is proven to be homogeneous with respect to the discrete dilation and globally finite-time, nearly fixed-time, or exponentially stable, depending on the homogeneity degree. Numerical examples validate the effectiveness of the proposed approach.
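To make the idea of homogeneity under a discrete dilation group concrete, here is a minimal sketch using the classical scalar logarithmic quantizer. It is not the paper's geometry-aware vector quantizer; the function name `log_quantize` and the ratio `rho` are illustrative assumptions. The quantizer commutes with scaling by integer powers of `rho`, and its relative error stays inside the sector `|q(x) - x| <= (sqrt(rho) - 1) * |x|`.

```python
import math


def log_quantize(x: float, rho: float = 2.0) -> float:
    """Scalar logarithmic quantizer (illustrative, not the paper's design).

    Snaps |x| to the nearest power of rho on a log scale and keeps the sign.
    """
    if x == 0.0:
        return 0.0
    level = round(math.log(abs(x), rho))  # nearest integer exponent
    return math.copysign(rho ** level, x)


def sector_bound(rho: float = 2.0) -> float:
    """Worst-case relative error of log_quantize: sqrt(rho) - 1."""
    return math.sqrt(rho) - 1.0


if __name__ == "__main__":
    x = 1.7
    for k in (-2, 0, 3):
        scaled = (2.0 ** k) * x
        # Discrete homogeneity: q(rho^k x) equals rho^k q(x) for integer k.
        print(k, log_quantize(scaled), (2.0 ** k) * log_quantize(x))
```

Scaling by a non-integer power of `rho` generally breaks the identity, which is why the symmetry group is discrete rather than continuous.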

preprint · 2026 · arXiv

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
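The abstract does not say how the Gaussian soft mask is built, so the following is only a sketch under stated assumptions: the mask is centered on the target bounding box, and its spread is proportional to the box size. The function name `gaussian_soft_mask` and the parameter `sigma_scale` are hypothetical and do not come from the GUI-SD code.

```python
import math
from typing import List, Tuple


def gaussian_soft_mask(
    width: int,
    height: int,
    box: Tuple[float, float, float, float],
    sigma_scale: float = 0.5,  # assumed: spread as a fraction of the box size
) -> List[List[float]]:
    """Return a height x width grid of weights in (0, 1], peaked at the box center.

    box is (x0, y0, x1, y1) in pixel coordinates.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Guard against degenerate (zero-area) boxes.
    sx = max((x1 - x0) * sigma_scale, 1e-6)
    sy = max((y1 - y0) * sigma_scale, 1e-6)
    return [
        [
            math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

A mask like this highlights the neighborhood of the target with smoothly decaying weights, which is one way to give a teacher a spatial hint without writing the exact coordinates into its prompt.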

preprint · 2026 · arXiv

Masked Next-Scale Prediction for Self-supervised Scene Text Recognition

Scene Text Recognition requires modeling visual structures that evolve from coarse layouts to fine-grained character strokes. Training such models relies on large amounts of annotated data. Recent self-supervised approaches, such as Masked Image Modeling (MIM), alleviate this dependency by leveraging large-scale unlabeled data. Yet most existing MIM methods operate at a single spatial scale and fail to capture the hierarchical nature of scene text. In this work, we introduce Masked Next-Scale Prediction (MNSP), a unified self-supervised framework designed to explicitly model cross-scale structural evolution. The framework incorporates Next-Scale Prediction (NSP), which learns hierarchical representations by predicting higher-resolution features from lower-resolution contexts. Naive scale prediction, however, tends to produce spatially diffuse attention, directing the model toward background regions rather than textual structures. MNSP resolves this limitation by jointly learning cross-scale prediction and masked image reconstruction. NSP captures global layout priors across resolutions, while masked reconstruction imposes strong local constraints that guide attention toward informative text regions. A Multi-scale Linguistic Alignment module further maintains semantic consistency across different resolutions. Extensive experiments demonstrate that MNSP achieves state-of-the-art performance, reaching 86.2% average accuracy on the challenging Union14M benchmark and 96.7% across six standard datasets. Additional analyses show that our method improves robustness under extreme scale and layout variations. Code is available at https://github.com/CzhczhcHczh/MNSP

preprint · 2026 · arXiv

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. However, existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. Inspired by Process Reward Models (PRMs) that enhance reasoning in Large Language Models, we propose Promise, a novel framework that integrates dense, step-by-step verification into generative models. Promise features a lightweight PRM to assess the quality of intermediate inference steps, coupled with a PRM-guided Beam Search strategy that leverages dense feedback to dynamically prune erroneous branches. Crucially, our approach unlocks Test-Time Scaling Laws for recommender systems: by increasing inference compute, smaller models can match or surpass larger models. Extensive offline experiments and online A/B tests on a large-scale platform demonstrate that Promise effectively mitigates Semantic Drift, significantly improving recommendation accuracy while enabling efficient deployment.
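The PRM-guided beam search described in this abstract can be sketched roughly as follows. This is an illustration under assumptions, not the Promise implementation: the callables `next_token_logprobs` and `prm_score`, the additive scoring rule, and the `prm_weight` parameter are all hypothetical stand-ins for the generator and the lightweight process reward model.

```python
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]  # partial Semantic ID, coarse tokens first


def prm_guided_beam_search(
    next_token_logprobs: Callable[[Prefix], Sequence[Tuple[int, float]]],
    prm_score: Callable[[Prefix], float],
    depth: int,
    beam_width: int,
    prm_weight: float = 1.0,  # assumed mixing weight between generator and PRM
) -> List[Tuple[Prefix, float]]:
    """Beam search that re-ranks every partial sequence with a process reward.

    Returns the surviving (prefix, cumulative generator log-prob) pairs.
    """
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(depth):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, logp in beams:
            for token, token_logp in next_token_logprobs(prefix):
                candidates.append((prefix + (token,), logp + token_logp))
        # Dense step-level feedback: prune on generator score plus PRM score,
        # so a likely but poorly rated early token can be dropped.
        candidates.sort(
            key=lambda c: c[1] + prm_weight * prm_score(c[0]), reverse=True
        )
        beams = candidates[:beam_width]
    return beams
```

Raising `beam_width` spends more inference compute on exploring alternatives at each level of the Semantic ID, which is the knob a test-time scaling study would vary.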

preprint · 2026 · arXiv

Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver

Heavy-Encoder-Light-Decoder (HELD) neural routing solvers have emerged as a promising paradigm due to their broad applicability across multiple vehicle routing problems (VRPs). However, they typically struggle with VRP variants with complex constraints. To address this limitation, this paper systematically revisits existing neural solvers from the perspective of the generation mechanism for state embeddings (i.e., query vector prior to compatibility calculation) during decoding. We identify that current mechanisms restrict the observation space during attention computation, introducing a key bottleneck to achieving high-quality solutions. Through detailed empirical analysis, we demonstrate the necessity of preserving a global observation space. To overcome the constraint-agnostic drawback inherent to global observation spaces, we propose a simple yet powerful Constraint-Aware Residual Modulation (CARM) module. By adaptively modulating the context embedding with constraint-relevant variables, CARM effectively enhances constraint awareness, enabling the neural solver to fully leverage the global observation space and generate an efficient state embedding. Extensive experimental results across two single-task and five multi-task neural routing solvers confirm that the CARM module consistently boosts baseline performance. Notably, solvers equipped with our CARM achieve substantial improvements in scaling to large-scale instances and in generalizing to unseen VRP variants. These findings provide valuable insights for the architectural design of neural routing solvers.

preprint · 2026 · arXiv

Reversing Heat Flow by Coherence in a Multipartite Quantum System

The second law of thermodynamics dictates that heat flows spontaneously from a high-temperature entity to a lower-temperature one. Yet, recent advances have demonstrated that quantum correlations between a system and its thermal environment can induce a reversal of heat flow, challenging classical thermodynamic expectations. Here, we experimentally demonstrate that internal quantum coherence in a multipartite spin system can also reverse heat flow, without relying on initial correlations with the environment. Under the collision model with cascade interaction, we verify that both the strength and the phase of the coherence term determine the direction and magnitude of energy transfer. These results enable precise control of heat flow using only local quantum properties.

preprint · 2026 · arXiv

STEP3-VL-10B Technical Report

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10× to 20× larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.

preprint · 2025 · arXiv

On the Imaginary Part of the Effective Action in de Sitter Spacetime with Different Regularization Schemes

The imaginary part of the effective action encodes vacuum instability and particle production in the background field. Two standard approaches are commonly used to derive it: the Bogoliubov method and the Green's function method, which are usually expected to agree. However, in de Sitter spacetime they yield different results. We revisit this problem by introducing explicit time and momentum cutoffs in the Green's function representation of the effective action. The apparent discrepancy is found to arise from different limiting procedures in the regularization, which reproduce the Bogoliubov result and the Green's function result, respectively. Therefore, the two approaches are understood to be different regularization limits of the same expression, which clarifies the origin of their disagreement.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint