Source author record

Zhiwen Lin

Zhiwen Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision quant-ph

Catalog footprint

What is connected

3works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Exponential Analysis for Entanglement Distillation

Historically, the focus in entanglement distillation has predominantly been on the distillable entanglement, and the framework assumes complete knowledge of the initial state. In this paper, we study the reliability function of entanglement distillation, which specifies the optimal exponent of the decay of the distillation error when the distillation rate is below the distillable entanglement. Furthermore, to capture greater operational significance, we extend the framework from the standard setting of known states to a black-box setting, where distillation is performed from a set of possible states. We establish an exact finite blocklength result connecting to composite correlated hypothesis testing without any redundant correction terms. Based on this, the reliability function of entanglement distillation is characterized by the regularized quantum Hoeffding divergence. In the special case of a pure initial state, our result reduces to the error exponent for entanglement concentration derived by Hayashi et al. in 2003. Given full prior knowledge of the state, we construct a concrete optimal distillation protocol. Additionally, we analyze the strong converse exponent of entanglement distillation. While all the above results assume the free operations to be non-entangling, we also investigate other free operation classes, including PPT-preserving, dually non-entangling, and dually PPT-preserving operations.

preprint2026arXiv

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

Interpreting ultra-high-resolution (UHR) remote sensing images requires models to search for sparse and tiny visual evidence across large-scale scenes. Existing remote sensing vision-language models can inspect local regions with zooming and cropping tools, but most exploration strategies follow either a one-shot focus or a single sequential trajectory. Such single-path exploration can lose global context, leave scattered regions unvisited, and revisit or count the same evidence multiple times. To this end, we propose GeoVista, a planning-driven active perception framework for UHR remote sensing interpretation. Instead of committing to one zooming path, GeoVista first builds a global exploration plan, then verifies multiple candidate regions through branch-wise local inspection, while maintaining an explicit evidence state for cross-region aggregation and de-duplication. To enable this behavior, we introduce APEX-GRO, a cold-start supervised trajectory corpus that reformulates diverse UHR tasks as Global-Region-Object interactive reasoning processes with a unified, scale-invariant spatial representation. We further design an Observe-Plan-Track mechanism for global observation, adaptive region inspection, and evidence tracking, and align the model with a GRPO-based strategy using step-wise rewards for planning, localization, and final answer correctness. Experiments on RSHR-Bench, XLRS-Bench, and LRS-VQA show that GeoVista achieves state-of-the-art performance. Code and dataset are available at https://github.com/ryan6073/GeoVista

preprint2026arXiv

SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

Remote sensing vision-language models commonly rely on pretrained visual encoders to convert images into semantic features before language-model reasoning. While effective for scene-level understanding, this pipeline may prematurely compress local visual evidence, making fine-grained spatial reasoning vulnerable to language priors, especially in ultra-high-resolution remote sensing imagery. We present SkyNative, a native multimodal framework for remote sensing that adopts an encoder-free architecture, removing the pretrained visual backbone to directly represent images as raw patch tokens in the language-model token space. To reconcile low-level visual patches with textual tokens, SkyNative introduces a modality-aware decoupling mechanism that uses modality-specific parameters within a unified autoregressive backbone. We further introduce a visual reliance benchmark that diagnoses whether models ground their answers in image evidence through progressive visual degradation and misleading textual prompts. Across standard remote sensing understanding tasks and large-format spatial reasoning evaluations, SkyNative shows stronger image-grounded perception and improved robustness against prompt-induced language priors. These results suggest that native patch-level multimodal modeling is a promising direction for reliable remote sensing vision-language reasoning.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint