Source author record

Xiao Yang

Xiao Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, astro-ph.SR, Artificial Intelligence, Machine Learning, astro-ph.IM, Computation and Language, Cryptography and Security, cond-mat.mtrl-sci, cond-mat.str-el, cond-mat.supr-con, Information Theory, math.IT, math.NA, Methodology, physics.app-ph, q-fin.GN, Robotics

Catalog footprint

What is connected

36 works

17 topics

4 close collaborators

preprint | 2026 | arXiv

AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding

Long video understanding is heavily bottlenecked by a rigid one-shot paradigm: existing methods either densely encode videos at prohibitive memory and latency costs, or aggressively compress them into sparse frame sets that irreversibly discard fine-grained evidence needed for downstream reasoning. Consequently, current models struggle to simultaneously balance temporal coverage, visual details, and computational efficiency. We propose AdaFocus, an efficient framework that rethinks long-video understanding as progressive evidence acquisition rather than one-pass encoding. AdaFocus relies on two tightly coupled components. First, a Query-Aware Adaptive Relevance-Diversity sampler (AdaRD) produces a compact yet informative video preview, adaptively switching to global clustering when the query lacks reliable local grounding. Second, instead of caching exhaustive frame sequences in memory, AdaFocus introduces an uncertainty-triggered refinement mechanism. It performs targeted look-back only when the model is not confident, retrieving high-resolution evidence directly from disk via a zero-cache I/O design. This turns discarded visual details from an irreversible loss into on-demand recoverable evidence without paying the cost of exhaustive preloading. Experiments on seven standard long-video benchmarks show that AdaFocus delivers a substantially better efficiency-accuracy trade-off than strong baselines. Compared with conventional dense encoding, AdaFocus achieves improved task performance (e.g., +2.59 accuracy on VideoMME, +8.39 mIoU on Charades-STA over single-pass inference) while reducing visual token consumption by ~33x and eliminating the need for in-memory frame pre-caching through its zero-cache disk retrieval design. These findings suggest that progressive preview combined with zero-cache evidence refinement is a highly effective paradigm for scalable multimedia reasoning.
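
The abstract does not spell out how AdaRD scores frames, so the following is only a minimal sketch of the general relevance-diversity idea, written as a greedy maximal-marginal-relevance style selection. The function names, the `trade_off` weight, and the use of cosine similarity are all assumptions made for illustration, not the authors' sampler.

```python
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def select_frames(
    frame_feats: List[Sequence[float]],
    query_feat: Sequence[float],
    k: int,
    trade_off: float = 0.7,  # hypothetical weight: 1.0 means pure relevance
) -> List[int]:
    """Greedily pick up to k frame indices that match the query yet differ from each other."""
    relevance = [cosine(f, query_feat) for f in frame_feats]
    chosen: List[int] = []
    candidates = set(range(len(frame_feats)))
    while candidates and len(chosen) < k:

        def score(i: int) -> float:
            # Redundancy is the highest similarity to any frame already chosen.
            redundancy = max(
                (cosine(frame_feats[i], frame_feats[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)
```

With a low `trade_off`, a near-duplicate of an already selected frame is penalized and a less relevant but distinct frame is chosen instead; with `trade_off=1.0` the selection reduces to top-k relevance.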

preprint | 2026 | arXiv

AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents

As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, thus degrading overall reliability. Unlike hallucination detection in single-turn responses, diagnosing hallucinations in multi-step workflows requires identifying which step causes the initial divergence. To fill this gap, we propose a new research task, automated hallucination attribution of LLM-based agents, aiming to identify the step responsible for the hallucination and explain why. To support this task, we introduce AgentHallu, a comprehensive benchmark with: (1) 693 high-quality trajectories spanning 7 agent frameworks and 5 domains, (2) a hallucination taxonomy organized into 5 categories (Planning, Retrieval, Reasoning, Human-Interaction, and Tool-Use) and 14 sub-categories, and (3) multi-level annotations curated by humans, covering binary labels, hallucination-responsible steps, and causal explanations. We evaluate 13 leading models, and results show the task is challenging even for top-tier models (such as GPT-5 and Gemini-2.5-Pro). The best-performing model achieves only 41.1% step localization accuracy, where tool-use hallucinations are the most challenging at just 11.6%. We believe AgentHallu will catalyze future research into developing robust, transparent, and reliable agentic systems.
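
To make the reported metric concrete, here is a minimal sketch of how a step localization accuracy could be computed, assuming it is the share of hallucinated trajectories for which the predicted responsible step equals the human-annotated one. The benchmark's exact definition may differ, and the function name and argument layout are hypothetical.

```python
from typing import Optional, Sequence


def step_localization_accuracy(
    predicted_steps: Sequence[Optional[int]],
    annotated_steps: Sequence[int],
) -> float:
    """Share of trajectories whose predicted step equals the annotated step.

    A prediction of None (the model named no step) counts as a miss.
    """
    if len(predicted_steps) != len(annotated_steps):
        raise ValueError("one prediction per trajectory is required")
    if not annotated_steps:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted_steps, annotated_steps))
    return correct / len(annotated_steps)
```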

preprint | 2026 | arXiv

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

Retinal diagnosis is inherently bilateral: clinicians compare homologous structures across eyes (e.g., optic disc asymmetry), yet most deep models operate on monocular representations. We investigate whether explicit structural correspondence improves diagnosis, and propose Anatomy-Slot to operationalize this hypothesis. Anatomy-Slot introduces an unsupervised anatomical bottleneck by decomposing patch tokens into slots and aligning slots across eyes via bidirectional cross-attention. On ODIR-5K with $n=10$ seeds, the method improves AUC by 4.2% over a matched ViT-L baseline (95% CIs; Wilcoxon signed-rank test, $W=0$, $p=0.002$). Pairing disruption and stress testing under Gaussian noise provide controlled tests of correspondence dependence and robustness under corruption. We further report quantitative optic disc grounding on REFUGE and cross-attention localization analysis.
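
The reported $W=0$ and $p=0.002$ are consistent with an exact two-sided Wilcoxon signed-rank test in which all 10 paired differences share the same sign: only 2 of the $2^{10}$ sign patterns give $W=0$, so $p = 2/1024 \approx 0.002$. The sketch below checks this by brute-force enumeration. It assumes no ties and no zero differences, and that the test was run on the 10 per-seed pairs, which the abstract does not state explicitly.

```python
from itertools import product


def wilcoxon_exact_p(n: int, w_observed: int) -> float:
    """Exact two-sided p-value for W = min(W+, W-) with n untied, nonzero differences."""
    total_rank = n * (n + 1) // 2
    hits = 0
    # Under the null hypothesis every assignment of signs to ranks 1..n is equally likely.
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w_plus, total_rank - w_plus) <= w_observed:
            hits += 1
    return hits / 2 ** n


print(round(wilcoxon_exact_p(10, 0), 3))  # 0.002
```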

preprint | 2026 | arXiv

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distilling bidirectional base models into few-step AR students, but they remain limited by coarse response granularity and non-negligible sampling latency. In this paper, we study a more aggressive setting: frame-wise autoregression with only 1-2 sampling steps. In this regime, we identify the initialization of a few-step AR student as the key bottleneck: existing strategies are either target-misaligned, incapable of few-step generation, or too costly to scale. We propose Causal Forcing++, a principled and scalable pipeline that uses causal consistency distillation (causal CD) for few-step AR initialization. The core idea is that causal CD learns the same AR-conditional flow map as causal ODE distillation, but obtains supervision from a single online teacher ODE step between adjacent timesteps, avoiding the need to precompute and store full PF-ODE trajectories. This makes the initialization both more efficient and easier to optimize. The resulting pipeline, Causal Forcing++, surpasses the SOTA 4-step chunk-wise Causal Forcing under the frame-wise 2-step setting by 0.1 in VBench Total, 0.3 in VBench Quality, and 0.335 in VisionReward, while reducing first-frame latency by 50% and Stage 2 training cost by ~4x. We further extend the pipeline to action-conditioned world model generation in the spirit of Genie3. Project Page: https://github.com/thu-ml/Causal-Forcing and https://github.com/shengshu-ai/minWM

preprint | 2026 | arXiv

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

The task of Image-to-Video (I2V) generation aims to synthesize a video from a reference image and a text prompt. This requires diffusion models to reconcile high-frequency visual constraints and low-frequency textual guidance during the denoising process. However, while existing I2V models prioritize visual consistency, how to effectively couple this dual guidance to ensure strong adherence to the text prompt remains underexplored. In this work, we observe that in Diffusion Transformer (DiT)-based I2V models, certain intermediate layers exhibit weak semantic responses (termed Semantic-Weak Layers), as indicated by a measurable drop in text-visual similarity. We attribute this to a phenomenon called Condition Isolation, where attention to visual features becomes partially detached from text guidance and overly relies on learned visual priors. To address this, we propose Focal Guidance (FG), which enhances the controllability from Semantic-Weak Layers. FG comprises two mechanisms: (1) Fine-grained Semantic Guidance (FSG) leverages CLIP to identify key regions in the reference frame and uses them as anchors to guide Semantic-Weak Layers. (2) Attention Cache transfers attention maps from semantically responsive layers to Semantic-Weak Layers, injecting explicit semantic signals and alleviating their over-reliance on the model's learned visual priors, thereby enhancing adherence to textual instructions. To further validate our approach and address the lack of evaluation in this direction, we introduce a benchmark for assessing instruction following in I2V models. On this benchmark, Focal Guidance proves its effectiveness and generalizability, raising the total score on Wan2.1-I2V to 0.7250 (+3.97%) and boosting the MMDiT-based HunyuanVideo-I2V to 0.5571 (+7.44%).

preprint | 2026 | arXiv

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

Interpreting ultra-high-resolution (UHR) remote sensing images requires models to search for sparse and tiny visual evidence across large-scale scenes. Existing remote sensing vision-language models can inspect local regions with zooming and cropping tools, but most exploration strategies follow either a one-shot focus or a single sequential trajectory. Such single-path exploration can lose global context, leave scattered regions unvisited, and revisit or count the same evidence multiple times. To this end, we propose GeoVista, a planning-driven active perception framework for UHR remote sensing interpretation. Instead of committing to one zooming path, GeoVista first builds a global exploration plan, then verifies multiple candidate regions through branch-wise local inspection, while maintaining an explicit evidence state for cross-region aggregation and de-duplication. To enable this behavior, we introduce APEX-GRO, a cold-start supervised trajectory corpus that reformulates diverse UHR tasks as Global-Region-Object interactive reasoning processes with a unified, scale-invariant spatial representation. We further design an Observe-Plan-Track mechanism for global observation, adaptive region inspection, and evidence tracking, and align the model with a GRPO-based strategy using step-wise rewards for planning, localization, and final answer correctness. Experiments on RSHR-Bench, XLRS-Bench, and LRS-VQA show that GeoVista achieves state-of-the-art performance. Code and dataset are available at https://github.com/ryan6073/GeoVista

preprint | 2026 | arXiv

MILD: Mediator Agent System with Bidirectional Perception and Multi-Layered Alignment for Human-Vehicle Collaboration

Prior studies report that partial driving automation can increase the cognitive demands on human drivers. This effect largely arises from human drivers' lack of transparent insight into the vehicle's intentions and decision logic, as well as from automated systems' limited awareness of the driver's dynamic state and preferences. This bidirectional misalignment undermines shared situational awareness and exacerbates coordination failures in human-vehicle interaction. To address these limitations, we argue for a paradigm shift that elevates the human role from passive supervisor to active manager. We introduce the Mediator-in-the-Loop-Driving (MILD) system, based on an agentic system architecture to facilitate synergistic human-vehicle collaboration. MILD integrates a perception agent for joint in-cabin and out-of-cabin understanding with a lightweight strategy agent that generates compliant and explainable action suggestions. To ensure these strategies are strictly aligned with safety regulations and human values, we develop Evidence- and Constraint-weighted Policy Optimization (ECPO). ECPO leverages automatic validators to steer the agent toward behaviors that are not only accurate but also structurally complete, substantiated by evidence, and free from constraint violations. Furthermore, a retrieval-augmented generation module dynamically incorporates constraints from traffic regulations, speed recommendations, and driver preferences into the decision loop. Field experiments across three open datasets demonstrate that MILD consistently outperforms baselines in both perception accuracy and strategy quality under auditable offline metrics, and yields higher human-rated policy adequacy, comfort, and explanation than baselines. This work offers a practical pathway for building auditable and aligned agents for human-vehicle collaborative driving.

preprint | 2026 | arXiv

ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving

Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge, particularly for high-lateral-acceleration maneuvers such as sharp turns, which represent critical safety situations. Existing trajectory planners exhibit systematic failures in these scenarios due to data imbalance. This results in insufficient modelling of vehicle dynamics, road geometry, and environmental constraints in high-risk situations, leading to suboptimal or unsafe trajectory prediction when vehicles operate near their physical limits. In this paper, we introduce ReflexDiffusion, a novel inference-stage framework that enhances diffusion-based trajectory planners through reflective adjustment. Our method introduces a gradient-based adjustment mechanism during the iterative denoising process: after each standard trajectory update, we compute the gradient between the conditional and unconditional noise predictions to explicitly amplify critical conditioning signals, including road curvature and lateral vehicle dynamics. This amplification enforces strict adherence to physical constraints, particularly improving stability during high-lateral-acceleration maneuvers where precise vehicle-road interaction is paramount. Evaluated on the nuPlan Test14-hard benchmark, ReflexDiffusion achieves a 14.1% improvement in driving score for high-lateral-acceleration scenarios over the state-of-the-art (SOTA) methods. This demonstrates that inference-time trajectory optimization can effectively compensate for training data sparsity by dynamically reinforcing safety-critical constraints near handling limits. The framework's architecture-agnostic design enables direct deployment to existing diffusion-based planners, offering a practical solution for improving autonomous vehicle safety in challenging driving conditions.
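
The abstract describes amplifying conditioning signals through the difference between conditional and unconditional noise predictions. One familiar way to express such an amplification is a classifier-free-guidance-style extrapolation, sketched below. This is an illustration under that assumption rather than the authors' update rule, and the function name and `scale` parameter are hypothetical.

```python
from typing import List


def amplify_conditioning(
    eps_cond: List[float],
    eps_uncond: List[float],
    scale: float,  # hypothetical amplification factor; 1.0 returns eps_cond unchanged
) -> List[float]:
    """Move the noise estimate further along the (conditional - unconditional) direction."""
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

With `scale` above 1.0, components where the conditional prediction disagrees with the unconditional one are exaggerated, while components where the two agree are left untouched.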

preprint | 2026 | arXiv

SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

Remote sensing vision-language models commonly rely on pretrained visual encoders to convert images into semantic features before language-model reasoning. While effective for scene-level understanding, this pipeline may prematurely compress local visual evidence, making fine-grained spatial reasoning vulnerable to language priors, especially in ultra-high-resolution remote sensing imagery. We present SkyNative, a native multimodal framework for remote sensing that adopts an encoder-free architecture, removing the pretrained visual backbone to directly represent images as raw patch tokens in the language-model token space. To reconcile low-level visual patches with textual tokens, SkyNative introduces a modality-aware decoupling mechanism that uses modality-specific parameters within a unified autoregressive backbone. We further introduce a visual reliance benchmark that diagnoses whether models ground their answers in image evidence through progressive visual degradation and misleading textual prompts. Across standard remote sensing understanding tasks and large-format spatial reasoning evaluations, SkyNative shows stronger image-grounded perception and improved robustness against prompt-induced language priors. These results suggest that native patch-level multimodal modeling is a promising direction for reliable remote sensing vision-language reasoning.

preprint | 2026 | arXiv

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders, remain text-pixel mappers -- they employ LLMs merely as text encoders, without leveraging their inherent reasoning capabilities to infer what should be visually depicted given the textual prompt. To move beyond such literal generation, we propose the think-then-generate (T2G) paradigm, where the LLM-based text encoder is encouraged to reason about and rewrite raw user prompts; the states of the rewritten prompts then serve as diffusion conditioning. To achieve this, we first activate the think-then-rewrite pattern of the LLM encoder with a lightweight supervised fine-tuning process. Subsequently, the LLM encoder and diffusion backbone are co-optimized to ensure faithful reasoning about the context and accurate rendering of the semantics via Dual-GRPO. In particular, the text encoder is reinforced using image-grounded rewards to infer and recall world knowledge, while the diffusion backbone is pushed to produce semantically consistent and visually coherent images. Experiments show substantial improvements in factual consistency, semantic alignment, and visual realism across reasoning-based image generation and editing benchmarks, achieving 0.79 on WISE score, nearly on par with GPT-4. Our results constitute a promising step toward next-generation unified models with reasoning, expression, and demonstration capacities.

preprint | 2026 | arXiv

U-HNO: A U-shaped Hybrid Neural Operator with Sparse-Point Adaptive Routing for Non-stationary PDE Dynamics

Solutions to many partial differential equations (PDEs) display coexisting smooth global transport and localized sharp features within a single trajectory: shock fronts, thin interfaces, and concentrated high-frequency content sit on top of slowly varying backgrounds. This poses a challenge for neural operators: Fourier-based architectures mix nonlocal interactions efficiently but tend to under-resolve localized non-smooth features, whereas spatially local architectures recover fine detail at the cost of long-range propagation and rollout stability. Existing hybrid operators paper over this tension with a fixed, spatially uniform fusion that forces the same trade-off everywhere. We propose U-HNO, a U-shaped hybrid neural operator whose central design is Sparse-Point Adaptive Routing (SPAR): at every spatial location, a per-pixel hard mask selects whether the global Fourier branch or the local multi-scale Gaussian branch should dominate, and the sparsity ratio is a function of the local contrast of the routing signal, so smooth and shock-aligned regions receive different mixtures of global and local computation. SPAR is embedded in a hierarchical encoder-bottleneck-decoder backbone with skip connections so that the dual branches and the gate operate at every resolution. Training combines pointwise supervision with a finite-difference H^1 gradient term and a band-wise spectral consistency regularizer. Across benchmarks spanning 1D Burgers, Kuramoto-Sivashinsky, KdV, 2D advection, Allen-Cahn, Navier-Stokes, Darcy flow, and 3D transonic compressible Navier-Stokes from PDEBench, U-HNO achieves state-of-the-art rollout accuracy on the majority of tasks in both relative L^2 and H^1 metrics, with the largest gains on problems dominated by sharp localized features. Ablations show that removing any single component substantially degrades rollout error.
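
As a rough illustration of the training objective mentioned above, the sketch below combines a pointwise squared error with a finite-difference H^1 gradient term on a uniform 1D grid. It is a simplified assumption-laden example: the band-wise spectral regularizer is omitted, and the function name, the forward-difference stencil, and `grad_weight` are hypothetical choices rather than details taken from the paper.

```python
from typing import Sequence


def h1_loss(
    pred: Sequence[float],
    target: Sequence[float],
    dx: float,
    grad_weight: float = 1.0,  # hypothetical weight on the gradient term
) -> float:
    """Mean squared error plus mean squared forward-difference error on a 1D grid."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    err = [p - t for p, t in zip(pred, target)]
    pointwise = sum(e * e for e in err) / len(err)
    # Forward differences of the error approximate the error in the spatial derivative.
    d_err = [(err[i + 1] - err[i]) / dx for i in range(len(err) - 1)]
    gradient = sum(g * g for g in d_err) / len(d_err)
    return pointwise + grad_weight * gradient
```

A constant offset between prediction and target is penalized only by the pointwise term, whereas an error that varies sharply in space, such as a misplaced shock front, is penalized additionally by the gradient term.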

preprint | 2026 | arXiv

ZAYA1-8B Technical Report

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.

preprint | 2022 | arXiv

Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks

Transfer-based adversarial attacks can evaluate model robustness in the black-box setting. Several methods have demonstrated impressive untargeted transferability; however, it is still challenging to efficiently produce targeted transferability. To this end, we develop a simple yet effective framework to craft targeted transfer-based adversarial examples, applying a hierarchical generative network. In particular, we contribute amortized designs that adapt well to multi-class targeted attacks. Extensive experiments on ImageNet show that our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods -- it reaches an average success rate of 29.1% against six diverse models based only on one substitute white-box model, which significantly outperforms the state-of-the-art gradient-based attack methods. Moreover, the proposed method is also more than an order of magnitude more efficient than gradient-based methods.

preprint | 2022 | arXiv

Comparison of Two Methods for Calculating Magnetic Helicity in the Solar Corona

Due to the large magnetic Reynolds number, the magnetic helicity originating from the solar interior can be carried away through the photosphere into the corona. However, the relationship between the accumulated magnetic helicity flux through the photosphere and the magnetic helicity in the corona is still unclear. By selecting 36 newly emerging active regions in the 23rd solar cycle, we apply optical flow methods to derive the accumulated magnetic helicity through the photosphere ($H_m^p$) by using the sequential longitudinal magnetograms, use nonlinear force-free field extrapolation to obtain the 3D coronal magnetic field, and adopt finite volume methods to calculate the instantaneous relative magnetic helicity in the corona ($H_m^c$) by using vector magnetograms. It is found that the local correlation tracking (LCT)-based $H_m^p$ is larger than $H_m^c$ at a resolution of $1''$, and that the Differential Affine Velocity Estimator-based $H_m^p$ is more consistent with $H_m^c$ than the LCT-based $H_m^p$. $H_m^p$ is more consistent with $H_m^c$ when evaluated at $2''$ than at $1''$. Moreover, $H_m^c - H_m^p$ systematically shows consistency with the Hemispheric Helicity Rule (over 55%), no matter which resolution and method are used. These estimations suggest that the consistency of $H_m^c$ and $H_m^p$ is partly dependent on the resolution of the magnetograms and the calculation methods.

preprint | 2022 | arXiv

Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Recent studies have revealed the vulnerability of face recognition models against physical adversarial patches, which raises security concerns about the deployed face recognition systems. However, it is still challenging to ensure the reproducibility for most attack algorithms under complex physical conditions, which leads to the lack of a systematic evaluation of the existing methods. It is therefore imperative to develop a framework that can enable a comprehensive evaluation of the vulnerability of face recognition in the physical world. To this end, we propose to simulate the complex transformations of faces in the physical world via 3D-face modeling, which serves as a digital counterpart of physical faces. The generic framework allows us to control different face variations and physical conditions to conduct reproducible evaluations comprehensively. With this digital simulator, we further propose a Face3DAdv method considering the 3D face transformations and realistic physical variations. Extensive experiments validate that Face3DAdv can significantly improve the effectiveness of diverse physically realizable adversarial patches in both simulated and physical environments, against various white-box and black-box face recognition models.

preprint | 2022 | arXiv

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps use explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer-by-layer in a cascade manner. As a result, it leads to the best performance on MS-COCO benchmark among the DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods. Code is available at https://github.com/SlongLiu/DAB-DETR.

preprint | 2022 | arXiv

DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation

In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, while such cases are not fully explored in previous work. In this paper, we propose a novel method, DDG-DA, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data. We conduct experiments on three real-world tasks (forecasting stock price trend, electricity load, and solar irradiance) and obtain significant improvement on multiple widely-used models.

preprint | 2022 | arXiv

Exploring Memorization in Adversarial Training

Deep learning models have a propensity for fitting the entire training set even with random labels, which requires memorization of every training sample. In this paper, we explore the memorization effect in adversarial training (AT) for promoting a deeper understanding of model capacity, convergence, generalization, and especially robust overfitting of the adversarially trained models. We first demonstrate that deep networks have sufficient capacity to memorize adversarial examples of training data with completely random labels, but not all AT algorithms can converge under the extreme circumstance. Our study of AT with random labels motivates further analyses on the convergence and generalization of AT. We find that some AT approaches suffer from a gradient instability issue and most recently suggested complexity measures cannot explain robust generalization by considering models trained on random labels. Furthermore, we identify a significant drawback of memorization in AT that it could result in robust overfitting. We then propose a new mitigation algorithm motivated by detailed memorization analyses. Extensive experiments on various datasets validate the effectiveness of the proposed method.

preprint | 2022 | arXiv

Observations of pores and surrounding regions with CO 4.66 μm lines by BBSO/CYRA

Solar observations of carbon monoxide (CO) indicate the existence of lower-temperature gas in the lower solar chromosphere. We present an observation of pores, quiet-Sun regions, and network magnetic field regions with CO 4.66 μm lines by the Cryogenic Infrared Spectrograph (CYRA) at Big Bear Solar Observatory. We used the strong CO lines at around 4.66 μm to understand the properties of the thermal structures of the lower solar atmosphere in different solar features with various magnetic field strengths. AIA 1700 Å images, HMI continuum images and magnetograms are also included in the observation. The data from 3D radiation magnetohydrodynamic (MHD) simulation with the Bifrost code are also employed for the first time to be compared with the observation. We used the RH code to synthesize the CO line profiles in the network regions. The CO 3-2 R14 line center intensity is either enhanced or diminished with increasing magnetic field strength, which is likely caused by different heating effects in magnetic flux tubes with different sizes. We find several "cold bubbles" in the CO 3-2 R14 line center intensity images, which can be classified into two types. One type is located in the quiet-Sun regions without magnetic fields. The other type, which has rarely been reported in the past, is near or surrounded by magnetic fields. Notably, some are located at the edge of the magnetic network. The two kinds of cold bubbles and the relationship between cold bubble intensities and network magnetic field strength are both reproduced by the 3D MHD simulation with the Bifrost and RH codes. The simulation also shows that there is a cold plasma blob near the network magnetic fields, causing the observed cold bubbles seen in the CO 3-2 R14 line center image. Our observation and simulation illustrate that the magnetic field plays a vital role in the generation of some CO cold bubbles.

preprint | 2022 | arXiv

RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking

RGB-D object tracking has attracted considerable attention recently, achieving promising performance thanks to the symbiosis between visual and depth channels. However, given a limited amount of annotated RGB-D tracking data, most state-of-the-art RGB-D trackers are simple extensions of high-performance RGB-only trackers, without fully exploiting the underlying potential of the depth channel in the offline training stage. To address the dataset deficiency issue, a new RGB-D dataset named RGBD1K is released in this paper. The RGBD1K contains 1,050 sequences with about 2.5M frames in total. To demonstrate the benefits of training on a larger RGB-D dataset in general, and RGBD1K in particular, we develop a transformer-based RGB-D tracker, named SPT, as a baseline for future visual object tracking studies using the new dataset. The results of extensive experiments using the SPT tracker demonstrate the potential of the RGBD1K dataset to improve the performance of RGB-D tracking, inspiring future developments of effective tracker designs. The dataset and codes will be available on the project homepage: https://github.com/xuefeng-zhu5/RGBD1K.

preprint2022arXiv

Robustness and Accuracy Could Be Reconcilable by (Proper) Definition

The trade-off between robustness and accuracy has been widely studied in the adversarial literature. Although still controversial, the prevailing view is that this trade-off is inherent, either empirically or theoretically. Thus, we dig for the origin of this trade-off in adversarial training and find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance -- an overcorrection towards smoothness. Given this, we advocate employing local equivariance to describe the ideal behavior of a robust model, leading to a self-consistent robust error named SCORE. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty via robust optimization. By simply substituting KL divergence with variants of distance metrics, SCORE can be efficiently minimized. Empirically, our models achieve top-rank performance on RobustBench under AutoAttack. Besides, SCORE provides instructive insights for explaining the overfitting phenomenon and semantic input gradients observed on robust models. Code is available at https://github.com/P2333/SCORE.

preprint2022arXiv

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Recent studies have shown that StyleGANs provide promising prior models for downstream tasks on image synthesis and editing. However, since the latent codes of StyleGANs are designed to control global styles, it is hard to achieve a fine-grained control over synthesized images. We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. The structure and texture of different local parts are controlled by corresponding latent codes. Experimental results demonstrate that our model provides a strong disentanglement between different spatial areas. When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images. The model can also be extended to other domains via transfer learning. Thus, as a generic prior model with built-in disentanglement, it could facilitate the development of GAN-based applications and enable more potential downstream tasks.

preprint2020arXiv

A New Comprehensive Data Set of Solar Filaments of 100 yr Interval. I

Filaments are very common physical phenomena on the Sun and are often taken as important proxies of solar magnetic activities. The study of filaments has become a hot topic in space weather research. For a more comprehensive understanding of filaments, especially for an understanding of solar activities of multiple solar cycles, it is necessary to perform a combined multifeature analysis by constructing a data set of multiple solar cycle data. To achieve this goal, we constructed a centennial data set that covers the H$α$ data from five observatories around the world. During the data set construction, we encountered a variety of problems, such as data fusion, accurate determination of the solar edge, classifying data by quality, dynamic thresholding, and so on, which arose mainly from the multiple sources and the large time span of the data. Fortunately, these problems were well solved. The data set includes seven types of data products and eight types of feature parameters with which we can implement the functions of data searching and statistical analyses. It offers better continuity, is highly complementary to space observation data, especially in the wavelengths not covered by space observations, and covers many solar cycles (including more than 60 yr of high-cadence data). We expect that this new comprehensive data set as well as the tools will help researchers to significantly speed up their search for features or events of interest, for either statistical or case study purposes, and possibly help them get a better and more comprehensive understanding of solar filament mechanisms.

preprint2020arXiv

A Non-Linear Magnetic Field Calibration Method for Filter-Based Magnetographs by Multilayer Perceptron

For filter-based magnetographs, the linear calibration method under the weak-field assumption is usually adopted; this leads to a magnetic saturation effect in regions with a strong magnetic field. This article explores a new method to overcome this disadvantage using a multilayer perceptron network, which we call MagMLP, based on a back-propagation algorithm with one input layer, five hidden layers, and one output layer. We use the data from the Spectropolarimeter (SP) on board Hinode to simulate single-wavelength observations for the model training, and take into account the influence of the Doppler velocity field and the filling factor. The training results show that the linear fitting coefficient (LFC) of the transverse field reaches above 0.91, and that of the longitudinal field is above 0.98. The generalization of the models is good because the corresponding LFCs are above 0.9 for the test subsets. Compared with the linear calibration method, the MagMLP is much more effective in dealing with the magnetic saturation effect. Analyzing an active region, the results of the linear calibration present an evident magnetic saturation effect in the umbra regions; the corresponding systematic error reaches values greater than 1000 G in most areas, or even exceeds 2000 G at some pixels. However, the results of MagMLP at these locations are very close to the inversion results, and the systematic errors are basically within 300 G. In addition, we find that there are many "bright spots" and "dark spots" on the inclination angle images from the inversion results of Hinode/SP with values of 180 and 0 degrees, respectively, where the inversion is not reliable and does not produce a good result; the MagMLP handles these points well.
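
For orientation, the linear calibration that the abstract contrasts with MagMLP can be sketched with the commonly used weak-field relations, in which the longitudinal field scales with V/I and the transverse field with the fourth root of (Q/I)^2 + (U/I)^2. This is a minimal sketch, not the paper's code: the function name and the coefficients `c_l` and `c_t` are hypothetical placeholders, and real coefficients depend on the instrument and spectral line.

```python
import math

def weak_field_calibration(i, q, u, v, c_l=10000.0, c_t=7000.0):
    """Linear (weak-field) calibration of single-wavelength Stokes signals.

    c_l and c_t are hypothetical calibration coefficients in gauss;
    they are placeholders, not values taken from the paper.
    """
    # Longitudinal field is taken to be proportional to circular polarization.
    b_long = c_l * (v / i)
    # Transverse field scales with the fourth root of the squared linear
    # polarization, because Q and U grow as the field strength squared.
    b_trans = c_t * ((q / i) ** 2 + (u / i) ** 2) ** 0.25
    # Azimuth of the transverse field, in radians.
    azimuth = 0.5 * math.atan2(u, q)
    return b_long, b_trans, azimuth

# Illustrative input only: 1% circular polarization, no linear polarization.
print(weak_field_calibration(1.0, 0.0, 0.0, 0.01))
```

Because the mapping is strictly linear in V/I, it cannot follow the flattening of the polarization signal in strong fields, which is the saturation effect a nonlinear model is meant to correct.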

preprint2020arXiv

A nonlinear solar magnetic field calibration method for the filter-based magnetograph by the residual network

Solar magnetic field calibration for the filter-based magnetograph normally uses a linear method under the weak-field approximation, which cannot reproduce strong magnetic field regions well because of the magnetic saturation effect. We try to provide a new method to carry out the nonlinear magnetic calibration with the help of neural networks to obtain more accurate magnetic fields. We employed the data from Hinode/SP to construct training, validation and test datasets. The narrow-band Stokes I, Q, U, and V maps at one wavelength point were selected from all the 112 wavelength points observed by SP so as to simulate the single-wavelength observations of the filter-based magnetograph. We used the residual network to model the nonlinear relationship between the Stokes maps and the vector magnetic fields. After an extensive performance analysis, it is found that the trained models could infer the longitudinal magnetic flux density, the transverse magnetic flux density, and the azimuth angle from the narrow-band Stokes maps with a precision comparable to the inversion results using 112 wavelength points. Moreover, the maps that were produced are much cleaner than the inversion results. The method can effectively overcome the magnetic saturation effect and infer the strong magnetic region much better than the linear calibration method. The residual errors of test samples to standard data are mostly about 50 G for both the longitudinal and transverse magnetic flux density. The values are about 100 G with our previous method of multilayer perceptron, indicating that the new method is more accurate in magnetic calibration.

preprint2020arXiv

Design and Interpretation of Universal Adversarial Patches in Face Detection

We consider universal adversarial patches for faces -- small visual elements whose addition to a face image reliably destroys the performance of face detectors. Unlike previous work that mostly focused on the algorithmic design of adversarial examples in terms of improving the success rate as an attacker, in this work we show an interpretation of such patches that can prevent the state-of-the-art face detectors from detecting the real faces. We investigate a phenomenon: patches designed to suppress real face detection appear face-like. This phenomenon holds generally across different initialization, locations, scales of patches, backbones, and state-of-the-art face detection frameworks. We propose new optimization-based approaches to automatic design of universal adversarial patches for varying goals of the attack, including scenarios in which true positives are suppressed without introducing false positives. Our proposed algorithms perform well on real-world datasets, deceiving state-of-the-art face detectors in terms of multiple precision/recall metrics and transferability.

preprint2020arXiv

Infrared diagnostics of the solar magnetic field with Mg I 12 $μ$m lines: forward-model results

The Mg I 12.32 and 12.22 $μ$m lines are a pair of emission lines that present a great advantage for accurate solar magnetic field measurement. They potentially contribute to the diagnosis of solar atmospheric parameters through their high magnetic sensitivity. The goal of this study is to understand the radiation transfer process of these lines in detail and explore the ability of magnetic field diagnosis in the infrared. We calculated the Stokes profiles and response functions of the two Mg I 12 $μ$m lines based on one-dimensional solar atmospheric models using the Rybicki-Hummer (RH) radiative transfer code. The integration of these profiles with respect to the wavelength was used to generate calibration curves related to the longitudinal and transverse fields. The traditional single-wavelength calibration curve based on the weak-field approximation was also tested to determine if it is suitable for the infrared. The 12.32 $μ$m line is more suitable for a magnetic field diagnosis because its relative emission intensity and polarization signal are stronger than that of the 12.22 $μ$m line. The result from the response functions illustrates that the derived magnetic field and velocity with 12.32 $μ$m line mainly originate from the height of 450 km, while that for the temperature is about 490 km. The calibration curves obtained by the wavelength-integrated method show a nonlinear distribution. For the Mg I 12.32 $μ$m line, the longitudinal (transverse) field can be effectively inferred from Stokes V/I (Q/I and U/I) in the linear range below $\sim 600$ G ($\sim 3000$ G) in quiet regions and below $\sim 400$ G ($\sim 1200$ G) in penumbrae. Within the given linear range, the method is a supplement to the magnetic field calibration when the Zeeman components are incompletely split.

preprint2020arXiv

Organized Self-Emulsification toward Structural Color

The formation of water-in-oil-in-water (W/O/W) double emulsions can be well-controlled through an organized self-emulsification mechanism in the presence of rigid bottlebrush amphiphilic block copolymers. Nanoscale water droplets with well-controlled diameters form ordered spatial arrangements within the micron-scale oil droplets. Upon solvent evaporation, solid microspheres with hexagonal close packed nanopore arrays are obtained resulting in bright structural colors. The reflected color is precisely tunable across the whole visible light range through tailoring contour length of the bottlebrush molecule. In-situ observation of the W/O interface using confocal laser scanning microscopy provides insights into the mechanism of the organized self-emulsification. This work provides a powerful strategy for the fabrication of structural colored materials in an easy and scalable manner.

preprint2020arXiv

Propagating Slow Sausage Waves in a Sunspot Observed by the New Vacuum Solar Telescope

A sunspot is an ideal waveguide for a variety of magnetohydrodynamic waves, which carry a significant amount of energy to the upper atmosphere and could be used as a tool to probe the magnetic and thermal structure of a sunspot. In this study, we used the New Vacuum Solar Telescope and took high-resolution image sequences simultaneously in both TiO (7058$\pm$10 Å) and H$_α$ (6562$\pm$2.5 Å) bandpasses. We extracted the area and total emission intensity variations of the sunspot umbra and analyzed the signals with the synchrosqueezing transform. We found that the area and emission intensity varied with both three- and five-minute periodicity. Moreover, the area and intensity oscillated in phase with each other, which holds in both the TiO and H$_α$ data. We interpret this oscillatory signal as a propagating slow sausage wave. The propagation speed is estimated at about 8 km$\cdot$s$^{-1}$. We infer that this sunspot's umbra could have a temperature as low as 2800--3500 K.

preprint2016arXiv

Automatic Recognition of Sunspots in HSOS Full-Disk Solar Images

A procedure is introduced to recognise sunspots automatically in solar full-disk photosphere images obtained from Huairou Solar Observing Station, National Astronomical Observatories of China. The images are first pre-processed with a Gaussian algorithm. Sunspots are then recognised by the morphological bottom-hat operation and the Otsu threshold. Wrongly selected sunspots are eliminated by a criterion based on sunspot properties. In addition, in order to calculate the sunspot areas and the solar centre, the solar limb is extracted by a procedure using morphological closing and erosion operations and setting an adaptive threshold. Results of sunspot recognition reveal that the number of sunspots detected by our procedure agrees well with the manual method. The sunspot recognition rate is 95% and the error rate is 1.2%. The sunspot areas calculated by our method have high correlation (95%) with the area data from USAF/NOAA.
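
The Otsu step named in the abstract can be illustrated with a minimal sketch of the standard algorithm, which picks the grey level that maximises the between-class variance of the histogram. The function name and the toy pixel values are assumptions for illustration. The Gaussian pre-processing and bottom-hat steps are omitted, and the paper's actual implementation is not given in the abstract.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of grey levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy bimodal "image": half the pixels at level 10, half at level 200.
toy = [10] * 50 + [200] * 50
t = otsu_threshold(toy)
mask = [p > t for p in toy]   # True marks the brighter class
print(t, sum(mask))
```

After a bottom-hat transform, dark features such as sunspots appear bright, so the pixels above the threshold would be the sunspot candidates in a pipeline of this kind.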

preprint2016arXiv

Fast Predictive Image Registration

We present a method to predict image deformations based on patch-wise image appearance. Specifically, we design a patch-based deep encoder-decoder network which learns the pixel/voxel-wise mapping between image appearance and registration parameters. Our approach can predict general deformation parameterizations; however, we focus on the large deformation diffeomorphic metric mapping (LDDMM) registration model. By predicting the LDDMM momentum-parameterization we retain the desirable theoretical properties of LDDMM, while reducing computation time by orders of magnitude: combined with patch pruning, we achieve a 1500x/66x speed up compared to GPU-based optimization for 2D/3D image registration. Our approach has better prediction accuracy than predicting deformation or velocity fields and results in diffeomorphic transformations. Additionally, we create a Bayesian probabilistic version of our network, which allows evaluation of deformation field uncertainty through Monte Carlo sampling using dropout at test time. We show that deformation uncertainty highlights areas of ambiguous deformations. We test our method on the OASIS brain image dataset in 2D and 3D.

preprint2016arXiv

Smart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text Reading

Physical library collections are valuable and long-standing resources for knowledge and learning. However, managing books on a large bookshelf and finding books on it often involve tedious manual work, especially for large book collections where books might be missing or misplaced. Recently, deep neural models, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), have achieved great success in scene text detection and recognition. Motivated by these recent successes, we aim to investigate their viability in facilitating book management, a task that introduces further challenges including large amounts of cluttered scene text, distortion, and varied lighting conditions. In this paper, we present a library inventory building and retrieval system based on scene text reading methods. We specifically design our scene text recognition model using rich supervision to accelerate training and achieve state-of-the-art performance on several benchmark datasets. Our proposed system has the potential to greatly reduce the amount of human labor required in managing book inventories as well as the space needed to store book information.

preprint2014arXiv

Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting

This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and is close to optimal for the class of all i.i.d distributions on strings of a given length. Moreover, the method can be used to code and predict strings with a condition on the tail of the ordered counts. It can also be applied to distributions in an envelope class.

preprint2013arXiv

Magnetic Nonpotentiality in Photospheric Active Regions as a Predictor of Solar Flares

Based on several magnetic nonpotentiality parameters derived from vector photospheric active region magnetograms obtained with the Solar Magnetic Field Telescope at the Huairou Solar Observing Station over two solar cycles, a machine learning model has been constructed to predict the occurrence of flares in the corresponding active region within a certain time window. The Support Vector Classifier, a widely used general classifier, is applied to build and test the prediction models. Several classical verification measures are adopted to assess the quality of the predictions. We investigate different flare levels within various time windows, and thus it is possible to estimate the rough classes and erupting times of flares for particular active regions. Several combinations of predictors have been tested in the experiments. The True Skill Statistics are higher than 0.36 in 97% of cases and the Heidke Skill Scores range from 0.23 to 0.48. The predictors derived from longitudinal magnetic fields perform well; however, they are less sensitive in predicting large flares. Employing the nonpotentiality predictors from vector fields improves the performance of predicting large flares of magnitude $\geq$M5.0 and $\geq$X1.0.
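
The two verification measures quoted above are computed from a 2x2 contingency table of forecasts against observations. The sketch below uses the standard textbook definitions of the True Skill Statistic and the Heidke Skill Score. The function name and the example counts are illustrative assumptions, and the abstract does not state which exact variant of each score the paper uses.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (TSS, HSS) from a 2x2 forecast contingency table.

    tp: flares predicted and observed   fn: flares missed
    fp: false alarms                    tn: correct null forecasts
    """
    # True Skill Statistic: hit rate minus false-alarm rate.
    tss = tp / (tp + fn) - fp / (fp + tn)
    # Heidke Skill Score: improvement over a random forecast.
    hss = 2.0 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return tss, hss

# Made-up counts for illustration only (not results from the paper).
tss, hss = skill_scores(tp=40, fn=10, fp=20, tn=130)
print(round(tss, 3), hss)  # TSS is about 0.667, HSS is 0.625
```

Unlike the Heidke Skill Score, the True Skill Statistic does not depend on the ratio of flaring to non-flaring samples, which is why the two are often reported together for rare events.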

preprint2012arXiv

A Statistical Study on Photospheric Magnetic Nonpotentiality of Active Regions and Its Relationship with Flares during Solar Cycles 22-23

A statistical study is carried out on the photospheric magnetic nonpotentiality in solar active regions and its relationship with associated flares. We select 2173 photospheric vector magnetograms from 1106 active regions observed by the Solar Magnetic Field Telescope at Huairou Solar Observing Station, National Astronomical Observatories of China, in the period of 1988-2008, which covers most of the 22nd and 23rd solar cycles. We have computed the mean planar magnetic shear angle ($\bar{Δϕ}$), mean shear angle of the vector magnetic field ($\bar{Δψ}$), mean absolute vertical current density ($\bar{|J_{z}|}$), mean absolute current helicity density ($\bar{|h_{c}|}$), absolute twist parameter ($|α_{av}|$), mean free magnetic energy density ($\bar{ρ_{free}}$), effective distance of the longitudinal magnetic field ($d_{E}$), and modified effective distance ($d_{Em}$) of each photospheric vector magnetogram. Parameters $\bar{|h_{c}|}$, $\bar{ρ_{free}}$, and $d_{Em}$ show higher correlation with the evolution of the solar cycle. The Pearson linear correlation coefficients between these three parameters and the yearly mean sunspot number are all larger than 0.59. Parameters $\bar{Δϕ}$, $\bar{Δψ}$, $\bar{|J_{z}|}$, $|α_{av}|$, and $d_{E}$ show only weak correlations with the solar cycle, though the nonpotentiality and the complexity of active regions are greater in the activity maximum periods than in the minimum periods. All of the eight parameters show positive correlations with the flare productivity of active regions, and the combination of different nonpotentiality parameters may be effective in predicting the flaring probability of active regions.
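
The Pearson linear correlation coefficient mentioned above is the covariance of two series divided by the product of their standard deviations. A minimal sketch follows. The function name is an assumption and the two short series are made-up placeholders, not the yearly parameter or sunspot values from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and standard deviations; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

# Placeholder series standing in for a yearly mean parameter and the
# yearly mean sunspot number (illustrative values only).
yearly_parameter = [1.0, 2.0, 3.0, 4.0]
yearly_sunspots = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(yearly_parameter, yearly_sunspots))
```

A value near 1 indicates that the parameter rises and falls with the sunspot number, while a value near 0 indicates no linear relationship.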

preprint2001arXiv

Electrical and Thermal Transport by Nodal Quasiparticles in the DDW State

We compute the electrical and thermal conductivities and Hall conductivities of the $d$-density wave (DDW) state in the low-temperature impurity-scattering-dominated regime for low dopings, at which they are dominated by nodal quasiparticles. We show that the longitudinal conductivity in this limit in the DDW state is not Drude-like. However, the thermal conductivity is Drude-like; this is a reflection of the discrepancy between electrical and thermal transport at finite frequency in the DDW state. An extreme example of this occurs in the $μ=0$, $τ\to\infty$ limit, where there is a strong violation of the Wiedemann-Franz law: ${κ_{xx}}/{σ_{xx}} \propto {T^2}$ at $ω=0$ and ${κ_{xx}}/{σ_{xx}}=0$ at finite frequency. The DDW electrical and thermal Hall conductivities are linear in the magnetic field, $B$, for weak fields. The formation of Landau levels at the nodes leads to the quantization of these Hall conductivities at high fields. In all of these ways, the quasiparticles of the DDW state differ from those of the $d_{{x^2}-{y^2}}$ superconducting (DSC) state.
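
For comparison, the conventional Wiedemann-Franz law for a metal fixes the ratio of thermal to electrical conductivity as linear in temperature, with the Sommerfeld value of the Lorenz number. The block below states that textbook relation next to the scaling quoted in the abstract. The symbol $L_0$ is introduced here for illustration and does not appear in the abstract.

```latex
% Conventional Wiedemann-Franz law (textbook form, Sommerfeld value of L_0):
\frac{\kappa_{xx}}{\sigma_{xx}} = L_0 \, T ,
\qquad
L_0 = \frac{\pi^2}{3} \left( \frac{k_B}{e} \right)^{2}

% Scaling quoted in the abstract for the DDW state at \mu = 0, \tau \to \infty:
\frac{\kappa_{xx}}{\sigma_{xx}} \propto T^{2} \quad (\omega = 0),
\qquad
\frac{\kappa_{xx}}{\sigma_{xx}} = 0 \quad (\text{finite } \omega)
```

The extra power of $T$ at $ω=0$, and the vanishing ratio at finite frequency, are what the abstract describes as a strong violation of the law.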

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv

confidence 95%

external id: arxiv:2605.12965:author:2:xiao-yang

Imported May 20, 2026

Synced May 21, 2026

arxiv

confidence 95%

external id: arxiv:2605.01507:author:4:xiao-yang