Researcher profile

Yanan Wang

Yanan Wang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
18 works
0 followers
11 topics
4 close collaborators

Published work

18 published items

Preprint · 2026 · arXiv

DAPE: Dynamic Non-uniform Alignment and Progressive Detail Enhancement Techniques for Improving the Performance of Efficient Visual Language Models

In recent years, pre-trained visual-language models have demonstrated tremendous potential and have become a crucial foundational framework for numerous downstream tasks. However, information density is not uniformly distributed across text and images. Existing methods often overlook the inherent and dynamic differences in information density and semantic scope between text tokens and image patches. The common uniform alignment strategies result in coarse-grained cross-modal interactions and the loss of fine semantic details. Moreover, pursuing finer alignment typically requires substantial computational overhead, limiting practical model deployment. To address this challenge, this paper proposes a novel framework for dynamic cross-modal alignment with continuous detail introduction. First, we design a dynamically adaptive cross-modal matching mechanism that uses a learnable matching function to assign varying numbers and sizes of image tokens to text tokens that have the same size but differ in information density, enabling more precise attention interaction. Second, we develop a continuous detail introduction module that progressively incorporates high-resolution visual features into the alignment process. Extensive experiments across multiple benchmarks demonstrate significant accuracy improvements on various downstream tasks while reducing computational overhead.

Preprint · 2026 · arXiv

Learning Audio-Visual Embeddings with Inferred Latent Interaction Graphs

Learning robust audio-visual embeddings requires bringing genuinely related audio and visual signals together while filtering out incidental co-occurrences such as background noise, unrelated elements, or unannotated events. Most contrastive and triplet-loss methods use sparse annotated labels per clip and treat any co-occurrence as semantic similarity. For example, a video labeled "train" might also contain motorcycle sounds and visuals; because "motorcycle" is not the chosen annotation, standard methods treat these co-occurrences as negatives to true motorcycle anchors elsewhere, creating false negatives and missing true cross-modal dependencies. We propose a framework that leverages soft-label predictions and inferred latent interactions to address these issues: (1) an Audio-Visual Semantic Alignment Loss (AV-SAL) trains a teacher network to produce aligned soft-label distributions across modalities, assigning nonzero probability to co-occurring but unannotated events and enriching the supervision signal; (2) an Inferred Latent Interaction Graph (ILI) applies the GRaSP algorithm to the teacher's soft labels to infer a sparse, directed dependency graph among classes, highlighting directional dependencies (e.g., "Train (visual)" -> "Motorcycle (audio)") that expose likely semantic or conditional relationships between classes and are interpreted as estimated dependency patterns; and (3) a Latent Interaction Regularizer (LIR) trains a student network with both a metric loss and a regularizer guided by the ILI graph, pulling together embeddings of dependency-linked but unlabeled pairs in proportion to their soft-label probabilities. Experiments on the AVE and VEGAS benchmarks show consistent improvements in mean average precision (mAP), demonstrating that integrating inferred latent interactions into embedding learning enhances robustness and semantic coherence.

Preprint · 2026 · arXiv

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities. The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typography. It also enhances photorealistic generation with richer details, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms previous Qwen-Image models in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models.

Preprint · 2025 · arXiv

Detection of disk-jet co-precession in a tidal disruption event

Theories and simulations predict that intense spacetime curvature near black holes bends the trajectories of light and matter, driving disk and jet precession under relativistic torques. However, direct observational evidence of disk-jet co-precession remains elusive. Here, we report the most compelling case to date: a tidal disruption event (TDE) exhibiting unprecedented 19.6-day quasi-periodic variations in both X-rays and radio, with X-ray amplitudes exceeding an order of magnitude. The nearly synchronized X-ray and radio variations suggest a shared mechanism regulating the emission regions. We demonstrate that a disk-jet Lense-Thirring precession model successfully reproduces these variations while requiring a low-spin black hole. This study uncovers previously uncharted short-term radio variability in TDEs, highlights the transformative potential of high-cadence radio monitoring, and offers profound insights into disk-jet physics.

Preprint · 2024 · arXiv

RMS-flux slope in MAXI J1820+070: a measure of the disk-corona coupling

A linear RMS-flux relation has been well established in different spectral states of all accreting systems. In this work, we study the evolution of the frequency-dependent RMS-flux relation of MAXI J1820+070 during the initial decaying phase of its 2018 outburst with Insight-HXMT over the broad 1-150 keV energy range. As the flux decreases, we first observe a linear RMS-flux relation at frequencies from 2 mHz to 10 Hz, but this relation breaks at different times for different energies, leading to a substantial reduction in the slope. Moreover, we find that the low-frequency variability is the most sensitive to the break, which occurs before the hard-to-hard state transition time determined through time-averaged spectroscopy, and that this time offset increases with energy. The overall evolution of the RMS-flux slope and intercept suggests the presence of a two-component Comptonization system. One component is radially extended, explaining the strong disk-corona coupling before the break, while the other extends vertically, contributing to the reduced disk-corona coupling after the break. A further vertical expansion of the latter component is required to accommodate the dynamic evolution observed in the RMS-flux slope. In conclusion, we suggest that the RMS-flux slope in the 1-150 keV band can be used as an indicator of disk-corona coupling, and that the hard-to-hard state transition in MAXI J1820+070 could be partially driven by changes in the corona geometry.
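
In its simplest form, an RMS-flux relation is a straight-line fit of variability amplitude against mean flux across segments of a light curve. The sketch below is a simplified, time-domain illustration under stated assumptions, not the analysis used in the paper (which resolves the relation by frequency and energy with Insight-HXMT data): it splits an evenly sampled light curve into equal segments, takes each segment's mean and absolute RMS, and fits rms = slope × flux + intercept by ordinary least squares. The function names are hypothetical.

```python
"""Hypothetical sketch: time-domain RMS-flux relation and its linear fit.

Assumes an evenly sampled light curve (e.g. counts per bin). A real analysis
would integrate the power spectrum over a chosen frequency band per segment.
"""


def segment_stats(light_curve, seg_len):
    """Return (mean flux, absolute RMS) for each full segment of seg_len bins."""
    stats = []
    for start in range(0, len(light_curve) - seg_len + 1, seg_len):
        seg = light_curve[start:start + seg_len]
        mean = sum(seg) / seg_len
        rms = (sum((x - mean) ** 2 for x in seg) / seg_len) ** 0.5
        stats.append((mean, rms))
    return stats


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic light curve whose RMS grows linearly with flux (slope 0.3).
    curve = []
    for flux in (10.0, 20.0, 30.0, 40.0):
        spread = 0.3 * flux + 0.1
        curve += [flux - spread, flux + spread] * 2
    slope, intercept = fit_line(segment_stats(curve, seg_len=4))
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

With the synthetic input, the fit recovers the slope of 0.3 and intercept of 0.1 built into the data; in a frequency-resolved analysis the per-segment RMS would instead come from integrating the power spectrum over the band of interest.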

Preprint · 2024 · arXiv

TimeGraphs: Graph-based Temporal Reasoning

Many real-world systems exhibit temporal, dynamic behaviors, which are captured as time series of complex agent interactions. To perform temporal reasoning, current methods primarily encode temporal dynamics through simple sequence-based models. However, these models generally fail to capture the full spectrum of rich dynamics in the input efficiently, since the dynamics are not uniformly distributed. In particular, relevant information might be harder to extract, and computing power is wasted on processing every individual timestep, even those that contain no significant changes or new information. Here we propose TimeGraphs, a novel approach that characterizes dynamic interactions as a hierarchical temporal graph, diverging from traditional sequential representations. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. Adopting a self-supervised method, TimeGraphs constructs a multi-level event hierarchy from a temporal input, which is then used to reason efficiently about the unevenly distributed dynamics. This construction process is scalable and incremental to accommodate streaming data. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset. The results demonstrate both the robustness and the efficiency of TimeGraphs on a range of temporal reasoning tasks. Our approach obtains state-of-the-art performance, with an increase of up to 12.2% on event prediction and recognition tasks over current approaches. Our experiments further demonstrate a wide array of capabilities, including zero-shot generalization, robustness to data sparsity, and adaptability to streaming data flow.

Preprint · 2023 · arXiv

The radio detection and accretion properties of the peculiar nuclear transient AT 2019avd

AT 2019avd is a nuclear transient detected from the infrared to soft X-rays, though its nature is still unclear. The source has shown two consecutive flaring episodes in the optical and infrared bands, and its second flare was covered by X-ray monitoring programs. During this flare, UVOT/Swift photometry revealed two plateaus: one after the peak and another ~240 days later. Meanwhile, our NICER and XRT/Swift campaigns show two declines in the X-ray emission, one during the first optical plateau and one 70-90 days after the optical/UV decline. This evidence suggests that the optical/UV emission could not have originated primarily from X-ray reprocessing. Furthermore, we detected a time lag of ~16-34 days between the optical and UV emission, which indicates that the optical emission likely comes from UV reprocessing by gas at a distance of 0.01-0.03 pc. We also report the first VLA and VLBA detections of this source at different frequencies and at different stages of the second flare. The information obtained in the radio band (namely a steep and a late-time inverted radio spectrum, a high brightness temperature and a radio-loud state at late times), together with the multiwavelength properties of AT 2019avd, suggests the launching and evolution of outflows such as disc winds or jets. In conclusion, we propose that after the ignition of black hole activity in the first flare, a super-Eddington flaring accretion disc formed and settled into a sub-Eddington state by the end of the second flare, associated with a compact radio outflow.
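
The 0.01-0.03 pc distance follows from treating the ~16-34 day optical/UV lag as a light-travel delay, d = cτ. The short worked example below assumes exactly that (no geometric or inclination corrections) and uses standard values for the speed of light, the day and the parsec; the function name is hypothetical.

```python
"""Worked example: light-travel distance implied by an optical/UV time lag.

Assumes the lag is a pure light-travel delay (d = c * tau); geometry and
inclination effects are ignored.
"""

SPEED_OF_LIGHT_M_S = 299_792_458.0
SECONDS_PER_DAY = 86_400.0
METERS_PER_PARSEC = 3.085_677_581_491_367e16


def lag_to_parsec(lag_days):
    """Convert a time lag in days to a light-travel distance in parsecs."""
    return SPEED_OF_LIGHT_M_S * lag_days * SECONDS_PER_DAY / METERS_PER_PARSEC


if __name__ == "__main__":
    for lag in (16, 34):
        print(f"{lag} d -> {lag_to_parsec(lag):.4f} pc")
```

A 16-day lag corresponds to about 0.013 pc and a 34-day lag to about 0.029 pc, matching the range quoted in the abstract.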

Preprint · 2022 · arXiv

Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification

Recently, large-scale synthetic datasets have been shown to be very useful for generalizable person re-identification. However, synthesized persons in existing datasets are mostly cartoon-like and wear random clothing combinations, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. Specifically, based on UV texture mapping, two cloning methods are designed, namely registered clothes mapping and homogeneous cloth expansion. Given clothes keypoints detected on person images and labeled on regular UV maps with clear clothes structures, registered mapping applies a perspective homography to warp real-world clothes to their counterparts on the UV map. For invisible clothes parts and irregular UV maps, homogeneous expansion segments a homogeneous area on the clothes as a realistic cloth pattern or cell, and expands the cell to fill the UV map. Furthermore, a similarity-diversity expansion strategy is proposed, which clusters person images, samples images per cluster, and clones outfits for 3D character generation. This way, virtual persons can be scaled up densely in visual similarity to challenge model learning, and diversely in population to enrich the sample distribution. Finally, by rendering the cloned characters in Unity3D scenes, a more realistic virtual dataset called ClonedPerson is created, with 5,621 identities and 887,766 images. Experimental results show that the model trained on ClonedPerson generalizes better than models trained on other popular real-world and synthetic person re-identification datasets. The ClonedPerson project is available at https://github.com/Yanan-Wang-cs/ClonedPerson.
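
Registered clothes mapping rests on a perspective homography between keypoints on a person image and the matching points labelled on a UV map. As a hedged illustration (not the ClonedPerson code; all function names are hypothetical), the sketch below estimates a 3×3 homography from exactly four correspondences with the direct linear transform, fixing the bottom-right entry to 1, and uses it to warp a point. A production pipeline would typically use more keypoints, a least-squares or robust fit, and an image-warping library.

```python
"""Hypothetical sketch: four-point homography (direct linear transform).

Maps keypoints detected on a person image (src) to the matching points on
a UV map (dst). Not the ClonedPerson implementation.
"""


def solve_linear(a, b):
    """Solve a square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Return a 3x3 homography H (with H[2][2] = 1) mapping src to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), same pattern for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(h, x, y):
    """Apply homography h to point (x, y) using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    image_pts = [(0, 0), (1, 0), (1, 1), (0, 1)]  # e.g. detected keypoints
    uv_pts = [(0, 0), (2, 0), (2.5, 1.5), (-0.5, 2)]  # labeled UV-map points
    H = homography_from_4_points(image_pts, uv_pts)
    print(warp_point(H, 0.5, 0.5))
```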

Preprint · 2022 · arXiv

Determination of QPO properties in the presence of strong broad-band noise: a case study on the data of MAXI J1820+070

Accurate calculation of the phase lags of quasi-periodic oscillations (QPOs) will provide insight into their origin. In this paper we investigate the phase-lag correction method that has been applied to calculate the intrinsic phase lags of the QPOs in MAXI J1820+070. We find that the traditional additive model between the broad-band noise (BBN) and the QPOs in the time domain is rejected, but the convolution model is accepted. By introducing a convolution mechanism in the time domain, the Fourier cross-spectrum analysis shows that the phase lags between the QPO components in different energy bands have a simple linear relationship with the phase lags between the total signals, so that the intrinsic phase lags of the QPOs can be obtained by a linear correction. The power density spectrum (PDS) thus requires a multiplicative model to interpret the data. We briefly discuss a physical scenario for interpreting the convolution. In this scenario, the corona acts as a low-pass filter: the Green's function containing the noise is convolved with the QPOs to form the low-frequency part of the PDS, while the high-frequency part requires an additive component. We use a multiplicative PDS model to fit the data observed by Insight-HXMT. The overall fitting results are similar to those of the traditional additive PDS model. The QPO widths and centroid frequencies obtained from the two PDS models do not differ significantly; only the QPO r.m.s. does. Our work thus provides a new perspective on the coupling of noise and QPOs.
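
The multiplicative PDS follows from the convolution theorem: if the observed signal is a QPO signal convolved with a response function, its Fourier transform is the product of the two transforms, so its power spectrum is the product of their power spectra, whereas a plain sum of the two signals would also produce cross terms. The sketch below checks this identity numerically with a naive DFT. It is a generic illustration: the exponential response g standing in for the corona's low-pass filter is an assumption, and circular convolution is used so the discrete identity holds exactly.

```python
"""Hypothetical illustration: convolution in time is multiplication in power.

If the observed signal is a QPO-like signal q convolved with a response g
(circular convolution here, for an exact discrete identity), its power
spectrum is |G|^2 * |Q|^2, i.e. a multiplicative PDS model.
"""

import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(g, q):
    """Circular convolution of two equal-length sequences."""
    n = len(g)
    return [sum(g[m] * q[(t - m) % n] for m in range(n)) for t in range(n)]


def power(x):
    """Unnormalized power spectrum |X_k|^2."""
    return [abs(c) ** 2 for c in dft(x)]


if __name__ == "__main__":
    n = 32
    g = [math.exp(-t / 3.0) for t in range(n)]  # assumed low-pass response
    q = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # QPO-like tone
    conv_power = power(circular_convolve(g, q))
    product = [pg * pq for pg, pq in zip(power(g), power(q))]
    print(max(abs(a - b) for a, b in zip(conv_power, product)))
```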

Preprint · 2022 · arXiv

Insight-HXMT Study of the Inner Accretion Disk in the Black Hole Candidate EXO 1846-031

We study the spectral evolution of the black hole candidate EXO 1846-031 during its 2019 outburst in the 1-150 keV band with the Hard X-ray Modulation Telescope (Insight-HXMT). The continuum spectrum is well modelled with an absorbed disk-blackbody plus cutoff power-law in the hard, intermediate and soft states. In addition, we detect an ≈6.6 keV Fe emission line in the hard intermediate state. Throughout the soft intermediate and soft states, the fitted inner disk radius remains almost constant; we suggest that it has settled at the innermost stable circular orbit (ISCO). However, in the hard and hard intermediate states, the apparent inner radius was unphysically small (smaller than the ISCO), even after accounting for the Compton scattering of some of the disk photons by the corona in the fit. We argue that this is the result of a high hardening factor, f_col ≈ 2.0-2.7, in the early phases of the outburst evolution, well above the canonical value of 1.7 suitable for a steady disk. We suggest that the inner disk radius was already close to the ISCO in the low/hard state. Furthermore, we propose that this high hardening factor in the relatively hard state is probably caused by additional coronal irradiation of the disk. Additionally, we estimate the spin parameter with the continuum-fitting method over a range of plausible black hole masses and distances. We compare our results with the spin measured with the reflection-fitting method and find that the inconsistency between the two results is partly caused by the different choices of f_col.
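
The link between the hardening factor and the inferred inner radius can be made explicit with the relation conventionally used for the multicolour disk-blackbody (diskbb) model. The correction factor ξ ≈ 0.412 is the commonly adopted value (Kubota et al. 1998) and is assumed here rather than taken from this paper; r_in is the apparent inner radius in km, D_10 the distance in units of 10 kpc and i the inclination:

```latex
N_{\rm diskbb} = \left(\frac{r_{\rm in}}{D_{10}}\right)^{2}\cos i,
\qquad
R_{\rm in} = \xi\, f_{\rm col}^{2}\, r_{\rm in},
\qquad
\frac{R_{\rm in}(f_{\rm col})}{R_{\rm in}(1.7)} = \left(\frac{f_{\rm col}}{1.7}\right)^{2}
\approx 1.38 \text{ to } 2.52 \quad (f_{\rm col} = 2.0 \text{ to } 2.7).
```

Because the physical radius scales as f_col², fitting with the canonical 1.7 when the true value is 2.0-2.7 underestimates the inner radius by a factor of roughly 1.4-2.5, which is how an apparent radius below the ISCO can arise even if the disk actually extends to the ISCO.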

Preprint · 2022 · arXiv

Sequential Search with Off-Policy Reinforcement Learning

Recent years have seen a significant amount of interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success of Sequential Recommendation, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior in historical query sessions. For most e-commerce companies, the SS learning task is even more important than its SR counterpart because of its much larger online serving demands and traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences into a single RNN pass within a training batch by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement that each component brings over its state-of-the-art baseline on a variety of offline and online metrics.
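
Fitting several short user sequences into one RNN pass is a packing problem: each training row has a fixed number of timesteps, and sequences are assigned to rows so that little capacity is wasted. The abstract describes its solution only as a greedy knapsack step; the sketch below substitutes the standard first-fit-decreasing heuristic for bin packing, and the function name and capacity parameter are hypothetical.

```python
"""Hypothetical sketch: pack short user sequences into fixed-length RNN rows.

Greedy first-fit-decreasing heuristic for the bin-packing form of the
problem; the paper's exact greedy knapsack procedure may differ.
"""


def pack_sequences(lengths, capacity):
    """Group sequence indices into rows whose total length is <= capacity."""
    # Longest sequences first; sorted() is stable, so ties keep input order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows, free = [], []
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"sequence {i} is longer than the row capacity")
        for r, space in enumerate(free):
            if lengths[i] <= space:  # first existing row with enough room
                rows[r].append(i)
                free[r] -= lengths[i]
                break
        else:  # no existing row fits: open a new one
            rows.append([i])
            free.append(capacity - lengths[i])
    return rows


if __name__ == "__main__":
    lengths = [3, 5, 2, 4, 1, 5]
    print(pack_sequences(lengths, capacity=10))
```

For the example lengths this prints [[1, 5], [3, 0, 2, 4]]: six sequences fit into two completely filled rows of 10 timesteps. An actual RNN pass over a packed row would also need the hidden state to be reset or masked at each sequence boundary, which is not shown.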

Preprint · 2020 · arXiv

A Systematic Analysis of the Phase Lags Associated with the Type-C Quasi-periodic Oscillation in GRS 1915+105

We present a systematic analysis of the phase lags associated with the type-C QPOs in GRS 1915+105 using RXTE data. Our sample comprises 620 RXTE observations with type-C QPOs ranging from ~0.4 Hz to ~6.3 Hz. Based on our analysis, we confirm that the QPO phase lags decrease with QPO frequency and change sign from positive to negative at a QPO frequency of ~2 Hz. In addition, we find that the slope of this relation is significantly different between QPOs below and above 2 Hz. The relation between the QPO lags and QPO rms can be well fitted with a broken line: as the QPO lags go from negative to positive, the QPO rms first increases, reaching its maximum at around zero lag, and then decreases. The phase-lag behaviour of the QPO subharmonic is similar to that of the QPO fundamental: the subharmonic lags decrease with subharmonic frequency and change sign from positive to negative at a subharmonic frequency of ~1 Hz. In contrast, the second harmonic of the QPO shows quite different phase-lag behaviour: all the second harmonics show hard lags that remain more or less constant. For both the QPO and its (sub)harmonics, the slope of the lag-energy spectra shows an evolution with frequency similar to that of the average phase lags. This suggests that the lag-energy spectra drive the average phase lags. We discuss possible causes of the change in lag sign and the physical origin of the QPO lags.
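
A QPO phase lag is the argument of the cross spectrum between two energy bands at the QPO frequency, and dividing it by 2πf converts it into a time lag. The sketch below shows this for a single Fourier bin under an explicitly assumed sign convention (forward DFT with exp(-2πikt/N) and cross spectrum C = S·conj(H), so that a positive lag means the hard band lags the soft band). The function names are hypothetical, and real RXTE analyses average the cross spectrum over many segments and over the QPO width before taking the phase.

```python
"""Hypothetical sketch: phase and time lag of a QPO from the cross spectrum.

Convention assumed here: forward DFT X_k = sum_t x_t exp(-2*pi*i*k*t/N) and
cross spectrum C = S * conj(H), so a positive lag means the hard band lags
the soft band.
"""

import cmath
import math


def dft_bin(x, k):
    """Single Fourier coefficient of x at integer frequency bin k."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def phase_and_time_lag(soft, hard, k, dt):
    """Return (phase lag in radians, time lag in units of dt's time unit)."""
    cross = dft_bin(soft, k) * dft_bin(hard, k).conjugate()
    phase = cmath.phase(cross)
    freq = k / (len(soft) * dt)
    return phase, phase / (2 * math.pi * freq)


if __name__ == "__main__":
    n, k, delay = 64, 4, 2  # hard band delayed by 2 samples
    soft = [math.sin(2 * math.pi * k * t / n) for t in range(n)]
    hard = [math.sin(2 * math.pi * k * (t - delay) / n) for t in range(n)]
    phase, lag = phase_and_time_lag(soft, hard, k, dt=1.0)
    print(f"phase lag = {phase:.4f} rad, time lag = {lag:.4f} samples")
```

Here the hard signal is an exact circular shift of the soft one, so the recovered phase lag is π/4 rad and the time lag is the 2-sample delay that was put in.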

Preprint · 2020 · arXiv

A variable ionized disk wind in the black-hole candidate EXO 1846-031

After 34 years, the black-hole candidate EXO 1846-031 went into outburst again in 2019. We investigate its spectral properties in the hard intermediate and soft states with NuSTAR and Insight-HXMT. A reflection component has been detected in both spectral states, but it possibly originates from different illuminating spectra: in the intermediate state, the illuminating source is attributed to a hard coronal component, as commonly observed in other X-ray binaries, whereas in the soft state the reflection is probably produced by disk self-irradiation. Both cases support EXO 1846-031 being a low-inclination system (~40 degrees). An absorption line is clearly detected at ~7.2 keV in the hard intermediate state, corresponding to a highly ionized disk wind (log ξ > 6.1) with a velocity of up to 0.06c. Meanwhile, quasi-simultaneous radio emission has been detected before and after the X-ray observations, implying the co-existence of disk winds and jets in this system. Additionally, the observed wind in this source is potentially driven by magnetic forces. The absorption line disappeared in the soft state, and a narrow emission line appeared at ~6.7 keV on top of the reflection component; this may be evidence for disk winds, but data with higher spectral resolution are required to examine this.

Preprint · 2020 · arXiv

Hexagonal Boron Nitride Phononic Crystal Waveguides

Hexagonal boron nitride (h-BN), one of the hallmark van der Waals (vdW) layered crystals with an ensemble of attractive physical properties, is playing increasingly important roles in exploring two-dimensional (2D) electronics, photonics, mechanics, and emerging quantum engineering. Here, we report the demonstration of h-BN phononic crystal waveguides with designed pass and stop bands in the radio frequency (RF) range and controllable wave propagation and transmission, achieved by harnessing arrays of coupled h-BN nanomechanical resonators with engineerable coupling strength. Experimental measurements validate that these phononic crystal waveguides confine and support 15 to 24 megahertz (MHz) wave propagation over 1.2 millimeters. Analogous to solid-state atomic crystal lattices, phononic bandgaps and dispersive behaviors have been observed and systematically investigated in the h-BN phononic waveguides. Guiding and manipulating acoustic waves on such an additively integrable h-BN platform may facilitate multiphysical coupling and information transduction, and open up new opportunities for coherent on-chip signal processing and communication via emerging h-BN photonic and phononic devices.

Preprint · 2020 · arXiv

SEMDOT: Smooth-Edged Material Distribution for Optimizing Topology Algorithm

Element-based topology optimization algorithms capable of generating smooth boundaries have drawn considerable attention, given the importance of accurate boundary information in engineering applications. This paper proposes the basic framework of a new element-based continuum algorithm. The algorithm is based on a smooth-edged material distribution strategy that uses solid/void grid points assigned to each element. Named Smooth-Edged Material Distribution for Optimizing Topology (SEMDOT), the algorithm uses elemental volume fractions, which depend on the densities of grid points in the Finite Element Analysis (FEA) model, rather than elemental densities. Several numerical examples are studied to demonstrate the application and effectiveness of SEMDOT. In these examples, SEMDOT proved capable of obtaining optimized topologies with smooth and clear boundaries, with performance better than or comparable to other topology optimization methods. Using these examples, the advantages of the Heaviside smooth function over the Heaviside step function are discussed first. Then, the benefits of introducing multiple filtering steps into the algorithm are shown. Finally, comparisons are conducted to show the differences between SEMDOT and some well-established element-based algorithms. The sensitivity analysis method adopted in SEMDOT is validated using a typical compliant mechanism design case. In addition, this paper provides the MATLAB code of SEMDOT for educational and academic purposes.
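
The core idea of computing an element's volume fraction from grid-point densities passed through a Heaviside function can be illustrated with a short sketch. Both the averaging rule and the tanh-based smooth Heaviside projection (with threshold eta and sharpness beta) are assumptions chosen for illustration; SEMDOT's own functions, filters and parameters may differ, and the MATLAB code released with the paper is the authoritative version.

```python
"""Hypothetical sketch: step vs smooth Heaviside on an element's grid points.

Assumptions (not taken from the SEMDOT paper): the element's volume fraction
is the mean of its projected grid-point values, and the smooth Heaviside is
the common tanh projection with threshold eta and sharpness beta.
"""

import math


def heaviside_step(x, eta=0.5):
    """Hard threshold: grid point is solid (1) above eta, void (0) otherwise."""
    return 1.0 if x > eta else 0.0


def heaviside_smooth(x, eta=0.5, beta=8.0):
    """tanh projection: maps 0 -> 0 and 1 -> 1, approaching a step as beta grows."""
    num = math.tanh(beta * eta) + math.tanh(beta * (x - eta))
    den = math.tanh(beta * eta) + math.tanh(beta * (1.0 - eta))
    return num / den


def element_volume_fraction(grid_densities, projection):
    """Average the projected grid-point values of one finite element."""
    return sum(projection(rho) for rho in grid_densities) / len(grid_densities)


if __name__ == "__main__":
    grid = [0.1, 0.4, 0.6, 0.9, 0.52]  # illustrative grid-point densities
    print(element_volume_fraction(grid, heaviside_step))
    print(element_volume_fraction(grid, heaviside_smooth))
```

The step version treats a grid point at 0.52 as fully solid, while the smooth version gives it only a slightly-above-half contribution, which is the kind of graded, differentiable boundary behaviour that motivates a smooth Heaviside in gradient-based optimization.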

Preprint · 2020 · arXiv

Surpassing Real-World Source Training Data: Random 3D Characters for Generalizable Person Re-Identification

Person re-identification has seen significant advancement in recent years. However, the ability of learned models to generalize to unknown target domains remains limited. One possible reason is the lack of large-scale and diverse source training data, since manually labeling such a dataset is very expensive and privacy-sensitive. To address this, we propose to automatically synthesize a large-scale person re-identification dataset following a set-up similar to real surveillance but with virtual environments, and then use the synthesized person images to train a generalizable person re-identification model. Specifically, we design a method to generate a large number of random UV texture maps and use them to create different 3D clothing models. Then, automatic code is developed to randomly generate diverse 3D characters with various clothes, races and attributes. Next, we simulate a number of different virtual environments using Unity3D, with customized camera networks similar to real surveillance systems, and import multiple 3D characters at the same time, with various movements and interactions along different paths through the camera networks. As a result, we obtain a virtual dataset, called RandPerson, with 1,801,816 person images of 8,000 identities. By training person re-identification models on these synthesized person images, we demonstrate, for the first time, that models trained on virtual data can generalize well to unseen target images, surpassing models trained on various real-world datasets, including CUHK03, Market-1501 and DukeMTMC-reID, and almost surpassing those trained on MSMT17. The RandPerson dataset is available at https://github.com/VideoObjectSearch/RandPerson.

Preprint · 2020 · arXiv

The evolution of the broadband temporal features observed in the black-hole transient MAXI J1820+070 with Insight-HXMT

We study the evolution of the temporal properties of MAXI J1820+070 during its 2018 outburst in the hard state, from MJD 58190 to 58289, with Insight-HXMT in the broad 1-150 keV energy band. We find different behaviors of the hardness ratio, the fractional rms and the time lag before and after MJD 58257, suggesting that a transition occurred around this date. The observed time lags between the soft photons in the 1-5 keV band and the hard photons in higher energy bands, up to 150 keV, are frequency-dependent: in the low-frequency range, 2-10 mHz, the time lags include both soft and hard lags with timescales of dozens of seconds but show no clear trend along the outburst; in the high-frequency range, 1-10 Hz, they are only hard lags, with timescales of tens of milliseconds, which first increase until around MJD 58257 and then decrease. The high-frequency time lags are significantly correlated with the photon index derived from the fit to the quasi-simultaneous NICER spectrum in the 1-10 keV band. This result is qualitatively consistent with a model in which the high-frequency time lags are produced by Comptonization in a jet.

Preprint · 2019 · arXiv

Graphene Induced Large Shift of Surface Plasmon Resonances of Gold Films: Effective Medium Theory for Atomically Thin Materials

Despite successful modeling of graphene, synthesized by exfoliation or chemical vapor deposition (CVD), as a 0.34-nm thick optical film, the graphene-induced shift of the surface plasmon resonance (SPR) of gold films has remained controversial. Here we report the resolution of this controversy by developing a clean CVD graphene transfer method and extending the Maxwell-Garnett effective medium theory (EMT) to 2D materials. An SPR shift of 0.24 is obtained, which agrees well with 2D EMT in which wrinkled graphene is treated as a 3-nm graphene/air layered composite, consistent with the average roughness measured by atomic force microscopy. Because the anisotropic built-in boundary condition of 2D EMT is compatible with graphene's optical anisotropy, graphene can be modelled as a film thicker than 0.34 nm without changing its optical properties; however, its actual roughness, i.e., its effective thickness, significantly alters its response to strong out-of-plane fields, leading to a larger SPR shift.
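
Why thickening the film can leave the in-plane optics unchanged while altering the out-of-plane response can be seen with the textbook mixing rules for a layered (uniaxial) composite in the quasi-static limit. This is a simpler stand-in for the 2D Maxwell-Garnett EMT developed in the paper, not its formulation; the function name and the graphene permittivity value are placeholders, not measured data.

```python
"""Hypothetical sketch: layered-composite effective permittivity.

Models a graphene/air stack of total thickness d_total as a uniaxial film
(quasi-static, layered-medium mixing rules). This is a simpler stand-in for
the paper's 2D Maxwell-Garnett EMT, and eps_graphene below is a placeholder.
"""


def layered_effective_permittivity(eps_graphene, d_graphene, d_total, eps_host=1.0):
    """Return (in-plane, out-of-plane) effective permittivity of the stack."""
    f = d_graphene / d_total  # graphene fill fraction
    eps_parallel = f * eps_graphene + (1 - f) * eps_host
    eps_perp = 1 / (f / eps_graphene + (1 - f) / eps_host)
    return eps_parallel, eps_perp


if __name__ == "__main__":
    eps_g = 5.5 + 7.5j  # placeholder value, not a measured permittivity
    for d_total in (0.34, 3.0):  # nm: bare graphene vs 3-nm wrinkled layer
        par, perp = layered_effective_permittivity(eps_g, 0.34, d_total)
        # In-plane sheet response (eps_par - 1) * thickness is unchanged.
        print(d_total, (par - 1) * d_total, perp)
```

In this simplified model the in-plane sheet response, (ε∥ - 1) multiplied by the thickness, equals d_graphene × (ε_graphene - 1) regardless of the chosen thickness, whereas the out-of-plane permittivity moves from the graphene value toward that of air as the effective layer thickens.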