Researcher profile

Xudong Wang

Xudong Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust: 21 (Emerging)
Verification L1
Unclaimed author
22 works
0 followers
18 topics
4 close collaborators

Published work

22 published items

preprint · 2026 · arXiv

SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario in which agents must sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation with such multi-task instructions, as information overload impairs the agent's ability to attend to observationally relevant details. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. SeqWalker features: i) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instructions based on the agent's current visual observations, thus reducing cognitive load; ii) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the superiority of the proposed SeqWalker.

preprint · 2026 · arXiv

Visually Prompted Benchmarks Are Surprisingly Fragile

A key challenge in evaluating VLMs is testing models' ability to analyze visual content independently from their textual priors. Recent benchmarks such as BLINK probe visual perception through visual prompting, where questions about visual content are paired with coordinates to which the question refers, with the coordinates explicitly marked in the image itself. While these benchmarks are an important part of VLM evaluation, we find that existing models are surprisingly fragile to seemingly irrelevant details of visual prompting: simply changing a visual marker from red to blue can completely change rankings among models on a leaderboard. By evaluating nine commonly used open- and closed-source VLMs on two visually prompted tasks, we demonstrate how details in benchmark setup, including visual marker design and dataset size, have a significant influence on model performance and leaderboard rankings. These effects can even be exploited to lift weaker models above stronger ones; for instance, slightly increasing the size of the visual marker results in open-source InternVL3-8B ranking alongside or above much larger proprietary models like Gemini 2.5 Pro. We further show that low-level inference choices that are often ignored in benchmarking, such as JPEG compression levels in API calls, can also change the model lineup. These details have substantially larger impacts on visually prompted benchmarks than on conventional semantic VLM evaluations. To mitigate this instability, we curate existing datasets to create VPBench, a larger visually prompted benchmark with 16 visual marker variants. We open-source VPBench and our analysis framework at: https://lisadunlap.github.io/vpbench/.
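The central claim is that a cosmetic change (for example, marker colour) can reorder a leaderboard. One simple way to quantify such a reordering is Kendall's rank correlation between the two leaderboards. The sketch below is an illustration under that assumption, not the paper's analysis framework; the model names and scores in the usage lines are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two leaderboards that score the same models.

    Returns +1.0 for identical orderings and -1.0 for fully reversed ones;
    pairs tied on either leaderboard count as neither concordant nor discordant.
    """
    if set(scores_a) != set(scores_b) or len(scores_a) < 2:
        raise ValueError("need the same set of at least two models on both leaderboards")
    models = sorted(scores_a)
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        # a positive product means both leaderboards order this pair the same way
        product = (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical accuracies for the same three models under red vs. blue markers.
red = {"model_a": 0.71, "model_b": 0.65, "model_c": 0.60}
blue = {"model_a": 0.62, "model_b": 0.66, "model_c": 0.58}
print(kendall_tau(red, blue))  # 1/3: only the (model_a, model_b) pair swaps order
```

A value well below 1.0 between two marker variants would indicate the kind of ranking instability the abstract describes.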

preprint · 2022 · arXiv

Bayesian calibration of traffic flow fundamental diagrams using Gaussian processes

Modeling the relationship between vehicle speed and density on the road is a fundamental problem in traffic flow theory. Recent research found that using the least-squares (LS) method to calibrate single-regime speed-density models is biased because of the uneven distribution of samples. This paper explains the issue of the LS method from a statistical perspective: the biased calibration is caused by the correlations/dependencies in regression residuals. Based on this explanation, we propose a new calibration method for single-regime speed-density models by modeling the covariance of residuals via a zero-mean Gaussian Process (GP). Our approach can be viewed as a generalized least-squares (GLS) method with a specific covariance structure (i.e., kernel function) and is a generalization of the existing LS and the weighted least-squares (WLS) methods. Next, we use a sparse approximation to address the scalability issue of GPs and apply a Markov chain Monte Carlo (MCMC) sampling scheme to obtain the posterior distributions of the parameters for speed-density models and the hyperparameters (i.e., length scale and variance) of the GP kernel. Finally, we calibrate six well-known single-regime speed-density models with the proposed method. Results show that the proposed GP-based methods (1) significantly reduce the biases in the LS calibration, (2) achieve a similar effect as the WLS method, (3) can be used as a non-parametric speed-density model, and (4) provide a Bayesian solution to estimate posterior distributions of parameters and speed-density functions.
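The abstract frames the GP approach as generalized least squares (GLS) with a kernel-defined residual covariance, so that $\hat\beta = (X^\top\Sigma^{-1}X)^{-1}X^\top\Sigma^{-1}y$ and plain LS is recovered when $\Sigma$ is a multiple of the identity. The sketch below shows only that GLS step for a speed-density model that is linear in its parameters (Greenshields, $v = v_f(1 - k/k_j)$, written as $v = a + bk$). The squared-exponential kernel, the fixed hyperparameter values and the data are assumptions; the paper's sparse GP approximation and MCMC posterior sampling are not reproduced.

```python
import math


def se_kernel(a: float, b: float, variance: float, length_scale: float) -> float:
    """Squared-exponential covariance between residuals at densities a and b."""
    return variance * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix: list[list[float]], rhs: list[list[float]]) -> list[list[float]]:
    """Solve matrix @ X = rhs (rhs given as n rows of m values) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(matrix[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gls_linear_speed_density(density, speed, variance=25.0, length_scale=5.0, noise=1.0):
    """GLS estimate of (a, b) in v = a + b*k, i.e.
    beta = (X^T S^-1 X)^-1 X^T S^-1 y with S = SE kernel over density + noise * I."""
    n = len(density)
    X = [[1.0, k] for k in density]
    S = [[se_kernel(density[i], density[j], variance, length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # one solve gives S^-1 X (first two columns) and S^-1 y (last column)
    W = solve(S, [X[i] + [speed[i]] for i in range(n)])
    xtwx = [[sum(X[r][i] * W[r][j] for r in range(n)) for j in range(2)] for i in range(2)]
    xtwy = [[sum(X[r][i] * W[r][2] for r in range(n))] for i in range(2)]
    (a,), (b,) = solve(xtwx, xtwy)
    return a, b


# Greenshields form v = vf * (1 - k / kj) is linear in k: a = vf, b = -vf / kj.
density = [5.0, 10.0, 20.0, 30.0, 40.0, 60.0, 80.0, 100.0]  # veh/km, hypothetical
speed = [97.0, 93.5, 83.0, 76.5, 67.0, 52.5, 35.0, 21.0]     # km/h, hypothetical
print(gls_linear_speed_density(density, speed))
```

With `variance=0.0` the covariance collapses to `noise * I` and the estimate equals ordinary least squares, mirroring the abstract's statement that LS is a special case of the GP-based method.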

preprint · 2022 · arXiv

Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers

Hyperbolic space can naturally embed hierarchies, unlike Euclidean space. Hyperbolic Neural Networks (HNNs) exploit such representational power by lifting Euclidean features into hyperbolic space for classification, outperforming Euclidean neural networks (ENNs) on datasets with known semantic hierarchies. However, HNNs underperform ENNs on standard benchmarks without clear hierarchies, greatly restricting HNNs' applicability in practice. Our key insight is that HNNs' poorer general classification performance results from vanishing gradients during backpropagation, caused by their hybrid architecture connecting Euclidean features to a hyperbolic classifier. We propose an effective solution by simply clipping the Euclidean feature magnitude while training HNNs. Our experiments demonstrate that clipped HNNs become super-hyperbolic classifiers: they are not only consistently better than HNNs, which already outperform ENNs on hierarchical data, but also on par with ENNs on MNIST, CIFAR10, CIFAR100 and ImageNet benchmarks, with better adversarial robustness and out-of-distribution detection.
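The remedy described here is to cap the norm of the Euclidean feature before it is lifted into hyperbolic space. A minimal sketch, assuming the Poincaré ball model with curvature $-c$ and the standard exponential map at the origin, and reading the clip as $\min(1, r/\lVert x\rVert)\,x$ with a hypothetical default $r = 1$: without the clip, a feature of norm 50 lands within floating-point rounding of the ball's boundary, where $\tanh$ saturates and gradients vanish.

```python
import math


def l2(x: list[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


def clip_feature(x: list[float], r: float = 1.0) -> list[float]:
    """Feature clipping: return min(1, r / ||x||) * x, so the norm never exceeds r."""
    size = l2(x)
    if size <= r:
        return list(x)
    return [v * r / size for v in x]


def expmap0(x: list[float], c: float = 1.0) -> list[float]:
    """Exponential map at the origin of the Poincare ball with curvature -c:
    tanh(sqrt(c) * ||x||) * x / (sqrt(c) * ||x||)."""
    size = l2(x)
    if size == 0.0:
        return list(x)
    scale = math.tanh(math.sqrt(c) * size) / (math.sqrt(c) * size)
    return [v * scale for v in x]


feature = [30.0, 40.0]                    # Euclidean feature with norm 50
print(l2(expmap0(feature)))               # about 1.0: saturated at the ball's boundary
print(l2(expmap0(clip_feature(feature)))) # tanh(1), about 0.7616, safely inside the ball
```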

preprint · 2022 · arXiv

Debiased Learning from Naturally Imbalanced Pseudo-Labels

Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. By addressing this previously unknown imbalanced classification problem, which arises from pseudo-labels rather than ground-truth training labels, we can remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: the former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state of the art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
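One way to read the counterfactual step is as a logit adjustment: keep a running estimate $\hat p$ of how often the model predicts each class on unlabeled data, and subtract $\lambda \log \hat p$ from the logits before taking the arg max, so classes the model over-predicts must win by a larger margin. The sketch below assumes that reading; the class name, `momentum` and `lam` are hypothetical, the adaptive-margin loss is not shown, and the released code linked in the abstract is the authoritative version.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DebiasedPseudoLabeler:
    """Running class-prior estimate plus logit adjustment for pseudo-labels (hypothetical)."""

    def __init__(self, num_classes: int, momentum: float = 0.999, lam: float = 1.0):
        self.prior = [1.0 / num_classes] * num_classes  # start from a uniform prior
        self.momentum = momentum
        self.lam = lam

    def update(self, batch_logits: list[list[float]]) -> None:
        """Fold the batch-mean predicted distribution into the moving average."""
        probs = [softmax(z) for z in batch_logits]
        batch_mean = [sum(p[c] for p in probs) / len(probs) for c in range(len(self.prior))]
        self.prior = [self.momentum * q + (1.0 - self.momentum) * b
                      for q, b in zip(self.prior, batch_mean)]

    def pseudo_label(self, logits: list[float]) -> int:
        """Arg max of logits - lam * log(prior): over-predicted classes are penalised."""
        adjusted = [z - self.lam * math.log(q) for z, q in zip(logits, self.prior)]
        return max(range(len(adjusted)), key=adjusted.__getitem__)


labeler = DebiasedPseudoLabeler(num_classes=2, momentum=0.0)
labeler.update([[2.0, 0.0]] * 3)         # the model keeps favouring class 0
print(labeler.pseudo_label([0.5, 0.0]))  # 1: the weak class-0 preference is cancelled
```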

preprint · 2022 · arXiv

Hybrid integration of deterministic quantum dots-based single-photon sources with CMOS-compatible silicon carbide photonics

Thin-film 4H-silicon carbide (4H-SiC) is emerging as a contender for realizing large-scale optical quantum circuits due to its high CMOS technology compatibility and large optical nonlinearities. However, challenges remain in producing wafer-scale 4H-SiC thin film on insulator (4H-SiCOI) for dense integration of photonic circuits, and in efficiently coupling the deterministic quantum emitters that are essential for scalable quantum photonics. Here we demonstrate hybrid integration of self-assembled InGaAs quantum dot (QD)-based single-photon sources (SPSs) with wafer-scale 4H-SiC photonic chips prepared by the ion slicing technique. By designing a bilayer vertical coupler, we realize generation and highly efficient routing of single-photon emission in the hybrid quantum photonic chip. Furthermore, we realize a chip-integrated beamsplitter operation for triggered single photons by fabricating a 1x2 multi-mode interferometer (MMI) with a symmetric power splitting ratio of 50:50. This successful demonstration of heterogeneously integrating QD-based SPSs on a 4H-SiC photonic chip prepared by ion slicing constitutes an important step toward CMOS-compatible, fast reconfigurable quantum photonic circuits with deterministic SPSs.

preprint · 2022 · arXiv

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical practice, hypernasality estimation is crucial in cleft palate diagnosis, as its results determine subsequent surgery and additional speech therapy. Therefore, an automatic hypernasality assessment method would help speech-language pathologists make precise diagnoses. Existing methods for hypernasality estimation only perform acoustic analysis on low-resource cleft palate datasets, using statistical or neural network-based features. In this paper, we propose a novel approach that uses an automatic speech recognition (ASR) model to improve hypernasality estimation. Specifically, we first pre-train an encoder-decoder framework with an ASR objective on a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation. Benefiting from this design, our hypernasality estimation model inherits the advantages of the ASR model: 1) compared with the low-resource cleft palate dataset, the ASR task usually involves large-scale general-domain speech data, which enables better model generalization; 2) the text annotations in the ASR dataset guide the model to extract better acoustic features. Experimental results on two cleft palate datasets demonstrate that our method achieves superior performance compared with previous approaches.

preprint · 2022 · arXiv

Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

Natural data are often long-tail distributed over semantic classes. Existing recognition methods tackle this imbalanced classification by placing more emphasis on the tail data, through class re-balancing/re-weighting or ensembling over different data groups, resulting in increased tail accuracies but reduced head accuracies. We take a dynamic view of the training data and provide a principled model bias and variance analysis as the training data fluctuates: existing long-tail classifiers invariably increase the model variance, and the head-tail model bias gap remains large due to more and larger confusion with hard negatives for the tail. We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE). It reduces the model variance with multiple experts, reduces the model bias with a distribution-aware diversity loss, and reduces the computational cost with a dynamic expert routing module. RIDE outperforms the state of the art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks. It is also a universal framework that is applicable to various backbone networks, long-tailed algorithms, and training mechanisms for consistent performance gains. Our code is available at: https://github.com/frank-xwang/RIDE-LongTailRecognition.

preprint · 2022 · arXiv

Low-Rank Hankel Tensor Completion for Traffic Speed Estimation

This paper studies the traffic state estimation (TSE) problem using sparse observations from mobile sensors. Most existing TSE methods either rely on well-defined physical traffic flow models or require large amounts of simulation data as input to train machine learning models. Unlike previous studies, we propose a purely data-driven and model-free solution. We formulate TSE as a spatiotemporal matrix completion/interpolation problem and apply spatiotemporal delay embedding to transform the original incomplete matrix into a fourth-order Hankel structured tensor. By imposing a low-rank assumption on this tensor structure, we can approximate and characterize both global and local spatiotemporal patterns in a data-driven manner. We use the truncated nuclear norm of a balanced spatiotemporal unfolding, in which each column represents the vectorization of a small patch in the original matrix, to approximate the tensor rank. An efficient solution algorithm based on the Alternating Direction Method of Multipliers (ADMM) is developed for model learning. The proposed framework involves only two hyperparameters, the spatial and temporal window lengths, which are easy to set given the degree of data sparsity. We conduct numerical experiments on real-world high-resolution trajectory data, and our results demonstrate the effectiveness and superiority of the proposed model in some challenging scenarios.
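The delay-embedding step can be illustrated on a small matrix. Assuming rows index road segments and columns index time intervals, with windows `tau_s` and `tau_t` (the two hyperparameters named in the abstract), each entry of the fourth-order Hankel tensor is $\mathcal{H}_{i,j,k,l} = Y_{i+j,\,k+l}$. The sketch below builds the balanced unfolding directly, one column per $\tau_s \times \tau_t$ patch; the function name and index ordering are assumptions, and the truncated-nuclear-norm ADMM solver is not shown.

```python
def hankel_patch_unfolding(Y: list[list], tau_s: int, tau_t: int) -> list[list]:
    """Delay-embed an M x N matrix Y (space x time) and return the balanced unfolding.

    Column (j, l) is the tau_s x tau_t patch of Y whose top-left corner is Y[j][l],
    flattened row by row; this matches H[i][j][k][l] = Y[i + j][k + l].
    """
    M, N = len(Y), len(Y[0])
    if not (1 <= tau_s <= M and 1 <= tau_t <= N):
        raise ValueError("window lengths must fit inside Y")
    columns = []
    for j in range(M - tau_s + 1):          # patch position in space
        for l in range(N - tau_t + 1):      # patch position in time
            columns.append([Y[i + j][k + l] for i in range(tau_s) for k in range(tau_t)])
    # transpose: rows index the (i, k) offset inside a patch, columns index patches
    return [list(row) for row in zip(*columns)]


Y = [[10 * r + c for c in range(4)] for r in range(3)]  # 3 segments x 4 time steps
H = hankel_patch_unfolding(Y, tau_s=2, tau_t=2)
print(len(H), len(H[0]))  # 4 6: four entries per patch, six patches
```

Because each entry of `Y` reappears in several overlapping patches, a low-rank fit of this unfolding can borrow information across patches to fill entries that were never observed.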

preprint · 2022 · arXiv

Random diffusivity processes in an external force field

Brownian yet non-Gaussian processes have recently been observed in numerous biological systems, and the corresponding theories have been built on random diffusivity models. Considering the particularity of random diffusivity, this paper studies the effect of an external force acting on two kinds of random diffusivity models that differ in whether the fluctuation-dissipation theorem holds. Based on the two random diffusivity models, we derive the Fokker-Planck equations with an arbitrary external force and analyse various observables in the case of a constant force, including the Einstein relation, the moments, the kurtosis, and the asymptotic behaviors of the probability density function of the particle's displacement at different time scales. Both the theoretical results and numerical simulations of these observables show significant differences between the two kinds of random diffusivity models, which implies the important role of the fluctuation-dissipation theorem in random diffusivity systems.

preprint · 2022 · arXiv

Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers

Unsupervised semantic segmentation aims to discover groupings within and across images that capture object and view-invariance of a category without external supervision. Grouping naturally has levels of granularity, creating ambiguity in unsupervised segmentation. Existing methods avoid this ambiguity and treat it as a factor outside modeling, whereas we embrace it and seek hierarchical grouping consistency for unsupervised segmentation. We approach unsupervised segmentation as a pixel-wise feature learning problem. Our idea is that a good representation should reveal not just a particular level of grouping, but any level of grouping, in a consistent and predictable manner. We enforce spatial consistency of grouping and bootstrap feature learning with co-segmentation among multiple views of the same image, and enforce semantic consistency across the grouping hierarchy with clustering transformers between coarse- and fine-grained features. We deliver the first data-driven unsupervised hierarchical semantic segmentation method, called Hierarchical Segment Grouping (HSG). Capturing visual similarity and statistical co-occurrences, HSG also outperforms existing unsupervised segmentation methods by a large margin on five major object- and scene-centric benchmarks. Our code is publicly available at https://github.com/twke18/HSG.

preprint · 2021 · arXiv

Ergodic property of random diffusivity system with trapping events

The Brownian yet non-Gaussian phenomenon has recently been observed in many biological and active matter systems. The main idea for explaining this phenomenon is to introduce a random diffusivity for particles moving in an inhomogeneous environment. This paper considers a Langevin system containing a random diffusivity and an $α$-stable subordinator with $α<1$. This model describes the particle's motion in complex media where both long trapping events and random diffusivity are present. We derive general expressions for the ensemble- and time-averaged mean-squared displacements, which contain only the values of the inverse subordinator and the diffusivity. Further taking a specific time-dependent diffusivity, we obtain analytic expressions for the ergodicity breaking parameter and the probability density function of the time-averaged mean-squared displacement. The results imply the nonergodicity of the random diffusivity model for any kind of diffusivity, including the critical case where the model exhibits normal diffusion.
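For simulated or measured trajectories, the two observables named in this abstract have standard discrete estimators: the time-averaged MSD $\overline{\delta^2}(\Delta) = \frac{1}{T-\Delta}\int_0^{T-\Delta}[x(t+\Delta)-x(t)]^2\,dt$ and the ergodicity breaking parameter $\mathrm{EB} = \langle(\overline{\delta^2})^2\rangle/\langle\overline{\delta^2}\rangle^2 - 1$, which vanishes in the long-time limit for ergodic processes. The sketch assumes trajectories sampled at unit time steps; the paper's results are analytic and are not derived from this code.

```python
def tamsd(x: list[float], lag: int) -> float:
    """Time-averaged MSD of one trajectory sampled at unit time steps:
    mean of (x[t + lag] - x[t])**2 over all admissible t."""
    n = len(x) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be in 1..len(x)-1")
    return sum((x[t + lag] - x[t]) ** 2 for t in range(n)) / n


def ergodicity_breaking(trajectories: list[list[float]], lag: int) -> float:
    """EB = <TAMSD^2> / <TAMSD>^2 - 1, averaged over an ensemble of trajectories."""
    values = [tamsd(x, lag) for x in trajectories]
    mean = sum(values) / len(values)
    mean_sq = sum(v * v for v in values) / len(values)
    return mean_sq / mean ** 2 - 1.0


ballistic = [0, 1, 2, 3, 4]
print(tamsd(ballistic, 2))                                   # 4.0, the lag squared
print(ergodicity_breaking([[0, 1, 2, 3], [0, 2, 4, 6]], 1))  # nonzero: the two TAMSDs (1 and 4) differ
```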

preprint · 2021 · arXiv

Novel anomalous diffusion phenomena of underdamped Langevin equation with random parameters

The diffusion behavior of particles moving in complex heterogeneous environments is a very topical issue. We characterize the particle's trajectory via an underdamped Langevin system driven by Gaussian white noise with a time-dependent diffusivity of velocity, together with a random relaxation timescale $τ$ to parameterize the effect of the complex medium. We mainly study how the random parameter $τ$ influences the diffusion behavior and ergodic property of this Langevin system. In addition, fixed and random initial velocities $v_0$ are compared to show the effect of different initial ensembles. A heavy-tailed distribution of $τ$ with finite mean is found to suppress the decay rate of the velocity correlation function and promote diffusion, competing with the effect of the time-dependent diffusivity. More interestingly, a random $v_0$ with a specific distribution depending on the random $τ$ also enhances diffusion. Both random parameters $τ$ and $v_0$ influence the dynamics of the Langevin system in a non-obvious way, which cannot be ignored even when they have finite moments.

preprint · 2020 · arXiv

Structure-Feature based Graph Self-adaptive Pooling

Various methods for handling graph data have been proposed in recent years. However, most of them focus on graph feature aggregation rather than graph pooling. Moreover, existing top-k selection graph pooling methods have several problems. First, to construct the pooled graph topology, current top-k selection methods evaluate node importance from a single perspective only, which is simplistic and lacks objectivity. Second, the feature information of unselected nodes is directly lost during pooling, which inevitably leads to a massive loss of graph feature information. To solve these problems, we propose a novel graph self-adaptive pooling method with the following objectives: (1) to construct a reasonable pooled graph topology, structure and feature information of the graph are considered simultaneously, which provides additional veracity and objectivity in node selection; and (2) to make the pooled nodes contain sufficiently effective graph information, node feature information is aggregated before discarding the unimportant nodes; thus, the selected nodes contain information from neighbor nodes, which enhances the use of features of the unselected nodes. Experimental results on four different datasets demonstrate that our method is effective in graph classification and outperforms state-of-the-art graph pooling methods.
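The two objectives (score nodes from both structure and features; aggregate neighbour features before dropping nodes) can be sketched as follows. Everything concrete here is a hypothetical stand-in rather than the paper's method: normalised degree as the structural score, a dot product with a weight vector as the feature score, a mixing weight `alpha`, and mean aggregation over each node's closed neighbourhood.

```python
def self_adaptive_pool(adj, features, k, alpha=0.5, weights=None):
    """Hypothetical sketch of structure + feature top-k pooling.

    adj: n x n 0/1 adjacency matrix (lists); features: n x d node features.
    Returns (kept node indices, pooled adjacency, pooled features).
    """
    n, dim = len(adj), len(features[0])
    weights = weights or [1.0] * dim
    # structural score: degree normalised by the maximum degree
    degrees = [sum(row) for row in adj]
    max_deg = max(degrees) or 1
    structure = [d / max_deg for d in degrees]
    # feature score: projection onto the weight vector, normalised by its largest magnitude
    proj = [sum(w * f for w, f in zip(weights, features[i])) for i in range(n)]
    max_proj = max(abs(p) for p in proj) or 1.0
    score = [alpha * structure[i] + (1 - alpha) * proj[i] / max_proj for i in range(n)]
    # aggregate each node's closed neighbourhood BEFORE dropping nodes
    aggregated = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adj[i][j] and j != i]
        aggregated.append([sum(features[j][c] for j in group) / len(group) for c in range(dim)])
    keep = sorted(sorted(range(n), key=lambda i: -score[i])[:k])
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    pooled_feat = [aggregated[i] for i in keep]
    return keep, pooled_adj, pooled_feat


# Star graph: node 0 is the hub; node 3 carries the strongest feature.
adj = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
features = [[1.0], [2.0], [3.0], [6.0]]
print(self_adaptive_pool(adj, features, k=2))
# ([0, 3], [[0, 1], [1, 0]], [[3.0], [3.5]])
```

The kept hub's feature (3.0) is the mean over all four nodes, so information from the dropped leaves 1 and 2 survives pooling, which is the point of the second objective.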

preprint · 2020 · arXiv

The First Round Result from the TianQin-1 Satellite

The TianQin-1 satellite (TQ-1), the first technology demonstration satellite for the TianQin project, was launched on 20 December 2019. The first round of experiments was carried out from 21 December 2019 to 1 April 2020. The residual acceleration of the satellite is found to be about $1\times10^{-10}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$ and about $5\times10^{-11}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.05~{\rm Hz}\,$, measured by an inertial sensor with a sensitivity of $5\times10^{-12}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The micro-Newton thrusters have demonstrated a thrust resolution of $0.1~μ{\rm N}$ and a thrust noise of $0.3~μ{\rm N}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}$. The residual noise of the satellite with drag-free control is $3\times10^{-9}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The noise level of the optical readout system is about $30~{\rm pm}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The temperature stability at the temperature monitoring position is controlled to about $\pm3~{\rm mK}$ per orbit, and the mismatch between the center of mass of the satellite and that of the test mass is measured with a precision of better than $0.1~{\rm mm}$.

preprint · 2020 · arXiv

The TianQin project: current progress on science and technology

TianQin is a planned space-based gravitational wave (GW) observatory consisting of three Earth-orbiting satellites with an orbital radius of about $10^5~{\rm km}$. The satellites will form an equilateral triangle constellation whose plane is nearly perpendicular to the ecliptic plane. TianQin aims to detect GWs between $10^{-4}~{\rm Hz}$ and $1~{\rm Hz}$ that can be generated by a wide variety of important astrophysical and cosmological sources, including the inspiral of Galactic ultra-compact binaries, the inspiral of stellar-mass black hole binaries, extreme mass ratio inspirals, the merger of massive black hole binaries, and possibly energetic processes in the very early universe or exotic sources such as cosmic strings. In order to start science operations around 2035, a roadmap called the 0123 plan is being used to bring the key technologies of TianQin to maturity, supported by the construction of a series of research facilities on the ground. Two major projects of the 0123 plan are being carried out. In this process, the team has created a new-generation $17~{\rm cm}$ single-body hollow corner-cube retro-reflector, which was launched with the QueQiao satellite on 21 May 2018; a new laser ranging station equipped with a $1.2~{\rm m}$ telescope has been constructed and has successfully ranged to all five retro-reflectors on the Moon; and the TianQin-1 experimental satellite was launched on 20 December 2019, with first-round results showing that the satellite has exceeded all of its mission requirements.

preprint · 2020 · arXiv

Volumetric Attention for 3D Medical Image Segmentation and Detection

A volumetric attention (VA) module for 3D medical image segmentation and detection is proposed. Inspired by recent advances in video processing, VA enables 2.5D networks to leverage context information along the z direction and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge, outperforming the previous challenge winner by 3.9 points and achieving top performance on the LiTS leaderboard at the time of paper submission. Detection experiments on the DeepLesion dataset also show that adding VA to existing object detectors achieves a sensitivity of 69.1 at 0.5 false positives per image, outperforming the best published results by 6.6 points.

preprint · 2019 · arXiv

Langevin picture of Lévy walk in a constant force field

The Lévy walk is a practical model with wide applications in various fields. Here we focus on the effect of an external constant force on the Lévy walk with the exponent of the power-law distributed flight time $α\in(0,2)$. We add the term $Fη(s)$ ($η(s)$ is the Lévy noise) to a subordinated Langevin system to characterize such a constant force, which acts on the velocity process for all physical time after the subordination. We clearly show the effect of the constant force $F$ on this Langevin system and find that this system resembles the continuous limit of the collision model. The first moments of the velocity processes for these two models are consistent. In particular, based on the velocity correlation function derived from our subordinated Langevin equation, we investigate further statistical quantities, such as the ensemble- and time-averaged mean squared displacements. Under the influence of the constant force, the diffusion of particles becomes faster. Finally, the super-ballistic diffusion and the non-ergodic behavior are verified by simulations with different $α$.

preprint · 2019 · arXiv

Strong anomalous diffusion in two-state process with Lévy walk and Brownian motion

Strong anomalous diffusion phenomena are often observed in complex physical and biological systems and are characterized by a nonlinear spectrum of exponents $qν(q)$, obtained by measuring the absolute $q$-th moment $\langle |x|^q\rangle$. This paper investigates the strong anomalous diffusion behavior of a two-state process with Lévy walk and Brownian motion, which usually serves as an intermittent search process. The sojourn times in the Lévy walk and Brownian phases follow power-law distributions with exponents $α_+$ and $α_-$, respectively. Detailed scaling analyses are performed for the coexistence of three kinds of scaling in this system. Unlike the pure Lévy walk, this two-state process exhibits strong anomalous diffusion even when the distribution exponent of the Lévy walk phase satisfies $α_+<1$, provided that $α_-<α_+$. When $α_+<2$, the probability density function (PDF) in the central part becomes a combination of a stretched Lévy distribution and a Gaussian distribution due to the long sojourn time in the Brownian phase, while the PDF in the tail part (in the ballistic scaling) is still dominated by the infinite density of the Lévy walk.
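The spectrum $q\nu(q)$ can be estimated from data by fitting the slope of $\log\langle|x(t)|^q\rangle$ against $\log t$. For a single-scaling process this slope is linear in $q$ (e.g. $q/2$ for normal diffusion), whereas a spectrum whose slope changes with $q$ signals strong anomalous diffusion. The sketch assumes the ensemble of positions at each observation time is already available, for instance from simulations of the two-state process; it is an estimator illustration, not the paper's scaling analysis.

```python
import math


def moment_exponent(times: list[float], positions_at: list[list[float]], q: float) -> float:
    """Estimate q*nu(q) from <|x(t)|^q> ~ t^(q*nu(q)) as a least-squares slope in log-log space.

    positions_at[i] holds the ensemble of positions observed at times[i].
    """
    log_t = [math.log(t) for t in times]
    log_m = [math.log(sum(abs(x) ** q for x in xs) / len(xs)) for xs in positions_at]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    cov = sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
    var = sum((a - mean_t) ** 2 for a in log_t)
    return cov / var


times = [10.0, 100.0, 1000.0]
diffusive = [[t ** 0.5, -(t ** 0.5)] for t in times]  # toy ensemble with |x| ~ t^(1/2)
print(moment_exponent(times, diffusive, q=4.0))        # close to 2.0 = q/2
```

Evaluating the estimator over a grid of `q` values and plotting the result is one way to look for the change of slope that characterises strong anomalous diffusion.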

preprint · 2019 · arXiv

Theory of relaxation dynamics for anomalous diffusion processes in harmonic potential

Optical tweezers are often used to probe the motion of individual tracer particles, which motivates the study of the relaxation dynamics of a generic process confined in a harmonic potential. We uncover the dependence of the ensemble- and time-averaged mean squared displacements of confined processes on the velocity correlation function $C(t,t+τ)$ of the original process. Given two different scaling forms of $C(t,t+τ)$ for small and large $τ$, the stationary value and the relaxation behaviors can be obtained immediately. The results are valid for a large class of anomalous diffusion processes, including fractional Brownian motion, scaled Brownian motion, and the multi-scale Lévy walk with different exponents of the running-time distribution.

preprint · 2018 · arXiv

Nonlocal Diffusion Operators for Normal and Anomalous Dynamics

The Laplacian $Δ$ is the infinitesimal generator of isotropic Brownian motion, the limit process of normal diffusion, while the fractional Laplacian $Δ^{β/2}$ serves as the infinitesimal generator of the limit process of the isotropic Lévy process. Taking the limit, in some sense, means that the operators approximate the physical process well after a sufficiently long time. We introduce nonlocal operators (effective from the starting time) that describe general processes undergoing normal diffusion. For anomalous diffusion, we extend these to the anisotropic fractional Laplacian $Δ_m^{β/2}$ and the tempered one $Δ_m^{β/2,λ}$ in $\mathbb{R}^n$. Their definitions are proved to be equivalent to an alternative one in Fourier space. Based on these new nonlocal diffusion operators, we further derive the deterministic governing equations of some interesting statistical observables of very general jump processes with multiple internal states. Finally, we consider the associated initial and boundary value problems and prove the well-posedness of their Galerkin weak formulation in $\mathbb{R}^n$. To obtain coercivity, we require the probability density function $m(Y)$ to be nondegenerate.
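For orientation, the Fourier-space view of the two generators named at the start of this abstract reads as follows in the standard isotropic case. This uses the textbook convention $\Delta^{\beta/2} := -(-\Delta)^{\beta/2}$; it is not the paper's anisotropic $\Delta_m^{\beta/2}$ or tempered $\Delta_m^{\beta/2,\lambda}$ definitions, whose symbols also depend on $m(Y)$ and $\lambda$ and are not reproduced here:

```latex
\widehat{\Delta u}(\mathbf{k}) = -|\mathbf{k}|^{2}\,\hat{u}(\mathbf{k}),
\qquad
\widehat{\Delta^{\beta/2} u}(\mathbf{k}) = -|\mathbf{k}|^{\beta}\,\hat{u}(\mathbf{k}),
\qquad \beta \in (0, 2].
```

Setting $\beta = 2$ recovers the Laplacian, consistent with Brownian motion being the $\beta = 2$ member of the isotropic stable family.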

preprint · 2017 · arXiv

Discontinuous Galerkin methods and their adaptivity for the tempered fractional (convection) diffusion equations

This paper focuses on adaptive discontinuous Galerkin (DG) methods for the tempered fractional (convection) diffusion equations. DG schemes with interior penalty for the diffusion term and numerical flux for the convection term are used to solve the equations, and detailed stability and convergence analyses are provided. Based on the derived a posteriori error estimates, a local error indicator is designed. Extensive numerical experiments verify the theoretical results and demonstrate the effectiveness of the adaptive DG methods. The strategy for designing adaptive schemes presented in this paper applies to general PDEs with fractional operators.
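The abstract does not say how elements are selected for refinement once the local error indicators are computed. A common choice in adaptive finite element and DG codes is Dörfler (bulk) marking: refine the fewest elements whose squared indicators account for a fraction $\theta$ of the total estimated error. The sketch below implements that strategy as an assumed stand-in, not as the paper's procedure; the indicator values and $\theta$ are hypothetical.

```python
def dorfler_mark(indicators: list[float], theta: float = 0.5) -> list[int]:
    """Dorfler (bulk) marking: return the indices of a smallest set of elements whose
    squared error indicators sum to at least theta times the total."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    total = sum(eta * eta for eta in indicators)
    order = sorted(range(len(indicators)), key=lambda i: -indicators[i])  # largest first
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += indicators[i] ** 2
    return sorted(marked)


print(dorfler_mark([0.1, 0.5, 0.2, 0.4], theta=0.8))  # [1, 3]
```

Taking elements in decreasing order of their indicators guarantees the marked set has minimal size; a larger `theta` refines more aggressively per adaptive step.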