Researcher profile

Yuhao Wang

Yuhao Wang contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
21 works
0 followers
14 topics
4 close collaborators

Published work

21 published items

preprint, 2026, arXiv

A multivariate extension of Azadkia-Chatterjee's rank coefficient

The Azadkia-Chatterjee coefficient is a rank-based measure of dependence between a random variable $Y \in \mathbb{R}$ and a random vector ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$. In this paper, we propose a multivariate extension that measures the dependence between random vectors ${\boldsymbol Y} \in \mathbb{R}^{d_Y}$ and ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$, based on $n$ i.i.d. samples. The proposed coefficient converges almost surely to a limit with the following properties: i) it lies in $[0, 1]$; ii) it is equal to zero if and only if ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent; and iii) it is equal to one if and only if ${\boldsymbol Y}$ is almost surely a function of ${\boldsymbol Z}$. Remarkably, the only assumption required by this convergence is that ${\boldsymbol Y}$ is not almost surely a constant vector. We further prove that under the same mild condition and after a proper scaling, this coefficient converges in distribution to a standard normal random variable when ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent. This asymptotic normality result allows us to construct a Wald-type hypothesis test of independence based on this coefficient. To compute this coefficient, we propose a merge sort based algorithm that runs in $O(n (\log n)^{d_Y})$. Finally, we show that it can be used to measure the conditional dependence between ${\boldsymbol Y}$ and ${\boldsymbol Z}$ conditional on a third random vector ${\boldsymbol X}$, and prove that the measure is monotonic with respect to the deviation from an independence distribution under certain model restrictions.
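For orientation, the sketch below computes Chatterjee's rank correlation, a closely related univariate predecessor of the coefficient discussed above. It is not the multivariate estimator proposed in the paper, and it assumes there are no ties in either sample. The function name `chatterjee_xi` is chosen here for illustration.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(X, Y) for scalar samples.

    Assumes no ties in x or in y (illustrative sketch only).
    """
    n = len(x)
    # Order the observations by increasing x.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # Rank of each y value among all y values (1 = smallest).
    rank_of = {value: rank for rank, value in enumerate(sorted(y_sorted), start=1)}
    r = [rank_of[value] for value in y_sorted]
    # Sum of absolute rank jumps between neighbours in the x ordering.
    total = sum(abs(r[k + 1] - r[k]) for k in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)
```

When y is a strictly monotone function of x, the ranks change by one at each step and the value is 1 - 3/(n + 1), which approaches one as n grows. For independent samples the value is close to zero.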

preprint, 2026, arXiv

Mean Testing under Truncation beyond Gaussian

We characterize the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution $P(\cdot \mid S)$ for an unknown truncation set $S$ that may hide up to an $\varepsilon$-fraction of the probability mass. For distributions with $p$-th directional moments of magnitude at most $\nu_{P,p}$, truncation induces a bias of order $O(\nu_{P,p}\varepsilon^{1-1/p})$. This bias creates a sharp information-theoretic detectability floor: when the signal $\alpha$ falls below this threshold, the null and alternative hypotheses are indistinguishable even with infinite data. Above this floor, we prove that a simple second-order test achieves near-optimal sample complexity $n = O\!\left(\frac{\|\Sigma_P\|}{(\alpha-4\nu_{P,p}\varepsilon^{1-1/p})^2}\sqrt{d}\right)$. We further identify a structural escape from this finite-moment bias barrier. Under a directional median regularity assumption, truncation bias improves to linear order $O(\varepsilon)$. This reveals an intermediate regime in which estimation requires $\Theta(d)$ samples for uniform recovery, while testing recovers the classical $\Theta(\sqrt d)$ rate once truncation bias is eliminated. Together, our results provide a unified framework for mean testing under truncation, connecting finite-moment, sub-Gaussian, and median-regular structural regimes.
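To get a feel for the quantities in this abstract, the sketch below evaluates the margin $4\nu_{P,p}\varepsilon^{1-1/p}$ and the scale of the stated sample-complexity bound. The function names and all numeric inputs are hypothetical, and the constant hidden in the $O(\cdot)$ is simply omitted, so the outputs are only relative scales.

```python
import math

def truncation_margin(nu, eps, p):
    """Margin 4 * nu * eps**(1 - 1/p) subtracted from the signal in the bound."""
    return 4.0 * nu * eps ** (1.0 - 1.0 / p)

def sample_scale(alpha, nu, eps, p, sigma_norm, d):
    """Scale sigma_norm * sqrt(d) / (alpha - margin)**2, hidden constant omitted."""
    gap = alpha - truncation_margin(nu, eps, p)
    if gap <= 0:
        # Signal at or below the margin: the bound gives no finite guarantee.
        return math.inf
    return sigma_norm * math.sqrt(d) / gap ** 2
```

With `nu = 1` and `eps = 0.01`, the margin is 0.4 for `p = 2` and shrinks as `p` grows, which reflects how stronger moment control reduces the truncation bias.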

preprint, 2026, arXiv

Permutation Inference under Multi-way Clustering and Missing Data

Econometric applications with multi-way clustering often feature a small number of effective clusters or heavy-tailed data, making standard cluster-robust and bootstrap inference unreliable in finite samples. In this paper, we develop a framework for finite-sample valid permutation inference in linear regression with multi-way clustering under an assumption of conditional exchangeability of the errors. Our assumption is closely related to the notion of separate exchangeability studied in earlier work, but can be more realistic in many economic settings as it imposes minimal restrictions on the covariate distribution. We construct permutation tests of significance that are valid in finite samples and establish theoretical power guarantees, in contrast to existing methods that are justified only asymptotically. We also extend our methodology to settings with missing data and derive power results that reveal phase transitions in detectability. Through simulation studies, we demonstrate that the proposed tests maintain correct size and competitive power, while standard cluster-robust and bootstrap procedures can exhibit substantial size distortions.
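The basic mechanics of a permutation test of significance can be sketched as follows. This is a plain single-regressor illustration under fully exchangeable observations, not the multi-way clustered procedure developed in the paper. The helper names `slope` and `permutation_pvalue` are assumptions made for this example.

```python
import random

def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the null hypothesis of zero slope."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(slope(x, y_perm)) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value valid in finite samples.
    return (1 + exceed) / (1 + n_perm)
```

Because the observed statistic is included in the reference set, the returned p-value is never zero and the test controls size at any sample size when the observations are exchangeable under the null.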

preprint, 2026, arXiv

Rerandomization for quantile treatment effects

Although complete randomization is widely regarded as the gold standard for causal inference, covariate imbalance can still arise by chance in finite samples. Rerandomization has emerged as an effective tool to improve covariate balance across treatment groups and enhance the precision of causal effect estimation. While existing work focuses on average treatment effects, quantile treatment effects (QTEs) provide a richer characterization of treatment heterogeneity by capturing distributional shifts in outcomes, which is crucial for policy evaluation and equity-oriented research. In this article, we establish the asymptotic properties of the QTE estimator under rerandomization within a finite-population framework, without imposing any distributional or modeling assumptions on the covariates or outcomes. The estimator exhibits a non-Gaussian asymptotic distribution, represented as a linear combination of Gaussian and truncated Gaussian random variables. To facilitate inference, we propose a conservative variance estimator and construct the corresponding confidence interval. Our theoretical analysis demonstrates that rerandomization improves efficiency over complete randomization under mild regularity conditions. Simulation studies further support the theoretical findings and illustrate the practical advantages of rerandomization for QTE estimation.
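A minimal sketch of the rerandomization step itself, assuming a single covariate so that the usual Mahalanobis balance criterion reduces to a standardized squared difference in means. The acceptance threshold, the function names, and the cap on the number of draws are illustrative choices, not values from the paper.

```python
import random
from statistics import mean, variance

def imbalance(x, assign):
    """Standardized squared difference in covariate means between the two arms."""
    treated = [v for v, a in zip(x, assign) if a == 1]
    control = [v for v, a in zip(x, assign) if a == 0]
    scale = variance(x) * (1 / len(treated) + 1 / len(control))
    return (mean(treated) - mean(control)) ** 2 / scale

def rerandomize(x, n_treated, threshold, seed=0, max_draws=10_000):
    """Redraw complete randomizations until the imbalance is at most `threshold`."""
    rng = random.Random(seed)
    base = [1] * n_treated + [0] * (len(x) - n_treated)
    for _ in range(max_draws):
        assign = base[:]
        rng.shuffle(assign)  # one complete randomization
        if imbalance(x, assign) <= threshold:
            return assign
    raise RuntimeError("no acceptable assignment found; loosen the threshold")
```

A smaller threshold enforces tighter balance but rejects more draws, which is the trade-off that the choice of acceptance probability controls in practice.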

preprint, 2026, arXiv

SAS-VPReID: A Scale-Adaptive Framework with Shape Priors for Video-based Person Re-Identification at Extreme Far Distances

Video-based Person Re-IDentification (VPReID) aims to retrieve the same person from videos captured by non-overlapping cameras. At extreme far distances, VPReID is highly challenging due to severe resolution degradation, drastic viewpoint variation and inevitable appearance noise. To address these issues, we propose a Scale-Adaptive framework with Shape Priors for VPReID, named SAS-VPReID. The framework is built upon three complementary modules. First, we deploy a Memory-Enhanced Visual Backbone (MEVB) to extract discriminative feature representations, which leverages the CLIP vision encoder and multi-proxy memory. Second, we propose a Multi-Granularity Temporal Modeling (MGTM) module to construct sequences at multiple temporal granularities and adaptively emphasize motion cues across scales. Third, we incorporate Prior-Regularized Shape Dynamics (PRSD) to capture body structure dynamics. With these modules, our framework can obtain more discriminative feature representations. Experiments on the VReID-XFD benchmark demonstrate the effectiveness of each module, and our final framework ranks first on the VReID-XFD challenge leaderboard. The source code is available at https://github.com/YangQiWei3/SAS-VPReID.

preprint, 2026, arXiv

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing pipelines, however, typically represent generated content as static meshes and inherit articulation only from curated asset libraries, which limits object-level controllability and prevents new interactable assets from being produced on demand. We address this gap by formulating physically interactable indoor scene synthesis as programmatic world generation, and present SceneCode, a framework that compiles a natural language prompt into an executable, code-driven indoor world rather than a collection of opaque meshes. A room-level agentic backbone first turns the prompt into a structured house layout and emits per-object AssetRequests through a planner--designer--critic loop. Each request is then routed to one of five code-generation strategies and converted into synthesized part-wise Blender Python programs that are validated through an execution-guided repair-and-refine loop. The resulting programs are compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry links object requests, executable programs, rendered geometry, and simulation assets, turning scene assembly into a traceable and locally editable world-building process. We evaluate SceneCode across scene-level synthesis, object-level asset quality, human judgment, and downstream robot interaction. Results show that executable world programs improve prompt-faithful indoor scene generation and produce assets with cleaner mesh structure and simulator-loadable articulation metadata. Project page: https://scene-code.github.io/.

preprint, 2026, arXiv

VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models

Speech large language models (SpeechLLMs) have extended human-machine interactions from the text modality to the dynamic speech domain. Spoken dialogues convey diverse information, including semantic concepts, acoustic variations, paralanguage cues, and environmental context. However, existing evaluations of speech interaction models lack instances mimicking real scenarios and predominantly focus on the performance of distinct aspects, lacking a comprehensive comparison of critical capabilities between current routines. To address this gap, we propose VocalBench to assess speech conversational abilities, comprising around 24k carefully curated instances in both English and Mandarin across four key dimensions: semantic quality, acoustic performance, conversational abilities, and robustness, covering 14 user-oriented characters. Experiments on 27 mainstream models reveal the common challenges for current routes and highlight the need for new insights into next-generation speech interactive systems.

preprint, 2026, arXiv

VReID-XFD: Video-based Person Re-identification at Extreme Far Distance Challenge Results

Person re-identification (ReID) across aerial and ground views at extreme far distances introduces a distinct operating regime where severe resolution degradation, extreme viewpoint changes, unstable motion cues, and clothing variation jointly undermine the appearance-based assumptions of existing ReID systems. To study this regime, we introduce VReID-XFD, a video-based benchmark and community challenge for extreme far-distance (XFD) aerial-to-ground person re-identification. VReID-XFD is derived from the DetReIDX dataset and comprises 371 identities, 11,288 tracklets, and 11.75 million frames, captured across altitudes from 5.8 m to 120 m, viewing angles from oblique (30 degrees) to nadir (90 degrees), and horizontal distances up to 120 m. The benchmark supports aerial-to-aerial, aerial-to-ground, and ground-to-aerial evaluation under strict identity-disjoint splits, with rich physical metadata. The VReID-XFD-25 Challenge attracted 10 teams with hundreds of submissions. Systematic analysis reveals monotonic performance degradation with altitude and distance, a universal disadvantage of nadir views, and a trade-off between peak performance and robustness. Even the best-performing SAS-PReID method achieves only 43.93 percent mAP in the aerial-to-ground setting. The dataset, annotations, and official evaluation protocols are publicly available at https://www.it.ubi.pt/DetReIDX/.

preprint, 2026, arXiv

W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search

Large Language Models (LLMs) demonstrate impressive capabilities, yet their outputs often suffer from misalignment with human preferences due to the inadequacy of weak supervision and a lack of fine-grained control. Training-time alignment methods like Reinforcement Learning from Human Feedback (RLHF) face prohibitive costs in expert supervision and inherent scalability limitations, offering limited dynamic control during inference. Consequently, there is an urgent need for scalable and adaptable alignment mechanisms. To address this, we propose W2S-AlignTree, a pioneering plug-and-play inference-time alignment framework that synergistically combines Monte Carlo Tree Search (MCTS) with the Weak-to-Strong Generalization paradigm for the first time. W2S-AlignTree formulates LLM alignment as an optimal heuristic search problem within a generative search tree. By leveraging a weak model's real-time, step-level signals as alignment proxies and introducing an Entropy-Aware exploration mechanism, W2S-AlignTree enables fine-grained guidance during the strong model's generation without modifying its parameters. The approach dynamically balances exploration and exploitation in high-dimensional generation search trees. Experiments across controlled sentiment generation, summarization, and instruction-following show that W2S-AlignTree consistently outperforms strong baselines. Notably, W2S-AlignTree raises the performance of Llama3-8B from 1.89 to 2.19, a relative improvement of 15.9% on the summarization task.
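The relative improvement quoted for the summarization task follows directly from the two reported scores, which confirms that the figure is a percentage:

```latex
\[
\frac{2.19 - 1.89}{1.89} = \frac{0.30}{1.89} \approx 0.159 = 15.9\%
\]
```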

preprint, 2022, arXiv

Data-driven Ranking and Selection under Input Uncertainty

We consider a simulation-based Ranking and Selection (R&S) problem with input uncertainty, where unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives, additional simulations can be run using updated input distribution estimates. The goal is to confidently identify the best design after collecting as few batches as possible. We first introduce a moving average estimator for aggregating simulation outputs generated under heterogeneous input distributions. Then, based on a Sequential Elimination framework, we devise two major R&S procedures by establishing exact and asymptotic confidence bands for the estimator. In deriving the latter confidence bands, we incorporate the result of "Multiple Comparison with Best" and establish an asymptotic normality result which explicitly characterizes the tradeoff between input uncertainty and stochastic uncertainty in an online environment. We also extend our procedures to the indifference zone setting, which helps save simulation effort for practical usage. Numerical results show the effectiveness and necessity of our procedures. Moreover, the efficiency can be further boosted through optimizing the "drop rate" parameter of the estimator.

preprint, 2022, arXiv

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1+\rho+\frac{1}{\rho}\right) d \ln T \ln \frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln \frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.

preprint, 2022, arXiv

Variable Augmented Network for Invertible MR Coil Compression

A large number of coils can provide an enhanced signal-to-noise ratio and improve imaging performance in parallel imaging. Nevertheless, the growing number of coils also increases data storage demands and slows reconstruction, especially in some iterative reconstructions. Coil compression addresses these issues by generating fewer virtual coils. In this work, a novel variable augmentation network for invertible coil compression, termed VAN-ICC, is presented. It utilizes the inherent reversibility of normalizing flow-based models for high-precision compression and invertible recovery. By applying the variable augmentation technique to image/k-space variables from multiple coils, VAN-ICC trains invertible networks by finding an invertible and bijective function, which can map the original data to the compressed counterpart and vice versa. Experiments conducted on both fully-sampled and under-sampled data verified the effectiveness and flexibility of VAN-ICC. Quantitative and qualitative comparisons with traditional non-deep learning-based approaches demonstrated that VAN-ICC can achieve much better compression performance. Additionally, its performance is not sensitive to the number of virtual coils.

preprint, 2022, arXiv

Virtual Coil Augmentation Technology for MR Coil Extrapolation via Deep Learning

Magnetic resonance imaging (MRI) is a widely used medical imaging modality. However, due to limitations in hardware, scan time, and throughput, it is often clinically challenging to obtain high-quality MR images. In this article, we propose a method that uses artificial intelligence to expand the channels in order to generate virtual coils. The main characteristic of our work is utilizing dummy variable technology to expand/extrapolate the receive coils in both image and k-space domains. The high-dimensional information formed by channel expansion is used as prior information to improve the reconstruction quality of parallel imaging. Two main components are incorporated into the network design, namely variable augmentation technology and a sum of squares (SOS) objective function. Variable augmentation provides the network with more high-dimensional prior information, which helps the network extract deep feature information from the data. The SOS objective function is employed to address the deficiency of k-space data training while speeding up convergence. Experimental results demonstrated its great potential in super-resolution of MR images and accelerated parallel imaging reconstruction.

preprint, 2022, arXiv

Wavelet Transform-assisted Adaptive Generative Modeling for Colorization

Unsupervised deep learning has recently demonstrated the promise of producing high-quality samples. While it has tremendous potential to promote the image colorization task, the performance is limited owing to the high dimensionality of the data manifold and limited model capability. This study presents a novel scheme that exploits the score-based generative model in the wavelet domain to address these issues. By taking advantage of the multi-scale and multi-channel representation via wavelet transform, the proposed model learns richer priors from stacked coarse and detailed wavelet coefficient components jointly and effectively. This strategy also reduces the dimension of the original manifold and alleviates the curse of dimensionality, which is beneficial for estimation and sampling. Moreover, dual consistency terms in the wavelet domain, namely data consistency and structure consistency, are devised to better serve the colorization task. Specifically, in the training phase, a set of multi-channel tensors consisting of wavelet coefficients is used as the input to train the network with denoising score matching. In the inference phase, samples are iteratively generated via annealed Langevin dynamics with data and structure consistencies. Experiments demonstrated remarkable improvements of the proposed method on both generation and colorization quality, particularly in colorization robustness and diversity.

preprint, 2021, arXiv

Observation of Aharonov-Bohm effect in PbTe nanowire networks

We report phase coherent electron transport in PbTe nanowire networks with a loop geometry. Magneto-conductance shows Aharonov-Bohm (AB) oscillations with periods of $h/e$ and $h/2e$ in flux. The amplitude of $h/2e$ oscillations is enhanced near zero magnetic field, possibly due to interference between time-reversal paths. Temperature dependence of the AB amplitudes suggests a phase coherence length $\sim$ 8-12 $\mu$m at 50 mK. This length scale is larger than the typical geometry of PbTe-based hybrid semiconductor-superconductor nanowire devices.
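An oscillation period of $h/e$ in flux corresponds to a magnetic-field period of $h/(eA)$ for a loop enclosing area $A$, and the $h/2e$ component has half that field period. The sketch below evaluates this relation. The loop area used in the usage note is a hypothetical value, since the abstract does not state the device dimensions.

```python
H = 6.62607015e-34   # Planck constant in J*s (exact SI value)
E = 1.602176634e-19  # elementary charge in C (exact SI value)

def ab_period_tesla(loop_area_m2, harmonic=1):
    """Field period of the h/(harmonic * e) oscillation for a loop of the given area."""
    return H / (harmonic * E * loop_area_m2)
```

For an assumed loop area of 1 square micrometre (`1e-12` square metres), `ab_period_tesla(1e-12)` is about 4.14 mT and `ab_period_tesla(1e-12, harmonic=2)` is about 2.07 mT.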

preprint, 2020, arXiv

Accelerated design of Fe-based soft magnetic materials using machine learning and stochastic optimization

Machine learning was utilized to efficiently boost the development of soft magnetic materials. The design process includes building a database composed of published experimental results, applying machine learning methods on the database, identifying the trends of magnetic properties in soft magnetic materials, and accelerating the design of next-generation soft magnetic nanocrystalline materials through the use of numerical optimization. Machine learning regression models were trained to predict magnetic saturation ($B_S$), coercivity ($H_C$) and magnetostriction ($\lambda$), with a stochastic optimization framework being used to further optimize the corresponding magnetic properties. To verify the feasibility of the machine learning model, several optimized soft magnetic materials -- specified in terms of compositions and thermomechanical treatments -- have been predicted and then prepared and tested, showing good agreement between predictions and experiments, proving the reliability of the designed model. Two rounds of optimization-testing iterations were conducted to search for better properties.

preprint, 2020, arXiv

Causal Discovery from Incomplete Data: A Deep Learning Approach

As systems are getting more autonomous with the development of artificial intelligence, it is important to discover causal knowledge from observational sensory inputs. By encoding a series of cause-effect relations between events, causal networks can facilitate the prediction of effects from a given action and analyze their underlying data generation mechanism. However, missing data are ubiquitous in practical scenarios. Directly applying existing causal discovery algorithms to partially observed data may lead to incorrect inference. To alleviate this issue, we propose a deep learning framework, dubbed Imputated Causal Learning (ICL), to perform iterative missing data imputation and causal structure discovery. Through extensive simulations on both synthetic and real data, we show that ICL can outperform state-of-the-art methods under different missing data mechanisms.

preprint, 2020, arXiv

High-Dimensional Joint Estimation of Multiple Directed Gaussian Graphical Models

We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. We prove that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary, we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose \emph{jointGES} consisting of Greedy Equivalence Search (GES) to estimate the union of all DAG models followed by variable selection using lasso to obtain the different DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data.

preprint, 2020, arXiv

Learning High-dimensional Gaussian Graphical Models under Total Positivity without Adjustment of Tuning Parameters

We consider the problem of estimating an undirected Gaussian graphical model when the underlying distribution is multivariate totally positive of order 2 (MTP2), a strong form of positive dependence. Such distributions are relevant for example for portfolio selection, since assets are usually positively dependent. A large body of methods have been proposed for learning undirected graphical models without the MTP2 constraint. A major limitation of these methods is that their structure recovery guarantees in the high-dimensional setting usually require a particular choice of a tuning parameter, which is unknown a priori in real world applications. We here propose a new method to estimate the underlying undirected graphical model under MTP2 and show that it is provably consistent in structure recovery without adjusting the tuning parameters. This is achieved by a constraint-based estimator that infers the structure of the underlying graphical model by testing the signs of the empirical partial correlation coefficients. We evaluate the performance of our estimator in simulations and on financial data.
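For a Gaussian distribution, MTP2 is equivalent to the precision matrix having non-positive off-diagonal entries, so every partial correlation given all remaining variables is non-negative. The sketch below computes those partial correlations from a covariance matrix and keeps the pairs whose value is clearly positive. It illustrates the sign idea only: it conditions on all other variables at once and uses a fixed tolerance, which is a simplification of the constraint-based estimator described in the abstract. The function names are assumptions made for this example.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """Partial correlation of each pair given all other variables (diagonal set to 1)."""
    k = invert(cov)  # precision matrix
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]

def positive_edges(cov, tol=1e-8):
    """Pairs (i, j), i < j, whose partial correlation exceeds `tol`."""
    pc = partial_correlations(cov)
    n = len(pc)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if pc[i][j] > tol]
```

For the covariance `[[0.75, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.75]]`, whose inverse is the tridiagonal matrix with 2 on the diagonal and -1 next to it, the partial correlations are 0.5 for the pairs (0, 1) and (1, 2) and 0 for the pair (0, 2), so the recovered graph is a chain.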

preprint, 2020, arXiv

Learning in the Frequency Domain

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though downsampling operations reduce computation and the required communication bandwidth, they remove both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach while further reducing the input data size. Specifically, for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half the input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

preprint, 2020, arXiv

Permutation-Based Causal Structure Learning with Unknown Intervention Targets

We consider the problem of estimating causal DAG models from a mix of observational and interventional data, when the intervention targets are partially or completely unknown. This problem is highly relevant for example in genomics, since gene knockout technologies are known to have off-target effects. We characterize the interventional Markov equivalence class of DAGs that can be identified from interventional data with unknown intervention targets. In addition, we propose a provably consistent algorithm for learning the interventional Markov equivalence class from such data. The proposed algorithm greedily searches over the space of permutations to minimize a novel score function. The algorithm is nonparametric, which is particularly important for applications to genomics, where the relationships between variables are often non-linear and the distribution non-Gaussian. We demonstrate the performance of our algorithm on synthetic and biological datasets. Links to an implementation of our algorithm and to a reproducible code base for our experiments can be found at https://uhlerlab.github.io/causaldag/utigsp.