Researcher profile

Yihan Wang

Yihan Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

Beyond the Silence: How Men Navigate Infertility Through Digital Communities and Data Sharing

Men experiencing infertility face unique challenges navigating Traditional Masculinity Ideologies that discourage emotional expression and help-seeking. This study examines how Reddit's r/maleinfertility community helps overcome these barriers through digital support networks. Using topic modeling (115 topics), network analysis (11 micro-communities), and time-lagged regression on 11,095 posts and 79,503 comments from 8,644 users, we found the community functions as a hybrid space: informal diagnostic hub, therapeutic commons, and governed institution. Medical advice dominates discourse (63.3\%), while emotional support (7.4\%) and moderation (29.2\%) create essential infrastructure. Sustained engagement correlates with actionable guidance and affiliation language, not emotional processing. Network analysis revealed structurally cohesive but topically diverse clusters without echo chamber characteristics. Cross-posters (20\% of users) who bridge r/maleinfertility and the gender-mixed r/infertility community serve as navigators and mentors, transferring knowledge between spaces. These findings inform trauma-informed design for stigmatized health communities, highlighting role-aware systems and navigation support.

preprint2026arXiv

Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion

In autonomous driving, camera-radar fusion offers complementary sensing and low deployment cost. Existing methods perform fusion through input mixing, feature map mixing, or query-based feature sampling. We propose a new fusion paradigm, termed heterogeneous query interaction, and present ConFusion, a camera-radar 3D object detector. ConFusion combines image queries, radar queries, and learnable world queries distributed in 3D space to improve query initialization and object coverage. To encourage cross-type interaction among heterogeneous queries, we introduce heterogeneous query mixing (QMix), which performs dedicated cross-type attention after feature sampling to consolidate complementary object evidence. We further propose interactive query swap sampling (QSwap), which improves feature sampling by allowing related queries to exchange informative feature tokens under attention and geometric constraints. Experiments on the nuScenes dataset show that ConFusion achieves state-of-the-art performance, reaching 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.

preprint2026arXiv

Ministral 3

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute and memory constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction finetuned, and a reasoning model for complex problem-solving. In addition, we present our recipe to derive the Ministral 3 models through Cascade Distillation, an iterative pruning and continued training with distillation technique. Each model comes with image understanding capabilities, all under the Apache 2.0 license.

preprint2026arXiv

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

Cross-embodiment video generation aims to transfer motions across different humanoid embodiments, such as human-to-robot and robot-to-robot, enabling scalable data generation for embodied intelligence. A major challenge in this setting is that motion dynamics are partly transferable across embodiments, whereas appearance and morphology remain embodiment-specific. Existing approaches often entangle these factors, and many require paired data for every target embodiment, which limits scalability to new robots. We present OmniHumanoid, a framework that factorizes transferable motion learning and embodiment-specific adaptation. Our method learns a shared motion transfer model from motion-aligned paired videos spanning multiple embodiments, while adapting to a new embodiment using only unpaired videos through lightweight embodiment-specific adapters. To reduce interference between motion transfer and embodiment adaptation, we further introduce a branch-isolated attention design that separates motion conditioning from embodiment-specific modulation. In addition, we construct a synthetic cross-embodiment dataset with motion-aligned paired videos rendered across diverse humanoid assets, scenes, and viewpoints. Experiments on both synthetic and real-world benchmarks show that OmniHumanoid achieves strong motion fidelity and embodiment consistency, while enabling scalable adaptation to unseen humanoid embodiments without retraining the shared motion model.

preprint2026arXiv

ResMAS: Resilience Optimization in LLM-based Multi-agent Systems

Large Language Model-based Multi-Agent Systems (LLM-based MAS), where multiple LLM agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied the adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of LLM-based MAS under perturbations and find that both the communication topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS: a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict the MAS's resilience, based on which we train a topology generator to automatically design resilient topology for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MASs.

preprint2026arXiv

SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

Visual prediction has emerged as a promising paradigm for embodied control, where future observations are generated and then translated into actions. However, dense video generation is computationally expensive and often unnecessary for many manipulation tasks, whose progress can be summarized by a small number of task-relevant visual states. In this work, we study whether image editing models can serve as sparse visual world models for robot manipulation by predicting task-level future states without dense video rollout. We first conduct a controlled comparison between the video generation model Wan2.2 and the image editing model FLUX-Kontext under the same robotic data setting, and find that image editing produces more reliable task-level keyframes with better visual fidelity and substantially lower inference cost. Motivated by this observation, we propose SWEET, a one-shot sparse visual planning framework that progressively generates a sequence of task-relevant manipulation keyframes through successive image editing, conditioned on language instructions and optional arrow-based spatial guidance. A goal-conditioned diffusion action predictor then converts adjacent imagined keyframes into executable action chunks. To reduce the mismatch between real and edited visual subgoals, we further introduce a mixed-training strategy with filtered edited targets. Experiments on DROID and RoboMimic show that SWEET improves keyframe prediction across seen and unseen scenes and enables a full pipeline from sequential keyframe planning to executable robot actions, suggesting that image editing is a promising and underexplored direction for embodied visual prediction.

preprint2024arXiv

Data-Dependent Stability Analysis of Adversarial Training

Stability analysis is an essential aspect of studying the generalization ability of deep learning, as it involves deriving generalization bounds for stochastic gradient descent-based training algorithms. Adversarial training is the most widely used defense against adversarial example attacks. However, previous generalization bounds for adversarial training have not included information regarding the data distribution. In this paper, we fill this gap by providing generalization bounds for stochastic gradient descent-based adversarial training that incorporate data distribution information. We utilize the concepts of on-average stability and high-order approximate Lipschitz conditions to examine how changes in data distribution and adversarial budget can affect robust generalization gaps. Our derived generalization bounds for both convex and non-convex losses are at least as good as the uniform stability-based counterparts which do not include data distribution information. Furthermore, our findings demonstrate how distribution shifts from data poisoning attacks can impact robust generalization.

preprint2022arXiv

Adversarial Parameter Attack on Deep Neural Networks

In this paper, a new parameter perturbation attack on DNNs, called adversarial parameter attack, is proposed, in which small perturbations to the parameters of the DNN are made such that the accuracy of the attacked DNN does not decrease much, but its robustness becomes much lower. The adversarial parameter attack is stronger than previous parameter perturbation attacks in that the attack is more difficult to be recognized by users and the attacked DNN gives a wrong label for any modified sample input with high probability. The existence of adversarial parameters is proved. For a DNN $F_Θ$ with the parameter set $Θ$ satisfying certain conditions, it is shown that if the depth of the DNN is sufficiently large, then there exists an adversarial parameter set $Θ_a$ for $Θ$ such that the accuracy of $F_{Θ_a}$ is equal to that of $F_Θ$, but the robustness measure of $F_{Θ_a}$ is smaller than any given bound. An effective training algorithm is given to compute adversarial parameters and numerical experiments are used to demonstrate that the algorithms are effective to produce high quality adversarial parameters.

preprint2022arXiv

Close Encounters of Stars with Stellar-mass Black Hole Binaries

Many astrophysical environments, from star clusters and globular clusters to the disks of Active Galactic Nuclei, are characterized by frequent interactions between stars and the compact objects that they leave behind. Here, using a suite of $3-D$ hydrodynamics simulations, we explore the outcome of close interactions between $1M_{\odot}$ stars and binary black holes (BBHs) in the gravitational wave regime, resulting in a tidal disruption event (TDE) or a pure scattering, focusing on the accretion rates, the back reaction on the BH binary orbital parameters and the increase in the binary BH effective spin. We find that TDEs can make a significant impact on the binary orbit, which is often different from that of pure scattering. Binaries experiencing a prograde (retrograde) TDE tend to be widened (hardened) by up to $\simeq 20\%$. Initially circular binaries become more eccentric by $\lesssim 10\%$ by a prograde or retrograde TDE, whereas the eccentricity of initially eccentric binaries increases (decreases) by a retrograde (prograde) TDE by $\lesssim 5\%$. Overall a single TDE can generally result in changes of the gravitational wave-driven merger time scale by order unity. The accretion rates of both black holes are very highly super-Eddington, showing modulations (preferentially for retrograde TDEs) on a time scale of the orbital period, which can be a characteristic feature of BBH-driven TDEs. Prograde TDEs result in the effective spin parameter $χ$ to vary by $\lesssim 0.02$ while $χ\gtrsim -0.005$ for retrograde TDEs.

preprint2022arXiv

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to the high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. We reveal that HRNet's high-resolution branches are redundant for models at the low-computation region via our gradual shrinking experiments. Removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost. With only 25% computation increment, 7x7 kernels achieve +14.0 mAP better than 3x3 kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces the latency by up to 5.0x without sacrificing performance, compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge. Our code and pre-trained models are released at https://github.com/mit-han-lab/litepose.

preprint2022arXiv

On the Convergence of Certified Robust Training with Interval Bound Propagation

Interval Bound Propagation (IBP) is so far the base of state-of-the-art methods for training neural networks with certifiable robustness guarantees when potential adversarial perturbations present, while the convergence of IBP training remains unknown in existing literature. In this paper, we present a theoretical analysis on the convergence of IBP training. With an overparameterized assumption, we analyze the convergence of IBP robust training. We show that when using IBP training to train a randomly initialized two-layer ReLU neural network with logistic loss, gradient descent can linearly converge to zero robust training error with a high probability if we have sufficiently small perturbation radius and large network width.

preprint2022arXiv

The emergence of diffused Gamma-Ray Burst afterglows from the disks of Active Galactic Nuclei

The disks of Active Galactic Nuclei (AGNs) have emerged as rich environments for the production and capture of stars and the compact objects that they leave behind. These stars produce long Gamma-Ray Bursts (LGRBs) at their deaths, while frequent interactions among compact objects form binary neutron stars and neutron star-black hole binaries, leading to short GRBs (SGRBs) upon their merger. Predicting the properties of these transients as they emerge from the dense environments of AGN disks is key to their proper identification and to better constrain the star and compact object population in AGN disks. Some of these transients would appear unusual because they take place in much higher densities than the interstellar medium. Others, which are the subject of this paper, would additionally be modified by radiation diffusion since they are generated within optically thick regions of the accretion disks. Here we compute the GRB afterglow light curves for diffused GRB sources for a representative variety of central black-hole masses and disk locations. We find that the radiation from radio to UV and soft X-rays can be strongly suppressed by synchrotron self-absorption in the dense medium of the AGN disk. In addition, photon diffusion can significantly delay the emergence of the emission peak, turning a beamed, fast transient into a slow, isotropic, and dimmer one. These would appear as broadband-correlated AGN variability with dominance at the higher frequencies. Their properties can constrain both the stellar populations within AGN disks as well as the disk structure.

preprint2021arXiv

Symmetry Breaking in Dynamical Encounters in the Disks of Active Galactic Nuclei

Active galactic nucleus (AGN) disks may be important sites of binary black hole (BBH) mergers. Here we show via numerical experiments with the high-accuracy, high precision code {\tt SpaceHub} that broken symmetry in dynamical encounters in AGN disks can lead to an asymmetry between prograde and retrograde BBH mergers. The direction of the hardening asymmetry depends on the initial binary semi-major axis. An asymmetric distribution of mass-weighted projected spin $χ_{\rm eff}$ should therefore be expected in LIGO-Virgo detections of BBH mergers from AGN disks. This channel further predicts that negative $χ_{\rm eff}$ BBH mergers are most likely for massive binaries.