Source author record

Shujun Wang

Shujun Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision math.OC math.PR eess.IV

Catalog footprint

What is connected

12 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

BrainAnytime: Anatomy-Aware Cross-Modal Pretraining for Brain Image Analysis with Arbitrary Modality Availability

Clinical diagnostic workups typically follow a modality escalation pathway: after initial clinical evaluation, clinicians begin with routine structural imaging (e.g., MRI), selectively add sequences such as FLAIR or T2 to refine the differential, and reserve molecular imaging (e.g., amyloid-PET) for cases that remain uncertain after standard evaluation. Consequently, patients are observed with heterogeneous and often incomplete modality subsets. However, most current AI models assume fixed data modalities as the model inputs. In this paper, we present BrainAnytime, a unified pretraining framework pretrained on 34,899 3D brain scans from five datasets that supports brain image analysis under arbitrary modality availability spanning multi-sequence MRI and amyloid-PET. A single model accepts whatever imaging is available, from a lone T1 scan to a full multimodal workup. Pretraining learns structural-molecular correspondences between MRI and PET via cross-modal distillation (RCMD) and prioritizes disease-vulnerable anatomy via atlas-guided curriculum masking (PACM), all within a shared 3D masked autoencoder (Multi-MAE3D). Across four downstream tasks and five clinically motivated modality settings, BrainAnytime outperforms modality-specific models, missing-modality baselines, and large-scale brain MRI pretrained foundation models on most modality settings. Notably, it surpasses the strongest missing-modality baselines with relative improvements of 6.2% and 7.0% in average accuracy on CN vs. AD and CN vs. MCI classification, respectively. Code is available at https://github.com/SDH-Lab/BrainAnytime.

preprint · 2026 · arXiv

NEWTON: Agentic Planning for Physically Grounded Video Generation

Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are a lossy compression of the physical world, omitting the parameters that fully determine dynamics, and no amount of model scaling can recover what was never specified. From this diagnosis we derive three properties that physics conditioning must satisfy -- sufficiency, dynamism, and verifiability -- and show that no existing approach satisfies all three. We present NEWTON, in which video generation is demoted from the system output to one action inside an agent's toolbox: a learned planner orchestrates physics-aware tools (keyframe generation, scientific computation, prompt refinement) to construct rich conditioning, and a verifier closes the loop for iterative re-planning. The planner is the sole trainable component, optimized on-policy via Flow-GRPO inside the live multi-turn loop. On VideoPhy-2, NEWTON improves joint accuracy from 21.4% to 29.7% on LTX-Video and from 30.7% to 37.4% on Veo-3.1, without modifying either generator. Project page: https://Newton026.github.io/newton

preprint · 2026 · arXiv

PRA-PoE: Robust Alzheimer's Diagnosis with Arbitrary Missing Modalities

Missing modalities are prevalent in real-world Alzheimer's disease (AD) assessment and pose a significant challenge to multimodal learning, particularly when the distribution of observed modality subsets differs between training and deployment. Such missingness pattern mismatch induces a conditional representation shift across modality subsets. Existing approaches that rely on implicit imputation or modality synthesis often fail to explicitly model modality availability and uncertainty, leading to overconfident dependence on synthesized features, reduced robustness, and miscalibrated uncertainty estimates. To address these limitations, we propose PRA-PoE, an incomplete multimodal learning framework that is equipped with Prototype-anchored Representation Alignment (PRA) and an Uncertainty-aware Product of Experts (UA-PoE) fusion mechanism. First, PRA uses learnable global prototypes and availability-conditioned tokens to encode modality availability, distinguish observed from missing modalities, re-synthesize features for missing modalities, and adaptively refine observed representations to align latent spaces across modality subsets, with the goal of reducing representation shift under varying missingness patterns. Second, UA-PoE models each modality as a Gaussian expert and performs closed-form Product of Experts fusion, where experts with higher uncertainty are automatically down-weighted via lower precision, improving uncertainty reliability. We evaluate PRA-PoE under a clinically realistic protocol by training with naturally missing data and testing on all non-empty modality combinations. PRA-PoE consistently outperforms the state-of-the-art across datasets, achieving a 5.4% relative improvement in average accuracy on ADNI and a 10.9% relative gain in average F1 on OASIS-3 over the strongest baseline across all non-empty modality subsets.
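The closed-form Product of Experts fusion mentioned in the abstract can be illustrated with a minimal scalar sketch. It assumes each expert is a one-dimensional Gaussian described by a mean and a variance; the function name `fuse_gaussian_experts` and the example numbers are hypothetical and are not taken from the PRA-PoE code. The fused precision is the sum of the expert precisions, and the fused mean is the precision-weighted average, so an expert with a larger variance contributes less.

```python
def fuse_gaussian_experts(means, variances):
    """Closed-form product of 1-D Gaussian experts (illustrative sketch only).

    Each expert i is N(means[i], variances[i]). The product of Gaussian
    densities is proportional to a Gaussian whose precision is the sum of
    the expert precisions and whose mean is the precision-weighted average.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per mean and at least one expert")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precision = inverse variance; uncertain experts get low precision.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    fused_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    fused_variance = 1.0 / total_precision
    return fused_mean, fused_variance


# Hypothetical example: a confident expert (variance 1) and an uncertain one (variance 3).
fused_mean, fused_variance = fuse_gaussian_experts([0.0, 4.0], [1.0, 3.0])
```

In this example the fused mean lands closer to the confident expert's mean than to the uncertain expert's, which is the down-weighting behavior the abstract describes.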

preprint · 2025 · arXiv

Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effective for simple actions, this System 1 approach faces a fundamental theoretical bottleneck we identify as the Semantic-Kinematic Impedance Mismatch: the inherent difficulty of grounding semantically dense, discrete linguistic intent into kinematically dense, high-frequency motion data in a single shot. In this paper, we argue that the solution lies in an architectural shift towards Latent System 2 Reasoning. Drawing inspiration from Hierarchical Motor Control in cognitive science, we propose Latent Motion Reasoning (LMR) that reformulates generation as a two-stage Think-then-Act decision process. Central to LMR is a novel Dual-Granularity Tokenizer that disentangles motion into two distinct manifolds: a compressed, semantically rich Reasoning Latent for planning global topology, and a high-frequency Execution Latent for preserving physical fidelity. By forcing the model to autoregressively reason (plan the coarse trajectory) before it moves (instantiates the frames), we effectively bridge the ineffability gap between language and physics. We demonstrate LMR's versatility by implementing it for two representative baselines: T2M-GPT (discrete) and MotionStreamer (continuous). Extensive experiments show that LMR yields non-trivial improvements in both semantic alignment and physical plausibility, validating that the optimal substrate for motion planning is not natural language, but a learned, motion-aligned concept space. Code and demos can be found at https://chenhaoqcdyq.github.io/LMR/

preprint · 2023 · arXiv

Linear-Quadratic Delayed Mean-Field Social Optimization

A linear quadratic (LQ) stochastic optimization problem with delay involving a weakly coupled large population is investigated in this paper. Unlike the classic mean field (MF) game, here agents cooperate with each other to minimize the so-called social objective. With the aid of the delayed person-by-person optimality principle, one arrives at an auxiliary LQ delayed control problem with decentralized information. A decentralized strategy is obtained by means of an MF-type anticipated forward-backward stochastic differential delay equation (AFBSDDE) consistency condition. The discounting method with a delay feature is employed to solve the consistency condition system. Finally, using some estimates of AFBSDDEs, we derive the asymptotic social optimality.

preprint · 2021 · arXiv

Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation

Annotation scarcity is a long-standing problem in medical image analysis. To efficiently leverage limited annotations, abundant unlabeled data are additionally exploited in semi-supervised learning, while well-established cross-modality data are investigated in domain adaptation. In this paper, we aim to explore the feasibility of concurrently leveraging both unlabeled data and cross-modality data for annotation-efficient cardiac segmentation. To this end, we propose a cutting-edge semi-supervised domain adaptation framework, namely Dual-Teacher++. Besides directly learning from limited labeled target domain data (e.g., CT) via a student model, as adopted in previous literature, we design novel dual teacher models, including an inter-domain teacher model to explore cross-modality priors from the source domain (e.g., MR) and an intra-domain teacher model to investigate the knowledge beneath the unlabeled target domain. In this way, the dual teacher models transfer acquired inter- and intra-domain knowledge to the student model for further integration and exploitation. Moreover, to encourage reliable dual-domain knowledge transfer, we enhance the inter-domain knowledge transfer on the samples with higher similarity to the target domain after appearance alignment, and also strengthen intra-domain knowledge transfer of unlabeled target data with higher prediction confidence. In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data. We extensively evaluated the feasibility of our method on the MM-WHS 2017 challenge dataset. The experiments have demonstrated the superiority of our framework over other semi-supervised learning and domain adaptation methods. Moreover, our performance gains are obtained in both directions, i.e., adapting from MR to CT and from CT to MR.

preprint · 2020 · arXiv

Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation

Medical image annotations are prohibitively time-consuming and expensive to obtain. To alleviate annotation scarcity, many approaches have been developed to efficiently utilize extra information, e.g., semi-supervised learning, which further explores plentiful unlabeled data, and domain adaptation, including multi-modality learning and unsupervised domain adaptation, which resorts to prior knowledge from an additional modality. In this paper, we aim to investigate the feasibility of simultaneously leveraging abundant unlabeled data and well-established cross-modality data for annotation-efficient medical image segmentation. To this end, we propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher, where the student model not only learns from labeled target data (e.g., CT), but also explores unlabeled target data and labeled source data (e.g., MR) by two teacher models. Specifically, the student model learns the knowledge of unlabeled target data from the intra-domain teacher by encouraging prediction consistency, as well as the shape priors embedded in labeled source data from the inter-domain teacher via knowledge distillation. Consequently, the student model can effectively exploit the information from all three data resources and comprehensively integrate them to achieve improved performance. We conduct extensive experiments on the MM-WHS 2017 dataset and demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance, outperforming semi-supervised learning and domain adaptation methods by a large margin.

preprint · 2020 · arXiv

Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization

The generalization capability of neural networks across domains is crucial for real-world applications. We argue that a generalized object recognition system should understand both the relationships among different images and the images themselves. To this end, we present a new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains. To be specific, we formulate our framework with feature embedding using a multi-task learning paradigm. Besides conducting the common supervised recognition task, we seamlessly integrate a momentum metric learning task and a self-supervised auxiliary task to collectively utilize the extrinsic supervision and intrinsic supervision. Also, we develop an effective momentum metric learning scheme with K-hard negative mining to help the network capture image relationships for domain generalization. We demonstrate the effectiveness of our approach on two standard object recognition benchmarks, VLCS and PACS, and show that our method achieves state-of-the-art performance.
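As a rough illustration of K-hard negative mining, the sketch below picks the K negatives closest to an anchor embedding and averages a hinge-style triplet loss over them. The triplet form, the Euclidean distance, the margin value, and every name here (`k_hard_triplet_loss`, `euclidean`) are assumptions made for illustration; the abstract does not specify the exact loss, and the momentum component of the scheme is not modeled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_hard_triplet_loss(anchor, positive, negatives, k=2, margin=0.5):
    """Average hinge triplet loss over the k hardest (closest) negatives.

    Assumed form: max(0, d(anchor, positive) - d(anchor, negative) + margin).
    """
    if not negatives:
        raise ValueError("at least one negative is required")

    d_pos = euclidean(anchor, positive)
    # "Hard" negatives are those nearest to the anchor in embedding space.
    hardest = sorted(euclidean(anchor, n) for n in negatives)[:k]
    losses = [max(0.0, d_pos - d_neg + margin) for d_neg in hardest]
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings: one positive and three negatives.
loss = k_hard_triplet_loss(
    anchor=(0.0, 0.0),
    positive=(1.0, 0.0),
    negatives=[(0.0, 1.2), (3.0, 4.0), (0.0, 2.0)],
    k=2,
    margin=0.5,
)
```

Restricting the loss to the nearest negatives concentrates the gradient on the pairs the embedding currently confuses; the far negative at (3.0, 4.0) is ignored here.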

preprint · 2015 · arXiv

A Class of Linear-Quadratic-Gaussian (LQG) Mean-Field Game (MFG) of Stochastic Delay Systems

This paper investigates the linear-quadratic-Gaussian (LQG) mean-field game (MFG) for a class of stochastic delay systems. We consider a large population system in which the dynamics of each player satisfy a forward stochastic differential delay equation (SDDE). The consistency condition or Nash certainty equivalence (NCE) principle is established through an auxiliary mean-field system of anticipated forward-backward stochastic differential equations with delay (AFBSDDE). The well-posedness of such a consistency condition system can be further established by a continuation method instead of the classical fixed-point analysis. Thus, the consistency condition may be given on an arbitrary time horizon. The decentralized strategies are derived and shown to satisfy the $ε$-Nash equilibrium property. Two special cases of our MFG for delayed systems are further investigated.
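For readers unfamiliar with the term, the $ε$-Nash equilibrium property can be stated in its standard form as follows. The notation is an assumption for illustration and is not taken from the paper: $J_i$ denotes the cost of player $i$, $\mathcal{U}_i$ its admissible strategy set, and $u_{-i}$ the strategies of all other players in an $N$-player game.

```latex
% Standard definition; notation (J_i, \mathcal{U}_i, u_{-i}) is assumed, not the paper's.
A strategy profile $(u_1^*, \dots, u_N^*)$ is an $\varepsilon$-Nash equilibrium if,
for every player $i = 1, \dots, N$,
\[
  J_i\bigl(u_i^*, u_{-i}^*\bigr)
  \;\le\;
  \inf_{u_i \in \mathcal{U}_i} J_i\bigl(u_i, u_{-i}^*\bigr) + \varepsilon ,
\]
where in large-population settings one typically shows
$\varepsilon = \varepsilon(N) \to 0$ as $N \to \infty$.
```

In words, no single player can lower its own cost by more than $ε$ through a unilateral deviation from the decentralized strategy.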

preprint · 2014 · arXiv

A Class of Mean-field LQG Games with Partial Information

The large-population system consists of a considerable number of small agents whose individual behavior and mass effect are interrelated via their state-average. The mean-field game provides an efficient way to obtain the decentralized strategies of a large-population system when studying its dynamic optimizations. Unlike other large-population literature, the current paper possesses the following distinctive features. First, our setting includes the partial information structure of the large-population system, which is practical from a real application standpoint. Specifically, two cases of partial information structure are considered here: the partial filtration case (see Sections 2 and 3), where the information available to agents is the filtration generated by an observable component of the underlying Brownian motion; and the noisy observation case (Section 4), where the individual agent can access an additive white-noise observation of its own state. Also, it is new in filtering modeling that our sensor function may depend on the state-average. Second, in both cases, the limiting state-averages become random and the filtering equations for the individual state should be formalized to get the decentralized strategies. Moreover, it is also new that the limit average of state filters should be analyzed here. This makes our analysis very different from the full information arguments of the large-population system. Third, the consistency conditions are equivalent to the well-posedness of some Riccati equations, and do not involve the fixed-point analysis as in other mean-field games. The $ε$-Nash equilibrium properties are also presented.

preprint · 2014 · arXiv

Mean Field Linear-Quadratic-Gaussian (LQG) Games of Forward-Backward Stochastic Differential Equations

This paper studies a new class of dynamic optimization problems for a large-population (LP) system which consists of a large number of negligible and coupled agents. The most significant feature of our setup is that the dynamics of individual agents follow forward-backward stochastic differential equations (FBSDEs) in which the forward and backward states are coupled at the terminal time. The current paper hence differs from most existing large-population literature, where the individual states are typically modeled by SDEs including the forward state only. The associated mean-field linear-quadratic-Gaussian (LQG) game, in its forward-backward sense, is also formulated to seek the decentralized strategies. Unlike the forward case, the consistency conditions of our forward-backward mean-field games involve six Riccati and force rate equations. Moreover, their initial and terminal conditions are mixed, so a special decoupling technique is applied here. We also verify the $ε$-Nash equilibrium property of the derived decentralized strategies. To this end, some estimates for the backward stochastic system are employed. In addition, due to the adaptiveness requirement of the forward-backward system, our arguments here are not parallel to those in the forward case.

preprint · 2014 · arXiv

Mean Field Linear-Quadratic-Gaussian (LQG) Games: Major and Minor Players

This paper is concerned with a backward-forward stochastic differential equation (BFSDE) system, in which a large number of negligible agents are coupled in their dynamics via the state average. Here a BSDE is introduced as the dynamics of the major player, while the dynamics of the minor players are described by SDEs. Some auxiliary mean-field SDEs (MFSDEs) and a $3\times2$ mixed forward-backward stochastic differential equation (FBSDE) system are considered and analyzed instead of involving the fixed-point analysis as in other mean-field games. We also derive the decentralized strategies, which are shown to satisfy the $ε$-Nash equilibrium property.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint