Researcher profile

Yiran Wang

Yiran Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
10 topics
4 close collaborators

Published work

14 published items

Preprint, 2026, arXiv

Apollo: Unified Multi-Task Audio-Video Joint Generation

Audio-video joint generation has progressed rapidly, yet substantial challenges remain. Non-commercial approaches still suffer from audio-visual asynchrony, poor lip-speech alignment, and unimodal degradation, which can stem from weak audio-visual correspondence modeling, limited generalization, and scarce high-quality dense-caption data. To address these issues, we introduce Apollo and delve into three axes: model architecture, training strategy, and data curation. Architecturally, we adopt a single-tower design with unified DiT blocks and an Omni-Full Attention mechanism, achieving tight audio-visual alignment and strong scalability. Training-wise, we adopt a progressive multitask regime, moving from random modality masking to joint optimization across tasks, together with a multistage curriculum, yielding robust representations, strengthening A-V aligned world knowledge, and preventing unimodal collapse. For datasets, we present the first large-scale audio-video dataset with dense captions, and introduce a novel automated data-construction pipeline which annotates and filters millions of diverse, high-quality, strictly aligned audio-video-caption triplets. Building on this, Apollo scales to large datasets, delivering high-fidelity, semantically and temporally aligned, instruction-following generation in both joint and unimodal settings while generalizing robustly to out-of-distribution scenarios. Across tasks, it substantially outperforms prior methods and achieves performance comparable to Veo 3, offering a unified, scalable path toward next-generation audio-video synthesis.

Preprint, 2026, arXiv

From Simple to Complex: Curriculum-Guided Physics-Informed Neural Networks via Gaussian Mixture Models

Physics-informed neural networks (PINNs) offer a mesh-free framework for solving partial differential equations (PDEs), yet training often suffers from gradient pathologies, spectral bias, and poor convergence, especially for problems with strong nonlinearity, sharp gradients, or multiscale features. We propose the Curriculum-Guided Gaussian Mixture Physics-Informed Neural Network (CGMPINN), which integrates Gaussian mixture modeling with dynamic curriculum learning. Specifically, a GMM is periodically fitted to the PDE residual distribution to quantify spatially varying learning difficulty. A smooth curriculum schedule progressively shifts training focus from easy to harder regions, while precision-based variance modulation suppresses unreliable clusters during early optimization. This dual curriculum is governed by a shared curriculum parameter and can be combined with self-adaptive loss balancing. We further establish theoretical guarantees, including sublinear convergence of the gradient norm for the induced time-varying loss, uniform equivalence between the curriculum-weighted and standard PDE losses, and a generalization bound with an explicit weighting-induced bias characterization. Experiments on six benchmark PDEs spanning elliptic, parabolic, hyperbolic, advection-dominated, and nonlinear reaction-diffusion types show that CGMPINN consistently achieves the lowest relative $L_2$ and maximum absolute errors among all compared methods, reducing relative $L_2$ error by up to 97.8\% over the standard PINN at comparable cost. Our code is publicly available at https://github.com/Mathematics-Yang/CGMPINN.

Preprint, 2026, arXiv

MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering

Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has been made in retaining knowledge and adapting to new information in the VQA domain. However, current methods often struggle with balancing knowledge retention, adaptation, and robust feature representation. To address these challenges, we propose a novel framework with adaptive memory allocation and global noise filtering called MacVQA for visual question answering. MacVQA fuses visual and question information while filtering noise to ensure robust representations, and employs prototype-based memory allocation to optimize feature quality and memory usage. These designs enable MacVQA to balance knowledge acquisition, retention, and compositional generalization in continual VQA learning. Experiments on ten continual VQA tasks show that MacVQA outperforms existing baselines, achieving 43.38% average accuracy and 2.32% average forgetting on standard tasks, and 42.53% average accuracy and 3.60% average forgetting on novel composition tasks.

Preprint, 2026, arXiv

SafeMo: Linguistically Grounded Unlearning for Trustworthy Text-to-Motion Generation

Text-to-motion (T2M) generation with diffusion backbones achieves strong realism and alignment. Safety concerns in T2M methods have been raised in recent years; existing methods replace discrete VQ-VAE codebook entries to steer the model away from unsafe behaviors. However, discrete codebook replacement-based methods have two critical flaws: firstly, replacing codebook entries which are reused by benign prompts leads to drifts on everyday tasks, degrading the model's benign performance; secondly, discrete token-based methods introduce quantization and smoothness loss, resulting in artifacts and jerky transitions. Moreover, existing text-to-motion datasets naturally contain unsafe intents and corresponding motions, making them unsuitable for safety-driven machine learning. To address these challenges, we propose SafeMo, a trustworthy motion generative framework integrating Minimal Motion Unlearning (MMU), a two-stage machine unlearning strategy, enabling safe human motion generation in continuous space, preserving continuous kinematics without codebook loss and delivering strong safety-utility trade-offs compared to current baselines. Additionally, we present the first safe text-to-motion dataset SafeMoVAE-29K integrating rewritten safe text prompts and continuous refined motion for trustworthy human motion unlearning. Built upon DiP, SafeMo efficiently generates safe human motions with natural transitions. Experiments demonstrate effective unlearning performance of SafeMo by showing strengthened forgetting on unsafe prompts, reaching 2.5x and 14.4x higher forget-set FID on HumanML3D and Motion-X respectively, compared to the previous SOTA human motion unlearning method LCR, with benign performance on safe prompts being better or comparable. Code: https://github.com/AIGeeksGroup/SafeMo. Website: https://aigeeksgroup.github.io/SafeMo.

Preprint, 2026, arXiv

TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation

Pairwise translation directions are a key input to camera location estimation in global structure-from-motion. Existing estimators usually process each image pair independently, producing directions that may be locally plausible but inconsistent with the other relative directions in the viewing graph. To jointly estimate the direction, we propose TriDE, which exploits camera-triangle consistency as an efficient higher-order verification signal. Instead of solving a costly global nonlinear optimization problem that is sensitive to initialization, TriDE refines unreliable pairwise directions through message passing between directions and their incident weighted triangles. This information propagation strategy enables us to establish a strong phase-transition bound for exact recovery under a realistic random corruption model. Experiments on real image graphs show that TriDE improves direction accuracy by a large margin and yields better downstream camera locations, providing a practical link between local pairwise estimation and global camera pose geometry.
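
The camera-triangle consistency mentioned in this abstract rests on a basic geometric fact: for three cameras in general position, the relative translations around the triangle sum to zero, so the three unit directions must be linearly dependent with same-sign coefficients. The following is a minimal sketch of that check only, not of TriDE's message-passing refinement; the function names and the toy camera centers are hypothetical.

```python
import numpy as np

def unit(v):
    """Normalize a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def triangle_check(t_ij, t_jk, t_ki):
    """Check whether three pairwise directions can close a camera triangle.

    Returns (score, same_sign):
      score     - smallest singular value of [t_ij t_jk t_ki]; 0 means the
                  directions are coplanar (linearly dependent).
      same_sign - True if the dependency has coefficients of one sign, i.e.
                  positive edge lengths exist with l1*t_ij + l2*t_jk + l3*t_ki = 0.
    Assumes a non-degenerate triangle (cameras not collinear).
    """
    T = np.column_stack([unit(t_ij), unit(t_jk), unit(t_ki)])
    _, s, Vt = np.linalg.svd(T)
    null_vec = Vt[-1]  # right singular vector for the smallest singular value
    same_sign = bool(np.all(null_vec > 0) or np.all(null_vec < 0))
    return float(s[-1]), same_sign

# Toy camera centers (hypothetical data).
c_i = np.array([0.0, 0.0, 0.0])
c_j = np.array([1.0, 0.0, 0.0])
c_k = np.array([0.0, 1.0, 0.0])

# Consistent triangle: directions taken from the true camera centers.
print(triangle_check(c_j - c_i, c_k - c_j, c_i - c_k))

# One direction flipped: still coplanar, but the signs no longer agree.
print(triangle_check(-(c_j - c_i), c_k - c_j, c_i - c_k))

# One direction replaced by an unrelated vector: no longer coplanar.
print(triangle_check([0.0, 0.0, 1.0], c_k - c_j, c_i - c_k))
```

A score like this could serve as a per-triangle reliability weight, but how TriDE actually weights triangles is not stated in the abstract.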

Preprint, 2022, arXiv

A deep learning based reduced order modeling for stochastic underground flow problems

In this paper, we propose a deep learning based reduced order modeling method for stochastic underground flow problems in highly heterogeneous media. We aim to utilize supervised learning to build a reduced surrogate model that maps the stochastic parameter space, which characterizes the possible highly heterogeneous media, to the solution space of a stochastic flow problem, enabling fast online simulations. Dominant POD modes obtained from a well-designed spectral problem in a global snapshot space are used to represent the solution of the flow problem. Due to the small dimension of the solution, the complexity of the neural network is significantly reduced. We adopt the generalized multiscale finite element method (GMsFEM), in which a set of local multiscale basis functions that can capture the heterogeneity of the media and source information are constructed to efficiently generate a globally defined snapshot space. Rigorous theoretical analyses and extensive numerical experiments for linear and nonlinear stochastic flows are provided to verify the superior performance of the proposed method.

Preprint, 2022, arXiv

Less is More: Consistent Video Depth Estimation with Masked Frames Modeling

Temporal consistency is the key challenge of video depth estimation. Previous works rely on additional optical flow or camera poses, which are time-consuming to compute. By contrast, we derive consistency with less information. Since videos inherently contain heavy temporal redundancy, a missing frame can be recovered from neighboring ones. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network that predicts the depth of masked frames based on their neighboring frames. By reconstructing masked temporal features, FMNet can learn intrinsic inter-frame correlations, which leads to consistency. Compared with prior art, experimental results demonstrate that our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth estimation. Our official project page is https://github.com/RaymondWang987/FMNet.

Preprint, 2021, arXiv

A local-global generalized multiscale finite element method for highly heterogeneous stochastic groundwater flow problems

In this paper, we propose a local-global multiscale method for highly heterogeneous stochastic groundwater flow problems under the framework of the reduced basis method and the generalized multiscale finite element method (GMsFEM). Due to incomplete characterization of the medium properties of the groundwater flow problems, random variables are used to parameterize the uncertainty. As a result, the problem must be solved repeatedly to obtain statistical quantities. Besides, the medium properties are usually highly heterogeneous, which results in a large linear system that needs to be solved. Therefore, a computationally efficient model reduction method is needed to overcome the difficulty. We explore the combination of the reduced basis method and the GMsFEM. In particular, we use residual-driven basis functions, which are key ingredients in GMsFEM. This local-global multiscale method is more efficient than applying the GMsFEM or the reduced basis method individually. We first construct parameter-independent multiscale basis functions that include both local and global information of the permeability fields, and then use these basis functions to construct several global snapshots and global basis functions for fast online computation with different parameter inputs. We provide rigorous analysis of the proposed method and extensive numerical examples to demonstrate the accuracy and efficiency of the local-global multiscale method.

Preprint, 2021, arXiv

Some integral geometry problems for wave equations

We consider the Cauchy problem and the source problem for normally hyperbolic operators on the Minkowski spacetime, and study the determination of solutions from their integrals along null geodesics. For the Cauchy problem, we give a new proof of the stable determination result obtained in Vasy and Wang [12]. For the source problem, we obtain stable determination for sources with space-like singularities. Our proof is based on the microlocal analysis of the normal operator of the light ray transform composed with the parametrix for strictly hyperbolic operators.

Preprint, 2020, arXiv

Adaptive multiscale model reduction for nonlinear parabolic equations using GMsFEM

In this paper, we propose a coupled Discrete Empirical Interpolation Method (DEIM) and Generalized Multiscale Finite Element Method (GMsFEM) to solve nonlinear parabolic equations with application to the Allen-Cahn equation. The Allen-Cahn equation is a model for nonlinear reaction-diffusion processes. It is often used to model interface motion in time, e.g. phase separation in alloys. The GMsFEM allows solving multiscale problems at a reduced computational cost by constructing a reduced-order representation of the solution on a coarse grid. In arXiv:1301.2866, it was shown that the GMsFEM provides a flexible tool to solve multiscale problems by constructing appropriate snapshot, offline and online spaces. In this paper, we solve a time-dependent problem, where online enrichment is used. The main contribution is comparing different online enrichment methods. More specifically, we compare uniform online enrichment and adaptive methods. We also compare two kinds of adaptive methods. Furthermore, we use DEIM, a dimension reduction method, to reduce the complexity when we evaluate the nonlinear terms. Our results show that DEIM can approximate the nonlinear term without significantly increasing the error. Finally, we apply our proposed method to the Allen-Cahn equation.
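
The DEIM step described in this abstract can be illustrated with the standard greedy index selection: given a basis U for snapshots of the nonlinear term, DEIM picks as many interpolation points as basis vectors and reconstructs the full term from its values at those points only. The following is a minimal sketch under stated assumptions, not the paper's coupled DEIM-GMsFEM solver; the toy 1D profiles, the term u - u^3, and the basis size of 10 are hypothetical choices for illustration.

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM interpolation indices for a basis U of shape (n, m)."""
    n, m = U.shape
    indices = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # Interpolate column j at the current indices using the first j columns.
        coeffs = np.linalg.solve(U[indices, :j], U[indices, j])
        residual = U[:, j] - U[:, :j] @ coeffs
        # The next index is where the interpolation error is largest.
        indices.append(int(np.argmax(np.abs(residual))))
    return np.array(indices)

def deim_approximate(U, indices, f_at_indices):
    """Reconstruct a full vector from its values at the DEIM indices."""
    coeffs = np.linalg.solve(U[indices, :], f_at_indices)
    return U @ coeffs

def nonlinear_term(u):
    """Toy Allen-Cahn style reaction term (assumed form for illustration)."""
    return u - u**3

# Snapshots of the nonlinear term for interface profiles at different positions.
x = np.linspace(0.0, 1.0, 200)
centers = np.linspace(0.2, 0.8, 30)
snapshots = np.column_stack(
    [nonlinear_term(np.tanh((x - c) / 0.1)) for c in centers]
)

# POD basis of the snapshots via SVD, truncated to 10 modes.
U_full, _, _ = np.linalg.svd(snapshots, full_matrices=False)
U = U_full[:, :10]
idx = deim_indices(U)

# Approximate an unseen profile from only 10 sampled entries.
f_new = nonlinear_term(np.tanh((x - 0.47) / 0.1))
f_deim = deim_approximate(U, idx, f_new[idx])
rel_err = np.linalg.norm(f_new - f_deim) / np.linalg.norm(f_new)
print("DEIM indices:", idx)
print("relative error:", rel_err)
```

The cost saving comes from evaluating the nonlinear term at only the selected indices instead of at every grid point; any vector that lies exactly in the span of U is reconstructed exactly.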

Preprint, 2020, arXiv

Online conservative generalized multiscale finite element method for flow models

In this paper, we consider an online enrichment procedure using the Generalized Multiscale Finite Element Method (GMsFEM) in the context of a two-phase flow model in heterogeneous porous media. The coefficient of the elliptic equation is referred to as the permeability and is the main source of heterogeneity within the model. The elliptic pressure equation is solved using online GMsFEM, and is coupled with a hyperbolic transport equation where local conservation of mass is necessary. To satisfy the conservation property, we aim at constructing conservative fluxes within the space of multiscale basis functions through the use of a postprocessing technique. In order to improve the accuracy of the pressure and velocity solutions in the online GMsFEM we apply a systematic online enrichment procedure. The increase in pressure accuracy due to the online construction is inherited by the conservative flux fields and the desired saturation solutions from the coupled transport equation. Despite the fact that the coefficient of the pressure equation is dependent on the saturation which may vary in time, we may construct an approximation space using the initial coefficient where no further basis updates follow. Numerical results corresponding to four different types of heterogeneous permeability coefficients are exhibited to test the proposed methodology.

Preprint, 2020, arXiv

Two-sample Testing on Latent Distance Graphs With Unknown Link Functions

We propose a valid and consistent test for the hypothesis that two latent distance random graphs on the same vertex set have the same generating latent positions, up to some unidentifiable similarity transformations. Our test statistic is based on first estimating the edge probability matrices by truncating the singular value decompositions of the averaged adjacency matrices in each population and then computing a Spearman rank correlation coefficient between these estimates. Experimental results on simulated data indicate that the test procedure has power even when there is only one sample from each population, provided that the number of vertices is not too small. Application to a dataset of neural connectome graphs showed that we can distinguish between scans from different age groups, while application to a dataset of epileptogenic recordings showed that we can discriminate between seizure and non-seizure events.
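
The test statistic described in this abstract can be sketched in a few lines: average the adjacency matrices in each population, truncate the SVD to get an estimated edge probability matrix, then compute a Spearman rank correlation between the two estimates. The following is a minimal sketch under stated assumptions; the truncation rank, the use of upper-triangular entries, the exponential link function in the simulation, and all function names are hypothetical, and the calibration of the rejection threshold is not shown because the abstract does not describe it.

```python
import numpy as np

def estimate_edge_probabilities(adjacency_matrices, rank):
    """Truncated-SVD estimate from the averaged adjacency matrix."""
    mean_adj = np.mean(adjacency_matrices, axis=0)
    U, s, Vt = np.linalg.svd(mean_adj)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # last rank occupied by each unique value
    first = last - counts + 1      # first rank occupied by each unique value
    return ((first + last) / 2.0)[inverse]

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def rank_correlation_statistic(sample_1, sample_2, rank):
    """Spearman correlation between the two estimated probability matrices."""
    P1 = estimate_edge_probabilities(sample_1, rank)
    P2 = estimate_edge_probabilities(sample_2, rank)
    upper = np.triu_indices(P1.shape[0], k=1)  # off-diagonal entries, counted once
    return spearman(P1[upper], P2[upper])

def sample_graph(positions, rng):
    """Simulate one undirected latent distance graph (assumed link: exp(-d))."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    upper = np.triu(rng.random(d.shape) < np.exp(-d), k=1)
    return (upper + upper.T).astype(float)

rng = np.random.default_rng(0)
positions = rng.normal(size=(60, 2))
sample_a = np.array([sample_graph(positions, rng) for _ in range(5)])
sample_b = np.array([sample_graph(positions, rng) for _ in range(5)])
stat = rank_correlation_statistic(sample_a, sample_b, rank=3)
print("statistic (same latent positions):", stat)
```

A rank-based correlation is a natural fit here because it is unchanged by any monotone transformation of the estimated probabilities, which matches the setting where the link function is unknown.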

Preprint, 2019, arXiv

Singularities generated by the triple interaction of semilinear conormal waves

We study the local propagation of conormal singularities for solutions of semilinear wave equations $\square u = P(y, u)$, where $P(y, u)$ is a polynomial of degree $N \geq 3$ in $u$ with $C^\infty(\mathbb{R}^3_y)$ coefficients. We know from the work of Melrose & Ritter and Bony that if $u$ is conormal to three waves which intersect transversally at a point $q$, then after the triple interaction $u(y)$ is a conormal distribution with respect to the three waves and the characteristic cone $Q$ with vertex at $q$. We compute the principal symbol of $u$ at the cone and away from the hypersurfaces. We show that if $\partial_u^3 P (q, u(q)) \neq 0$, $u$ is an elliptic conormal distribution.