Researcher profile

Qing Qu

Qing Qu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.

preprint2022arXiv

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting the orthogonality of the weights helps train deep models and improve robustness. For ConvNets, most existing methods are based on penalizing or normalizing weight matrices derived from concatenating or flattening the convolutional kernels. These methods often destroy or ignore the benign convolutional structure of the kernels; therefore, they are often expensive or impractical for deep ConvNets. In contrast, we introduce a simple and efficient "Convolutional Normalization" (ConvNorm) method that can fully exploit the convolutional structure in the Fourier domain and serve as a simple plug-and-play module to be conveniently incorporated into any ConvNets. Our method is inspired by recent work on preconditioning methods for convolutional sparse coding and can effectively promote each layer's channel-wise isometry. Furthermore, we show that our ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets. Applied to classification under noise corruptions and generative adversarial network (GAN), we show that the ConvNorm improves the robustness of common ConvNets such as ResNet and the performance of GAN. We verify our findings via numerical experiments on CIFAR and ImageNet.

preprint2022arXiv

From Symmetry to Geometry: Tractable Nonconvex Problems

As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes, with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems, which can be understood through the lens of symmetries. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single "ground truth" solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth, and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in shaping the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research.

preprint2022arXiv

Highly active hydrogen evolution facilitated by topological surface states on a Pd/SnTe metal/topological crystalline insulator heterostructure

Recently, topological quantum materials have emerged as a promising electrocatalyst for hydrogen evolution reaction (HER). However, most of their performance largely lags behind noble metals such as benchmark platinum (Pt). In this work, a Pd(20nm)/SnTe(70nm) heterostructure, fabricated by molecular beam epitaxy and electron beam evaporation, is found to display much higher electrocatalytic activity than that of a pure Pd(20nm) thin film and even higher than that of a commercial Pt foil. This heterostructure adopts an extracted turnover frequency value more than two times higher than that of the Pd(20nm) thin film at a potential of 0.2 V, indicating a much higher intrinsic activity per Pd site. Density functional theory calculations show that the conventional d-band theory, which works well for many transition metal heterostructures, cannot explain the enhancement of electrocatalytic performance. Instead, we found that the topological surface states (TSSs) of the SnTe (001) underlayer play a key role; electrons transfer from both the Pd surface and the adsorbed H atoms to the TSSs of SnTe (001), resulting in weaker Pd-H binding strength and more favorable hydrogen adsorption free energies. Our work demonstrates for the first time that a metal/topological quantum material heterostructure could be a prominent catalyst to enjoy HER activity outperforming that of a commercial Pt foil and offers a promising direction to optimize the performance of electrocatalysts based on topological quantum materials.

preprint2022arXiv

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), which seems to take place regardless of the choice of loss functions. In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably or even better than the de-facto cross-entropy loss. Under a simplified unconstrained feature model, we provide the first global landscape analysis for vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessian exhibit negative curvature directions. Furthermore, we justify the usage of rescaled MSE loss by probing the optimization landscape around the NC solutions, showing that the landscape can be improved by tuning the rescaling hyperparameters. Finally, our theoretical findings are experimentally verified on practical network architectures.

preprint2022arXiv

Robust Training under Label Noise by Over-parameterization

Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it has been well-known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.

preprint2020arXiv

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution

We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i $ ($i=1,\cdots,p$). We formulate the task as a nonconvex optimization problem over the sphere. Under mild statistical assumptions of the data, we prove that the vanilla Riemannian gradient descent (RGD) method, with random initializations, provably recovers both the kernel $\mathbf a$ and the signals $\{\mathbf x_i\}_{i=1}^p$ up to a signed shift ambiguity. In comparison with state-of-the-art results, our work shows significant improvements in terms of sample complexity and computational efficiency. Our theoretical results are corroborated by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods on both synthetic and real datasets.

preprint2020arXiv

Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications

The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning. However, in contrast to the classical sparse recovery problem, the most natural formulation for finding the sparsest vector in a subspace is usually nonconvex. In this paper, we overview recent advances on global nonconvex optimization theory for solving this problem, ranging from geometric analysis of its optimization landscapes, to efficient optimization algorithms for solving the associated nonconvex optimization problem, to applications in machine intelligence, representation learning, and imaging sciences. Finally, we conclude this review by pointing out several interesting open problems for future research.

preprint2020arXiv

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and sparsity of corruption. This paper shows that with a double over-parameterization for both the low-rank matrix and sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge on neither rank of the matrix nor sparsity of the corruption. We further extend our approach for the robust recovery of natural images by over-parameterizing images with deep convolutional networks. Experiments show that our method handles different test images and varying corruption levels with a single learning pipeline where the network width and termination conditions do not need to be adjusted on a case-by-case basis. Underlying the success is again the implicit bias with discrepant learning rates on different over-parameterized parameters, which may bear on broader applications.