Source author record

Weiguo Gao

Weiguo Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.NA, Machine Learning, Numerical Analysis, Artificial Intelligence, math.OC, Computation, Computation and Language, Computer Vision, Information Theory, math.IT, physics.comp-ph

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Terminally constrained flow-based generative models from an optimal control perspective

We address the problem of sampling from terminally constrained distributions with pre-trained flow-based generative models through an optimal control formulation. Theoretically, we characterize the value function by a Hamilton-Jacobi-Bellman equation and derive the optimal feedback control as the minimizer of the associated Hamiltonian. We show that as the control penalty increases, the controlled process recovers the reference distribution, while as the penalty vanishes, the terminal law converges to a generalized Wasserstein projection onto the constraint manifold. Algorithmically, we introduce Terminal Optimal Control with Flow-based models (TOCFlow), a geometry-aware sampling-time guidance method for pre-trained flows. Solving the control problem in a terminal co-moving frame that tracks reference trajectories yields a closed-form scalar damping factor along the Riemannian gradient, capturing second-order curvature effects without matrix inversions. TOCFlow therefore matches the geometric consistency of Gauss-Newton updates at the computational cost of standard gradient guidance. We evaluate TOCFlow on three high-dimensional scientific tasks spanning equality, inequality, and global statistical constraints, namely Darcy flow, constrained trajectory planning, and turbulence snapshot generation with Kolmogorov spectral scaling. Across all settings, TOCFlow improves constraint satisfaction over Euclidean guidance and projection baselines while preserving the reference model's generative quality.

preprint, 2022, arXiv

A More Stable Accelerated Gradient Method Inspired by Continuous-Time Perspective

Nesterov's accelerated gradient method (NAG) is widely used in machine learning problems, including deep learning, and corresponds to a continuous-time differential equation. Through this connection, the properties of the differential equation and its numerical approximation can be investigated to improve the accelerated gradient method. In this work we present a new improvement of NAG in terms of stability, inspired by numerical analysis. We give the precise order of NAG as a numerical approximation of its continuous-time limit and then present a new method of higher order. We show theoretically that our new method is more stable than NAG for large step sizes. Experiments on matrix completion and handwritten digit recognition demonstrate that the stability of our new method is better. Furthermore, better stability leads to higher computational speed in experiments.
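For orientation, the baseline that this abstract sets out to improve can be sketched in a few lines. The code below is the textbook NAG iteration with momentum coefficient (k - 1) / (k + 2), not the higher-order method proposed in the paper. The quadratic test function, the step size, and all function names are assumptions made for illustration.

```python
def nag(grad, x0, step, iters):
    """Textbook NAG for a smooth convex function (not the paper's new method)."""
    x_prev = list(x0)  # previous iterate x_{k-1}
    y = list(x0)       # extrapolated point y_{k-1}
    for k in range(1, iters + 1):
        g = grad(y)
        # Gradient step taken from the extrapolated point.
        x = [yi - step * gi for yi, gi in zip(y, g)]
        # Momentum extrapolation with the classical coefficient.
        beta = (k - 1) / (k + 2)
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        x_prev = x
    return x_prev


# Hypothetical test problem: f(x) = 0.5 * (x0^2 + 10 * x1^2), minimized at 0.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    return [x[0], 10.0 * x[1]]


# The gradient is 10-Lipschitz, so step = 0.05 is below the usual 1/L limit.
x_final = nag(grad_f, [1.0, 1.0], step=0.05, iters=500)
print(f(x_final))
```

Pushing `step` above 1/L is the regime where, according to the abstract, the stability gap between NAG and the proposed method matters.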

preprint, 2022, arXiv

Blind super-resolution of point sources via fast iterative hard thresholding

In this work, we develop a provable fast algorithm for blind super-resolution based on the low-rank structure of the vectorized Hankel matrix associated with the target matrix. Theoretical results show that the proposed method converges to the ground truth at a linear rate. Numerical experiments are also conducted to illustrate the linear convergence and effectiveness of the proposed approach.

preprint, 2022, arXiv

Global Convergence of Triangularized Orthogonalization-free Method

This paper proves the global convergence of a triangularized orthogonalization-free method (TriOFM). TriOFM, in general, applies a triangularization idea to the gradient of an objective function and removes the rotation invariance in minimizers. More precisely, in this paper, the TriOFM works as an eigensolver for sizeable sparse matrices and obtains eigenvectors without any orthogonalization step. Due to the triangularization, the iteration is a discrete-time flow in a non-conservative vector field. The global convergence relies on the stable manifold theorem, whereas the convergence to stationary points is proved in detail in this paper. We provide two proofs inspired by the noisy power method and the noisy optimization method, respectively.
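A small sketch may make the triangularization idea concrete. The abstract does not state the update rule, so the form used here, X <- X - step * (A X + X triu(X^T X)), is an assumption recalled from the TriOFM literature rather than something this record confirms. Under that assumption, column j only sees columns 1..j, which removes the rotation invariance, and no orthogonalization step appears. The matrix, starting point, step size, and function names are all hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def upper_triangle(m):
    # Keep entries on and above the diagonal, zero out the rest.
    return [[m[i][j] if j >= i else 0.0 for j in range(len(m))]
            for i in range(len(m))]


def triofm(a, x, step, iters):
    """Assumed update: X <- X - step * (A X + X triu(X^T X))."""
    for _ in range(iters):
        ax = matmul(a, x)
        x_tri = matmul(x, upper_triangle(matmul(transpose(x), x)))
        x = [[x[i][j] - step * (ax[i][j] + x_tri[i][j])
              for j in range(len(x[0]))] for i in range(len(x))]
    return x


def rayleigh_quotients(a, x):
    # Column-wise x_j^T A x_j / x_j^T x_j.
    ax = matmul(a, x)
    rows, cols = range(len(x)), range(len(x[0]))
    return [sum(x[i][j] * ax[i][j] for i in rows) /
            sum(x[i][j] ** 2 for i in rows) for j in cols]


# Hypothetical symmetric matrix with eigenvalues -3, -1 and 1.
a = [[-2.0, 1.0, 0.0], [1.0, -2.0, 0.0], [0.0, 0.0, 1.0]]
x0 = [[0.3, 0.1], [0.1, 0.2], [0.2, 0.3]]
x_final = triofm(a, x0, step=0.05, iters=2000)
print(rayleigh_quotients(a, x_final))
```

In this toy case the two columns settle on eigenvectors of the two negative eigenvalues, in order, which is the behavior the triangularization is meant to produce.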

preprint, 2022, arXiv

SOFT: Softmax-free Transformer with Linear Complexity

Vision transformers (ViTs) have pushed the state of the art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the use of self-attention modules results in quadratic complexity in both computation and memory usage. Various attempts at approximating the self-attention computation with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Keeping this softmax operation challenges any subsequent linearization efforts. Based on this insight, a softmax-free transformer, or SOFT, is proposed for the first time. To remove softmax in self-attention, a Gaussian kernel function is used to replace the dot-product similarity without further normalization. This enables a full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of the approximation is achieved by calculating its Moore-Penrose inverse using a Newton-Raphson method. Extensive experiments on ImageNet show that our SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with linear complexity, much longer token sequences are permitted in SOFT, resulting in a superior trade-off between accuracy and complexity.
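Two ingredients named in this abstract can be sketched in plain Python: a Gaussian kernel similarity matrix with no softmax normalization, and a Newton-type iteration for the Moore-Penrose inverse. This is an illustration under stated assumptions, not the SOFT implementation. The bandwidth parameter, the starting guess A^T / (||A||_1 ||A||_inf), the iteration count, and the function names are all choices made here. The iteration X <- X (2I - A X) is the classical Newton-Schulz scheme.

```python
import math


def gaussian_kernel_matrix(tokens, bandwidth=1.0):
    """S[i][j] = exp(-||t_i - t_j||^2 / (2 * bandwidth^2)), no normalization."""
    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return [[math.exp(-sqdist(u, v) / (2.0 * bandwidth ** 2)) for v in tokens]
            for u in tokens]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def newton_pinv(a, iters=30):
    """Newton-Schulz iteration X <- X (2I - A X) for the pseudo-inverse."""
    n_rows, n_cols = len(a), len(a[0])
    norm_1 = max(sum(abs(a[i][j]) for i in range(n_rows)) for j in range(n_cols))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    # Start from a scaled transpose, a standard convergent initial guess.
    x = [[scale * a[i][j] for i in range(n_rows)] for j in range(n_cols)]
    for _ in range(iters):
        ax = matmul(a, x)
        correction = [[(2.0 if i == j else 0.0) - ax[i][j]
                       for j in range(n_rows)] for i in range(n_rows)]
        x = matmul(x, correction)
    return x


# Hypothetical "tokens": three 2-dimensional feature vectors.
tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
kernel = gaussian_kernel_matrix(tokens)
kernel_pinv = newton_pinv(kernel)
product = matmul(kernel, kernel_pinv)  # close to the identity for this matrix
```

The iteration uses only matrix products, which suits accelerator hardware. How the inverse enters the low-rank decomposition is not described in the abstract and is left out here.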

preprint, 2020, arXiv

Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion

Task 1 of the DSTC8-track1 challenge aims to develop an end-to-end multi-domain dialogue system to accomplish complex user goals in a tourist information desk setting. This paper describes our submitted solution, Hierarchical Context Enhanced Dialogue System (HCEDS), for this task. The main motivation of our system is to comprehensively explore the potential of hierarchical context for sufficiently understanding complex dialogues. More specifically, we apply BERT to capture token-level information and employ the attention mechanism to capture sentence-level information. The leaderboard results show that our system placed first in automatic evaluation and second in human evaluation.

preprint, 2020, arXiv

Solving the k-sparse Eigenvalue Problem with Reinforcement Learning

We examine the possibility of using a reinforcement learning (RL) algorithm to solve large-scale eigenvalue problems in which the desired eigenvector can be approximated by a sparse vector with at most $k$ nonzero elements, where $k$ is relatively small compared to the dimension of the matrix to be partially diagonalized. This type of problem arises in applications in which the desired eigenvector exhibits localization properties and in large-scale eigenvalue computations in which the amount of computational resource is limited. When the positions of these nonzero elements can be determined, we can obtain the $k$-sparse approximation to the original problem by computing eigenvalues of a $k\times k$ submatrix extracted from $k$ rows and columns of the original matrix. We review a previously developed greedy algorithm for incrementally probing the positions of the nonzero elements in a $k$-sparse approximate eigenvector and show that the greedy algorithm can be improved by using an RL method to refine the selection of $k$ rows and columns of the original matrix. We describe how to represent states, actions, rewards and policies in an RL algorithm designed to solve the $k$-sparse eigenvalue problem and demonstrate the effectiveness of the RL algorithm on two examples originating from quantum many-body physics.
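The reduction described in this abstract, from a large matrix to a $k\times k$ submatrix on a chosen support, can be sketched as follows. The greedy and RL procedures for choosing the support are not reproduced; the support is simply given. The abstract does not say which eigenvalue is sought, so the use of power iteration, which targets the largest-magnitude eigenvalue, is an assumption made here, as are the example matrix and the function names.

```python
def extract_submatrix(a, support):
    """Keep only the rows and columns of `a` listed in `support`."""
    return [[a[i][j] for j in support] for i in support]


def dominant_eigenvalue(b, iters=500):
    """Power iteration on a symmetric matrix, returning the Rayleigh quotient."""
    n = len(b)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-symmetric starting vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n))  # v has unit norm


# Hypothetical 4 x 4 symmetric matrix whose dominant eigenvector is
# supported on indices 0 and 2 only, so it is exactly 2-sparse.
a = [[2.0, 0.0, 1.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

support = [0, 2]
sub = extract_submatrix(a, support)  # [[2.0, 1.0], [1.0, 2.0]]
lam_sparse = dominant_eigenvalue(sub)
lam_full = dominant_eigenvalue(a)
```

Here the two values agree because the eigenvector is exactly sparse on the chosen support. With a poorly chosen support the submatrix eigenvalue would only approximate the full one, which is why the selection of rows and columns is the hard part.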

preprint, 2014, arXiv

An Iterative Minimization Formulation for Saddle-Point Search

This paper proposes and analyzes an iterative minimization formulation for searching index-1 saddle points of an energy function. This formulation differs from other eigenvector-following methods by constructing a new objective function near the guess at each iteration step. This leads to a quadratic convergence rate, in comparison to the linear case of the gentlest ascent dynamics (E and Zhou, Nonlinearity, vol 24, p1831, 2011) and many other existing methods. We also propose the generalization of the new methodology for saddle points of higher index and for constrained energy functions on manifolds.
