Researcher profile

Zheng Qu

Zheng Qu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified · Verification L1 · Unclaimed author
2 works
0 followers
2 topics
4 close collaborators

Published work

2 published items

Preprint · 2022 · arXiv

Dynamic N:M Fine-grained Structured Sparse Attention Mechanism

Transformers are becoming the mainstream solution for various tasks in NLP and computer vision. Despite their success, the high complexity of the attention mechanism hinders them from being applied to latency-sensitive tasks. Tremendous efforts have been made to alleviate this problem, and many of them successfully reduce the asymptotic complexity to linear. Nevertheless, most of them fail to achieve practical speedup over the original full attention under moderate sequence lengths and are unfriendly to finetuning. In this paper, we present DFSS, an attention mechanism that dynamically prunes the full attention weight matrix to an N:M fine-grained structured sparse pattern. We provide both theoretical and empirical evidence that DFSS is a good approximation of the full attention mechanism. We propose a dedicated CUDA kernel design that completely eliminates the dynamic pruning overhead and achieves speedups under arbitrary sequence lengths. We evaluate 1:2 and 2:4 sparsity under different configurations and achieve 1.27x to 1.89x speedups over the full attention mechanism. It takes only a couple of finetuning epochs from the pretrained model to achieve accuracy on par with the full attention mechanism on tasks from various domains, with sequence lengths from 384 to 4096.
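To make the N:M pattern in the abstract concrete, here is a minimal pure-Python sketch: in every group of M consecutive entries of a row of attention scores, only the N largest are kept, and softmax is taken over the survivors. It assumes pruning is applied to the raw scores before softmax (within a row this selects the same entries as pruning the softmax weights, since softmax is monotone). The function name `nm_sparse_softmax` and its parameters are hypothetical, and this is not the dedicated CUDA kernel the paper describes.

```python
import math

def nm_sparse_softmax(scores, n=2, m=4):
    """Illustrative N:M pruning: keep the n largest scores in each group of
    m consecutive entries per row, then softmax over the kept entries."""
    out = []
    for row in scores:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        keep = [False] * len(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            # Indices of the n largest scores inside this group of m.
            top = sorted(group, key=lambda j: row[j], reverse=True)[:n]
            for j in top:
                keep[j] = True
        # Numerically stable softmax restricted to the kept entries.
        row_max = max(s for s, k in zip(row, keep) if k)
        exps = [math.exp(s - row_max) if k else 0.0 for s, k in zip(row, keep)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# One row of 8 scores with 2:4 sparsity: two entries survive per group of four.
weights = nm_sparse_softmax([[1.0, 3.0, 2.0, 0.0, 5.0, 4.0, -1.0, 0.5]])
```

In this example the pruned positions (indices 0, 3, 6 and 7) receive weight exactly zero, so exactly half of each row is nonzero, which is the structure that 2:4 sparse hardware formats are designed to exploit.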

Preprint · 2021 · arXiv

An adaptive proximal point algorithm framework and application to large-scale optimization

We investigate the proximal point algorithm (PPA) and its inexact extensions under an error bound condition, which guarantees global linear convergence if the proximal regularization parameter is larger than the error bound condition parameter. We propose an adaptive generalized proximal point algorithm (AGPPA), which adaptively updates the proximal regularization parameters based on implementable criteria. We show that AGPPA achieves linear convergence without any knowledge of the error bound condition parameter, and that the rate differs from the optimal one only by a logarithmic term. We apply AGPPA to convex minimization problems and analyze the iteration complexity bound of the resulting algorithm. Our framework and the complexity results apply to arbitrary linearly convergent inner solvers and allow a hybrid with any locally fast convergent method. We illustrate the performance of AGPPA by applying it to large-scale linear programming (LP) problems. The resulting complexity bound has a weaker dependence on the Hoffman constant and scales better with the dimension than linearized ADMM. In numerical experiments, our algorithm demonstrates improved performance in obtaining solutions of medium accuracy on large-scale LP problems.
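For readers unfamiliar with the basic iteration that the abstract builds on, the following is a minimal sketch of the classical, exact proximal point algorithm with a fixed parameter on a toy one-dimensional quadratic. It does not implement AGPPA: the adaptive update criteria are not given in the abstract and are not reproduced here. The toy objective, the function names and the step-size convention `lam` (a larger `lam` means weaker regularization, which may differ from the paper's parameterization) are all assumptions made for illustration.

```python
def prox_quadratic(v, lam, a=1.0, b=3.0):
    """Closed-form proximal map of the toy objective f(x) = 0.5 * a * (x - b)**2,
    i.e. the minimizer over x of f(x) + (1 / (2 * lam)) * (x - v)**2."""
    return (v + lam * a * b) / (1.0 + lam * a)

def proximal_point(x0, lam=1.0, steps=20):
    """Classical PPA with a fixed parameter: x_{k+1} = prox_{lam * f}(x_k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(prox_quadratic(xs[-1], lam))
    return xs

# Starting at 11 with minimizer b = 3, the error shrinks by the factor
# 1 / (1 + lam * a) = 0.5 at every step: 8, 4, 2, 1, ...
iterates = proximal_point(11.0, lam=1.0, steps=3)
```

On this toy problem the distance to the minimizer contracts by a constant factor per iteration, which is the kind of linear convergence the abstract refers to; how fast it contracts depends on the choice of the proximal parameter, which is what an adaptive scheme aims to tune automatically.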