Researcher profile

Wei Tao

Wei Tao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
10 topics
4 close collaborators

Published work

13 published items

preprint · 2026 · arXiv

WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization

Recently, video language models (VLMs) have been applied in various fields. However, the visual token sequence of a VLM is very long, which can cause intolerable inference latency and GPU memory usage. Existing methods apply mixed-precision quantization to the key-value (KV) cache in VLMs at token granularity, which makes the search process time-consuming and the computation hardware-inefficient. This paper introduces a novel approach called WindowQuant, which employs window-adaptive mixed-precision quantization to optimize the KV cache. WindowQuant consists of two modules: window-level quantization search and window-level KV cache computation. Window-level quantization search quickly determines the optimal bit-width configuration of the KV cache windows based on the similarity scores between the corresponding visual token windows and the text prompt, maintaining model accuracy. Furthermore, window-level KV cache computation reorders the KV cache windows before quantization, avoiding the hardware inefficiency caused by mixed-precision quantization in inference computation. Extensive experiments demonstrate that WindowQuant outperforms state-of-the-art VLMs and KV cache quantization methods on various datasets.

preprint · 2025 · arXiv

Magneto-optical Skyrmion for manipulation of arbitrary light polarization

Dynamic manipulation of arbitrary light polarization is of fundamental importance for versatile optical functionalities, yet realizing such full-Poincaré-sphere control within compact nanophotonic architectures remains a formidable challenge. Here, we theoretically propose and numerically demonstrate a magneto-optical skyrmion platform enabling full polarization control of cavity eigenmodes. We reveal the correspondence between the near-field wavefunctions of degenerate dipoles and far-field polarization. By applying multidirectional magnetic fields to magneto-optical photonic crystals, we achieve any complex superposition of orthogonal eigenmodes, thereby realizing arbitrary far-field polarization. This mapping manifests as a skyrmion with a topological charge of 2, guaranteeing coverage of the entire Poincaré sphere. Our theoretical model shows excellent agreement with full-wave simulations. Furthermore, we realize bound states in the continuum (BICs) with dynamically tunable polarization textures and demonstrate high-performance polarization-selective emission and transmission. This work establishes a topological paradigm for precise polarization shaping, offering new avenues for advanced optical communication and sensing.

preprint · 2022 · arXiv

A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks

Momentum methods, including heavy-ball (HB) and Nesterov's accelerated gradient (NAG), are widely used in training neural networks for their fast convergence. However, there is a lack of theoretical guarantees for their convergence and acceleration, since the optimization landscape of a neural network is non-convex. Recently, some works have made progress towards understanding the convergence of momentum methods in the over-parameterized regime, where the number of parameters exceeds the number of training instances. Nonetheless, current results mainly focus on two-layer neural networks, which is far from explaining the remarkable success of momentum methods in training deep neural networks. Motivated by this, we investigate the convergence of NAG with a constant learning rate and momentum parameter in training two architectures of deep linear networks: deep fully-connected linear neural networks and deep linear ResNets. In the over-parameterized regime, we first analyze the residual dynamics induced by the training trajectory of NAG for a deep fully-connected linear neural network under random Gaussian initialization. Our results show that NAG can converge to the global minimum at a $(1 - \mathcal{O}(1/\sqrt{\kappa}))^t$ rate, where $t$ is the iteration number and $\kappa > 1$ is a constant depending on the condition number of the feature matrix. Compared to the $(1 - \mathcal{O}(1/\kappa))^t$ rate of GD, NAG achieves an acceleration over GD. To the best of our knowledge, this is the first theoretical guarantee for the convergence of NAG to the global minimum in training deep neural networks. Furthermore, we extend our analysis to deep linear ResNets and derive a similar convergence result.

preprint · 2022 · arXiv

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances

Affective computing plays a key role in human-computer interactions, entertainment, teaching, safe driving, and multimedia integration. Major breakthroughs have been made recently in the areas of affective computing (i.e., emotion recognition and sentiment analysis). Affective computing is realized based on unimodal or multimodal data, primarily consisting of physical information (e.g., textual, audio, and visual data) and physiological signals (e.g., EEG and ECG signals). Physical-based affect recognition attracts more researchers owing to the availability of multiple public databases. However, it is hard to reveal an inner emotion that is purposely hidden from facial expressions, audio tones, body gestures, etc. Physiological signals can generate more precise and reliable emotional results; yet, the difficulty of acquiring physiological signals hinders their practical application. Thus, the fusion of physical information and physiological signals can provide useful features of emotional states and lead to higher accuracy. Instead of focusing on one specific field of affective analysis, we systematically review recent advances in affective computing, and taxonomize unimodal affect recognition as well as multimodal affective analysis. Firstly, we introduce two typical emotion models followed by commonly used databases for affective computing. Next, we survey and taxonomize state-of-the-art unimodal affect recognition and multimodal affective analysis in terms of their detailed architectures and performances. Finally, we discuss some important aspects of affective computing and their applications, and conclude this review with an indication of the most promising future directions, such as the establishment of baseline datasets, fusion strategies for multimodal affective analysis, and unsupervised learning models.

preprint · 2022 · arXiv

Multi-Band Superconductivity in Strongly Hybridized 1T'-WTe$_2$/NbSe$_2$ Heterostructures

The interplay of topology and superconductivity has become a subject of intense research in condensed matter physics for the pursuit of topologically non-trivial forms of superconducting pairing. An intrinsically normal-conducting material can inherit superconductivity through electrical contact to a parent superconductor via the proximity effect, usually understood as Andreev reflection at the interface between the distinct electronic structures of two separate conductors. However, at high interface transparency, strong coupling inevitably leads to local changes in the band structure, owing to hybridization of electronic states. Here, we investigate such strongly proximity-coupled heterostructures of monolayer 1T'-WTe$_2$, grown on NbSe$_2$ by van-der-Waals epitaxy. The superconducting local density of states (LDOS), resolved in scanning tunneling spectroscopy down to 500 mK, reflects a hybrid electronic structure, well-described by a multi-band framework based on the McMillan equations, which self-consistently captures the multi-band superconductivity inherent to the NbSe$_2$ substrate and that induced by proximity in WTe$_2$. Our material-specific tight-binding model captures the hybridized heterostructure quantitatively, and confirms that strong inter-layer hopping gives rise to a semi-metallic density of states in the 2D WTe$_2$ bulk, even for nominally band-insulating crystals. The model further accurately predicts the measured order parameter $\Delta \simeq 0.6$ meV induced in the WTe$_2$ monolayer bulk, stable beyond a 2 T magnetic field. We believe that our detailed multi-band analysis of the hybrid electronic structure provides a useful tool for sensitive spatial mapping of induced order parameters in proximitized atomically thin topological materials.

preprint · 2022 · arXiv

Next-to-leading order QCD calculation of $B_c$ to charmonium tensor form factors

We present the next-to-leading order (NLO) QCD corrections to the $B_c\to \eta_c$ and $B_c\to J/\psi$ tensor form factors within the nonrelativistic QCD (NRQCD) framework. The full analytical results for the $B_c$ to S-wave charmonium tensor form factors are obtained. We also study the asymptotic behaviours of the tensor form factors in the hierarchy heavy quark limit, i.e. $m_b\to\infty$, $m_c\to\infty$, and $m_c/m_b\to 0$. Compact expressions for the tensor form factors are given analytically in the hierarchy heavy quark limit. The relation among different form factors is also analyzed, especially at the large momentum recoil point. Finally, numerical results for the $B_c$ to charmonium tensor form factors over the entire physical region are given.

preprint · 2022 · arXiv

Next-to-next-to-leading order matching of beauty-charmed meson $B_{c}$ and $B^*_{c}$ decay constants

We present the next-to-next-to-leading order (NNLO) QCD corrections to the decay constants for both the pseudoscalar and vector beauty-charmed mesons $B_{c}$ and $B^*_{c}$ in nonrelativistic QCD effective theory. An explicit NNLO calculation verifies that the $B_c$ decay constant from the pseudoscalar current is identical to the $B_c$ decay constant from the axial-vector current. The NNLO result for the vector decay constant of the $B^*_{c}$ meson is novel. Combined with the latest extraction of the nonrelativistic QCD long-distance matrix elements of the $B_c$ meson, we give the branching ratios of the leptonic decays of the $B_{c}$ and $B^*_{c}$ mesons. In addition, the novel anomalous dimension for the flavor-changing heavy quark vector current in nonrelativistic QCD effective theory is helpful for investigating the threshold behaviours of two different heavy quarks.

preprint · 2022 · arXiv

Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks

Momentum methods, such as the heavy ball method (HB) and Nesterov's accelerated gradient method (NAG), have been widely used in training neural networks by incorporating the history of gradients into the current updating process. In practice, they often provide improved performance over (stochastic) gradient descent (GD) with faster convergence. Despite these empirical successes, theoretical understanding of their accelerated convergence rates is still lacking. Recently, some attempts have been made by analyzing the trajectories of gradient-based methods in an over-parameterized regime, where the number of parameters is significantly larger than the number of training instances. However, the majority of existing theoretical work is mainly concerned with GD, and the established convergence result of NAG is inferior to those of HB and GD, which fails to explain the practical success of NAG. In this paper, we take a step towards closing this gap by analyzing NAG in training a randomly initialized over-parameterized two-layer fully connected neural network with ReLU activation. Despite the fact that the objective function is non-convex and non-smooth, we show that NAG converges to a global minimum at a non-asymptotic linear rate $(1-\Theta(1/\sqrt{\kappa}))^t$, where $\kappa > 1$ is the condition number of a Gram matrix and $t$ is the number of iterations. Compared to the convergence rate $(1-\Theta(1/\kappa))^t$ of GD, our result provides theoretical guarantees for the acceleration of NAG in neural network training. Furthermore, our findings suggest that NAG and HB have similar convergence rates. Finally, we conduct extensive experiments on six benchmark datasets to validate the correctness of our theoretical results.
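
The gap between the two rates in this abstract can be illustrated on a much simpler problem than the one the paper analyzes. The sketch below is not the paper's setting or code: it assumes a strongly convex diagonal quadratic with condition number 100, and the function names, step size 1/L, and momentum value (sqrt(kappa) - 1)/(sqrt(kappa) + 1) are illustrative textbook choices rather than anything taken from the abstract.

```python
import math

def grad(eigs, b, x):
    # Gradient of f(x) = 0.5 * sum(e_i * x_i^2) - sum(b_i * x_i).
    return [e * xi - bi for e, xi, bi in zip(eigs, x, b)]

def run_gd(eigs, b, steps, lr):
    # Plain gradient descent from the origin.
    x = [0.0] * len(b)
    for _ in range(steps):
        g = grad(eigs, b, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def run_nag(eigs, b, steps, lr, beta):
    # Nesterov's accelerated gradient with constant momentum beta:
    # take the gradient step at the look-ahead point y, then extrapolate.
    x = [0.0] * len(b)
    y = list(x)
    for _ in range(steps):
        g = grad(eigs, b, y)
        x_next = [yi - lr * gi for yi, gi in zip(y, g)]
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x

def distance(x, x_star):
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_star)))

# Hypothetical test problem: eigenvalues spread over [1, 100], so kappa = 100.
eigs = [1.0 + 99.0 * i / 19 for i in range(20)]
b = [1.0] * 20
x_star = [bi / e for bi, e in zip(b, eigs)]  # exact minimizer

L_smooth, mu = max(eigs), min(eigs)
kappa = L_smooth / mu
lr = 1.0 / L_smooth
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

err_gd = distance(run_gd(eigs, b, 200, lr), x_star)
err_nag = distance(run_nag(eigs, b, 200, lr, beta), x_star)
print("GD error after 200 steps: ", err_gd)
print("NAG error after 200 steps:", err_nag)
```

With the same step size and iteration budget, the NAG error ends up several orders of magnitude below the GD error on this toy problem, which is the qualitative behaviour the $1/\sqrt{\kappa}$ versus $1/\kappa$ dependence predicts. The neural network results in the abstract require a separate analysis that this sketch does not reproduce.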

preprint · 2022 · arXiv

QSpeech: Low-Qubit Quantum Speech Application Toolkit

Quantum devices with few qubits are common in the Noisy Intermediate-Scale Quantum (NISQ) era. However, running a Quantum Neural Network (QNN) on low-qubit quantum devices is difficult, since it is based on a Variational Quantum Circuit (VQC), which requires many qubits. Therefore, it is critical to make a QNN with VQC run on low-qubit quantum devices. In this study, we propose a novel VQC called the low-qubit VQC. A VQC requires a number of qubits that grows with the input dimension; however, the low-qubit VQC with linear transformation can relax this requirement. Thus, it allows the QNN to run on low-qubit quantum devices for speech applications. Furthermore, compared to the VQC, our proposed low-qubit VQC makes the training process more stable. Based on the low-qubit VQC, we implement QSpeech, a library for quick prototyping of hybrid quantum-classical neural networks in the speech field. It has numerous quantum neural layers and QNN models for speech applications. Experiments on Speech Command Recognition and Text-to-Speech show that our proposed low-qubit VQC outperforms VQC and is more stable.

preprint · 2021 · arXiv

Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

Averaging schemes have attracted extensive attention in deep learning as well as traditional machine learning. They achieve theoretically optimal convergence and also improve empirical model performance. However, there is still a lack of sufficient convergence analysis for strongly convex optimization. Typically, the convergence of the last iterate of gradient descent methods, which is referred to as individual convergence, fails to attain optimality due to the existence of a logarithmic factor. In order to remove this factor, we first develop gradient descent averaging (GDA), which is a general projection-based dual averaging algorithm in the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), where primal and dual averaging schemes are simultaneously utilized. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.

preprint · 2021 · arXiv

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Adaptive stochastic gradient descent (SGD) with momentum has been widely adopted in deep learning as well as convex optimization. In practice, the last iterate is commonly used as the final solution to make decisions. However, the available regret analysis and the setting of constant momentum parameters only guarantee the optimal convergence of the averaged solution. In this paper, we fill this theory-practice gap by investigating the convergence of the last iterate (referred to as individual convergence), which is a more difficult task than convergence analysis of the averaged solution. Specifically, in the constrained convex cases, we prove that the adaptive Polyak's Heavy-ball (HB) method, in which only the step size is updated using the exponential moving average strategy, attains an optimal individual convergence rate of $O(\frac{1}{\sqrt{t}})$, as opposed to the $O(\frac{\log t}{\sqrt{t}})$ rate of SGD, where $t$ is the number of iterations. Our new analysis not only shows how the HB momentum and its time-varying weight help us to achieve acceleration in convex optimization but also gives valuable hints on how the momentum parameters should be scheduled in deep learning. Empirical results on optimizing convex functions and training deep networks validate the correctness of our convergence analysis and demonstrate the improved performance of the adaptive HB methods.

preprint · 2020 · arXiv

Calculation of Feynman loop integration and phase-space integration via auxiliary mass flow

We extend the auxiliary-mass-flow (AMF) method, originally developed for Feynman loop integration, to calculate integrals that also involve phase-space integration. The flow of the auxiliary mass from the boundary ($\infty$) to the physical point ($0^+$) is obtained by numerically solving differential equations with respect to the auxiliary mass. For problems with two or more kinematical invariants, the AMF method can be combined with the traditional differential equation method by providing systematic boundary conditions and a highly nontrivial self-consistency check. The method is described in detail with a pedagogical example of $e^+e^-\rightarrow \gamma^* \rightarrow t\bar{t}+X$ at NNLO. We show that the AMF method can systematically and efficiently calculate integrals to high precision.

preprint · 2020 · arXiv

QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework

Despite the achievements of recent binarization methods in reducing the performance degradation of Binary Neural Networks (BNNs), gradient mismatching caused by the Straight-Through-Estimator (STE) still dominates quantized networks. This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights without resorting to STE or any learnable gradient estimators. Our method not only solves the problem of gradient mismatching, but also reduces the impact on performance of the discretization errors caused by the binarizing operation in deployment. Generally, the proposed algorithm is implemented within a fully differentiable framework, and can easily be extended to general network quantization with arbitrary bit-widths. The quantitative experiments on CIFAR-100 and ImageNet demonstrate that QuantNet achieves significant improvements compared with previous binarization methods, and even bridges the accuracy gaps between binarized models and full-precision models.