Source author record

Qiang Ye

Qiang Ye appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning, math.NA, Numerical Analysis, Artificial Intelligence, Computer Vision, cond-mat.str-el, cond-mat.supr-con, Networking and Internet Architecture

Catalog footprint

What is connected: 7 works, 8 topics, 4 close collaborators


Preprint, 2022, arXiv

AUTM Flow: Atomic Unrestricted Time Machine for Monotonic Normalizing Flows

Nonlinear monotone transformations are used extensively in normalizing flows to construct invertible triangular mappings from simple distributions to complex ones. In the existing literature, monotonicity is usually enforced by restricting function classes or model parameters, and the inverse transformation is often approximated by root-finding algorithms because a closed-form inverse is unavailable. In this paper, we introduce a new integral-based approach termed "Atomic Unrestricted Time Machine (AUTM)", equipped with unrestricted integrands and an easy-to-compute explicit inverse. AUTM offers a versatile and efficient way to design normalizing flows with an explicit inverse and unrestricted function classes or parameters. Theoretically, we present a constructive proof that AUTM is universal: all monotonic normalizing flows can be viewed as limits of AUTM flows. We provide a concrete example to show how to approximate any given monotonic normalizing flow using AUTM flows with guaranteed convergence. The result implies that AUTM can be used to transform an existing flow into a new one equipped with an explicit inverse and unrestricted parameters. The performance of the new approach is evaluated on high-dimensional density estimation, variational inference and image generation. Experiments demonstrate superior speed and memory efficiency of AUTM.

Preprint, 2022, arXiv

Batch Normalization Preconditioning for Neural Network Training

Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this paper, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the conditioning of the Hessian matrix of the loss function and hence convergence during training. One benefit is that BNP is not constrained by the mini-batch size and works in the online learning setting. Furthermore, its connection to BN provides theoretical insights on how BN improves training and how BN is applied to special architectures such as convolutional neural networks. For a theoretical foundation, we also present a novel convergence theory, based on the Hessian condition number, for a locally convex but not strongly convex loss, which is applicable to networks with a scale-invariant property.
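To make the link between gradient preconditioning and the Hessian condition number concrete, here is a minimal sketch on a least-squares problem. It is not the BNP algorithm from the paper, whose details the abstract does not give; the diagonal preconditioner `P` built from per-feature standard deviations, the feature scales, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
scales = np.array([1.0, 10.0, 100.0])        # badly scaled input features (assumed)
X = rng.standard_normal((n, 3)) * scales

# Hessian of the least-squares loss 0.5 * mean((X @ w - y)**2) with respect to w.
H = X.T @ X / n

# Hypothetical diagonal preconditioner from per-feature standard deviations.
P = np.diag(1.0 / X.std(axis=0))

# Running gradient descent with the step w -= lr * (P @ P @ grad) is equivalent
# to plain gradient descent on a problem whose Hessian is P @ H @ P.
H_pre = P @ H @ P

cond_before = np.linalg.cond(H)
cond_after = np.linalg.cond(H_pre)
print(f"condition number before: {cond_before:.1f}, after: {cond_after:.3f}")
```

The preconditioned Hessian is close to the identity here, so a single learning rate suits every direction. That is the general mechanism by which a better condition number speeds up gradient-based training.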

Preprint, 2022, arXiv

Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly by controlling gradients. While Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose using orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.
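The idea of a Neumann series-based scaled Cayley transformation can be sketched as follows. The sketch assumes the commonly used scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D a diagonal matrix of ±1 entries; the abstract does not state the exact parameterization used in NC-GRU. The function names, matrix size, and number of series terms are illustrative choices.

```python
import numpy as np

def scaled_cayley(A, D):
    """Exact scaled Cayley transform W = (I + A)^{-1} (I - A) D."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, (I - A) @ D)

def neumann_cayley(A, D, terms=30):
    """Same transform, with (I + A)^{-1} replaced by the truncated
    Neumann series I - A + A^2 - ... (valid when the spectral norm of A is < 1)."""
    I = np.eye(A.shape[0])
    inv_approx = np.zeros_like(A)
    power = I.copy()
    for _ in range(terms):
        inv_approx += power
        power = power @ (-A)
    return inv_approx @ (I - A) @ D

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M - M.T                          # skew-symmetric: A.T == -A
A *= 0.5 / np.linalg.norm(A, 2)      # rescale to spectral norm 0.5 so the series converges
D = np.diag([1.0, -1.0, 1.0, -1.0])  # assumed scaling matrix with +/-1 entries

W_exact = scaled_cayley(A, D)
W_approx = neumann_cayley(A, D)

orth_error = np.linalg.norm(W_exact.T @ W_exact - np.eye(n))
approx_error = np.linalg.norm(W_exact - W_approx)
print(orth_error, approx_error)
```

The series replaces a matrix inverse with matrix products only, at the cost of requiring the norm of A to stay below 1 and of making W orthogonal only up to the truncation error.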

Preprint, 2022, arXiv

Symmetry Structured Convolutional Neural Networks

We consider Convolutional Neural Networks (CNNs) with 2D structured features that are symmetric in the spatial dimensions. Such networks arise in modeling pairwise relationships for a sequential recommendation problem, as well as secondary structure inference problems of RNA and protein sequences. We develop a CNN architecture that generates and preserves the symmetry structure in the network's convolutional layers. We present parameterizations for the convolutional kernels that produce update rules to maintain symmetry throughout training. We apply this architecture to the sequential recommendation problem, the RNA secondary structure inference problem, and the protein contact map prediction problem, showing that the symmetry-structured networks produce improved results using fewer parameters.
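One simple way to see how a kernel parameterization can preserve symmetry is sketched below. It assumes the kernel is built as K = (P + P^T)/2 from an unconstrained parameter P, which may differ from the parameterizations in the paper. The helper names and the loop-based convolution are for illustration only.

```python
import numpy as np

def symmetric_kernel(P):
    """Build a symmetric kernel from an unconstrained square parameter P."""
    return 0.5 * (P + P.T)

def conv2d_same(X, K):
    """Zero-padded 2D cross-correlation with an odd-sized square kernel."""
    k = K.shape[0]
    pad = k // 2
    Xp = np.pad(X, pad)
    out = np.zeros(X.shape, dtype=float)
    rows, cols = X.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(Xp[i:i + k, j:j + k] * K)
    return out

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
X = B + B.T                      # symmetric 2D feature map, X[i, j] == X[j, i]
P = rng.standard_normal((3, 3))  # free parameter, updated by the optimizer
K = symmetric_kernel(P)

Y = conv2d_same(X, K)
print(np.allclose(Y, Y.T))
```

Because K is a function of P, any gradient update applied to P leaves K symmetric, so a symmetric input yields a symmetric output at every training step without an explicit projection.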

Preprint, 2021, arXiv

Symmetry of magnetic correlations in spin-triplet superconductor UTe2

The temperature dependence of the low-energy magnetic excitations in the spin-triplet superconductor UTe$_2$ was measured via inelastic neutron scattering in the normal and superconducting states. The imaginary part of the dynamic susceptibility follows the behavior of interband correlations in a hybridized Kondo lattice with an appropriate characteristic energy. These excitations are a lower-dimensional analogue of phenomena observed in other Kondo lattice materials, such that their presence is not necessarily due to dominance of ferromagnetic or antiferromagnetic correlations. The onset of superconductivity alters the magnetic excitations noticeably on the same energy scales, suggesting that these changes originate from additional electronic structure modification.

Preprint, 2020, arXiv

A Virtual Network Customization Framework for Multicast Services in NFV-enabled Core Networks

The paradigm of network function virtualization (NFV) with the support of software defined networking (SDN) emerges as a promising approach for customizing network services in fifth generation (5G) networks. In this paper, a multicast service orchestration framework is presented, where joint traffic routing and virtual network function (NF) placement are studied for accommodating multicast services over an NFV-enabled physical substrate network. First, we investigate a joint routing and NF placement problem for a single multicast request accommodated over a physical substrate network, with both single-path and multipath traffic routing. The joint problem is formulated as a mixed integer linear programming (MILP) problem to minimize the function and link provisioning costs, under the physical network resource constraints, flow conservation constraints, and NF placement rules. Second, we develop an MILP formulation that jointly handles the static embedding of multiple service requests over the physical substrate network, where we determine the optimal combination of multiple services for embedding and their joint routing and placement configurations, such that the aggregate throughput of the physical substrate is maximized, while the function and link provisioning costs are minimized. Since the presented problem formulations are NP-hard, low-complexity heuristic algorithms are proposed to find an efficient solution for both single-path and multipath routing scenarios. Simulation results are presented to demonstrate the effectiveness and accuracy of the proposed heuristic algorithms.

Preprint, 2016, arXiv

Error Bounds for the Krylov Subspace Methods for Computations of Matrix Exponentials

In this paper, we present new a posteriori and a priori error bounds for the Krylov subspace methods for computing $e^{-\tau A}v$ for a given $\tau>0$ and $v \in \mathbb{C}^n$, where $A$ is a large sparse non-Hermitian matrix. The a priori error bounds relate the convergence to $\lambda_{\min}\left(\frac{A+A^*}{2}\right)$, $\lambda_{\max}\left(\frac{A+A^*}{2}\right)$ (the smallest and the largest eigenvalue of the Hermitian part of $A$) and $\left|\lambda_{\max}\left(\frac{A-A^*}{2}\right)\right|$ (the largest eigenvalue in absolute value of the skew-Hermitian part of $A$), which define a rectangular region enclosing the field of values of $A$. In particular, our bounds explain an observed superlinear convergence behavior where the error may first stagnate for certain iterations before it starts to converge. The special case that $A$ is skew-Hermitian is also considered. Numerical examples are given to demonstrate the theoretical bounds.
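For readers unfamiliar with the method being analyzed, the standard Arnoldi-based Krylov approximation $e^{-\tau A}v \approx \beta V_m e^{-\tau H_m} e_1$ can be sketched as below. This is the textbook construction, not the paper's error bounds. The test matrix, the subspace dimension, and the helper `expm_small` (a simple scaling-and-squaring Taylor exponential used here in place of a library routine) are assumptions made for illustration.

```python
import numpy as np

def expm_small(M, squarings=10, terms=18):
    """Dense matrix exponential via Taylor series with scaling and squaring.
    A simple stand-in suitable only for small, moderately scaled matrices."""
    S = M / (2 ** squarings)
    size = M.shape[0]
    E = np.eye(size, dtype=complex)
    term = np.eye(size, dtype=complex)
    for k in range(1, terms + 1):
        term = term @ S / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def arnoldi_expm(A, v, tau, m):
    """Approximate exp(-tau * A) @ v from an m-dimensional Krylov subspace."""
    n = len(v)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # invariant subspace found, stop early
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    Hm = H[:m, :m]
    return beta * V[:, :m] @ expm_small(-tau * Hm)[:, 0]

# Small non-symmetric tridiagonal test matrix (illustrative choice).
n = 30
A = 2.0 * np.eye(n) - 1.3 * np.eye(n, k=-1) - 0.7 * np.eye(n, k=1)
v = np.ones(n)
tau = 1.0

approx = arnoldi_expm(A, v, tau, m=20)
exact = expm_small(-tau * A) @ v
error = np.linalg.norm(approx - exact)
print(error)
```

Only the small matrix $H_m$ is exponentiated, which is what makes the approach attractive for large sparse $A$. Error bounds of the kind described in the abstract indicate how large $m$ must be for a given accuracy.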
