Source author record

Weitao Du

Weitao Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computer Vision math.PR

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Flexible Diffusion Model

Diffusion (score-based) generative models have been widely used for modeling various types of complex data, including images, audios, and point clouds. Recently, the deep connection between forward-backward stochastic differential equations (SDEs) and diffusion-based models has been revealed, and several new variants of SDEs are proposed (e.g., sub-VP, critically-damped Langevin) along this line. Despite the empirical success of the hand-crafted fixed forward SDEs, a great quantity of proper forward SDEs remain unexplored. In this work, we propose a general framework for parameterizing the diffusion model, especially the spatial part of the forward SDE. An abstract formalism is introduced with theoretical guarantees, and its connection with previous diffusion models is leveraged. We demonstrate the theoretical advantage of our method from an optimization perspective. Numerical experiments on synthetic datasets, MINIST and CIFAR10 are also presented to validate the effectiveness of our framework.

preprint2022arXiv

Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective

Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: Given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of GNTK in the large depth, which enables us to reveal the dropping trainability of wide and deep GCNs at an exponential rate in the optimization process. Additionally, we extend our theoretical framework to analyze residual connection-based techniques, which are found to be merely able to mitigate the exponential decay of trainability mildly. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms using our proposed method can achieve better results compared to relevant counterparts with both infinite-width and finite-width.

preprint2020arXiv

Constructing exchangeable pairs by diffusion on manifolds and its application

We construct a continuous family of exchangeable pairs by perturbing the random variable through diffusion processes on manifold in order to apply Stein method to certain geometric settings. We compare our perturbation by diffusion method with other approaches of building exchangeable pairs and show that our perturbation scheme cooperates with the infinitesimal version of Stein's method harmoniously. More precisely, our exchangeable pairs satisfy a key condition in the infinitesimal Stein's method in general. Based on the exchangeable pairs, we are able to extend the approximate normality of eigenfunctions of Laplacian on compact manifold to eigenfunctions of Witten Laplacian, which is of the form:$Δ_w = Δ- \nabla H$. We then apply our abstract theorem to recover a central limit result of linear statistics on sphere. Finally, we prove an an infinitesimal version of Stein's method for exponential distribution and combine it with our continuous family of exchangeable pairs to extend an approximate exponentiality result of $|Tr U|^2$, where $Tr U$ is the trace of the first power of a matrix $U$ sampled from the Haar measure of unitary group, to arbitrary power and its analog for general circular ensemble.

preprint2020arXiv

Mean field theory for deep dropout networks: digging up gradient backpropagation deeply

In recent years, the mean field theory has been applied to the study of neural networks and has achieved a great deal of success. The theory has been applied to various neural network structures, including CNNs, RNNs, Residual networks, and Batch normalization. Inevitably, recent work has also covered the use of dropout. The mean field theory shows that the existence of depth scales that limit the maximum depth of signal propagation and gradient backpropagation. However, the gradient backpropagation is derived under the gradient independence assumption that weights used during feed forward are drawn independently from the ones used in backpropagation. This is not how neural networks are trained in a real setting. Instead, the same weights used in a feed-forward step needs to be carried over to its corresponding backpropagation. Using this realistic condition, we perform theoretical computation on linear dropout networks and a series of experiments on dropout networks. Our empirical results show an interesting phenomenon that the length gradients can backpropagate for a single input and a pair of inputs are governed by the same depth scale. Besides, we study the relationship between variance and mean of statistical metrics of the gradient and shown an emergence of universality. Finally, we investigate the maximum trainable length for deep dropout networks through a series of experiments using MNIST and CIFAR10 and provide a more precise empirical formula that describes the trainable length than original work.

Weitao Du

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

A Flexible Diffusion Model

Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective

Constructing exchangeable pairs by diffusion on manifolds and its application

Mean field theory for deep dropout networks: digging up gradient backpropagation deeply