Source author record

Yitong Sun

Yitong Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computer Vision

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Out-of-Distribution Detection with Class Ratio Estimation

Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and employs differing base distributions. Under our framework, the density ratio can be viewed as the unnormalized density of an implicit semantic distribution. Further, we propose to directly estimate the density ratio of a data sample through class ratio estimation. We report competitive results on OOD image problems in comparison with recent work that alternatively requires training of deep generative models for the task. Our approach enables a simple and yet effective path towards solving the OOD detection problem.

preprint2021arXiv

A Practical Layer-Parallel Training Algorithm for Residual Networks

Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters, which are time-consuming for deep ResNets. To break the dependencies between modules in both the forward and backward modes, auxiliary-variable methods such as the penalty and augmented Lagrangian (AL) approaches have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we observe that large communication overhead and lacking data augmentation are two key challenges of these methods, which may lead to low speedup ratio and accuracy drop across multiple compute devices. Inspired by the optimal control formulation of ResNets, we propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost. The proposed strategy first trains the network parameters by solving a succession of independent sub-problems in parallel and then corrects the network parameters through a full serial forward-backward propagation of data. Such a strategy can be applied to most of the existing layer-parallel training methods using auxiliary variables. As an example, we validate the proposed strategy using penalty and AL methods on ResNet and WideResNet across MNIST, CIFAR-10 and CIFAR-100 datasets, achieving significant speedup over the traditional layer-serial training methods while maintaining comparable accuracy.

preprint2020arXiv

Towards Better Understanding of Disentangled Representations via Mutual Information

Most existing works on disentangled representation learning are solely built upon an marginal independence assumption: all factors in disentangled representations should be statistically independent. This assumption is necessary but definitely not sufficient for the disentangled representations without additional inductive biases in the modeling process, which is shown theoretically in recent studies. We argue in this work that disentangled representations should be characterized by their relation with observable data. In particular, we formulate such a relation through the concept of mutual information: the mutual information between each factor of the disentangled representations and data should be invariant conditioned on values of the other factors. Together with the widely accepted independence assumption, we further bridge it with the conditional independence of factors in representations conditioned on data. Moreover, we note that conditional independence of latent variables has been imposed on most VAE-type models and InfoGAN due to the artificial choice of factorized approximate posterior $q(\rvz|\rvx)$ in the encoders. Such an arrangement of encoders introduces a crucial inductive bias for disentangled representations. To demonstrate the importance of our proposed assumption and the related inductive bias, we show in experiments that violating the assumption leads to decline of disentanglement among factors in the learned representations.