Source author record

Fuchang Gao

Fuchang Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.PR, Machine Learning, math.ST, Statistics Theory, Artificial Intelligence, Neural and Evolutionary Computing

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators

preprint, 2023, arXiv

$Ae^2I$: A Double Autoencoder for Imputation of Missing Values

The most common strategy for imputing missing values in a table is to study either the column-column relationship or the row-row relationship of the data table, then use the relationship to impute the missing values based on the non-missing values from other columns of the same row, or from the other rows of the same column. This paper introduces a double autoencoder for imputation ($Ae^2I$) that simultaneously and collaboratively uses both the row-row relationship and the column-column relationship to impute the missing values. Empirical tests on the MovieLens 1M dataset demonstrated that $Ae^2I$ outperforms the current state-of-the-art models for recommender systems by a significant margin.

preprint, 2023, arXiv

Data-aware customization of activation functions reduces neural network error

Activation functions play critical roles in neural networks, yet current off-the-shelf neural networks pay little attention to the specific choice of activation functions used. Here we show that data-aware customization of activation functions can result in striking reductions in neural network error. We first give a simple linear algebraic explanation of the role of activation functions in neural networks; then, through connection with the Diaconis-Shahshahani Approximation Theorem, we propose a set of criteria for good activation functions. As a case study, we consider regression tasks with a partially exchangeable target function, i.e., $f(u,v,w)=f(v,u,w)$ for $u,v\in \mathbb{R}^d$ and $w\in \mathbb{R}^k$, and prove that for such a target function, using an even activation function in at least one of the layers guarantees that the prediction preserves partial exchangeability for best performance. Since even activation functions are seldom used in practice, we designed the "seagull" even activation function $\log(1+x^2)$ according to our criteria. Empirical testing on over two dozen 9-25 dimensional examples with different local smoothness, curvature, and degree of exchangeability revealed that a simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error. This improvement was most pronounced when the activation function substitution was applied to the layer in which the exchangeable variables are connected for the first time. While the improvement is greatest for low-dimensional data, experiments on the CIFAR10 image classification dataset showed that use of "seagull" can reduce error even for high-dimensional cases. These results collectively highlight the potential of customizing activation functions as a general approach to improve neural network performance.
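The link between an even activation and partial exchangeability can be checked numerically. The sketch below is illustrative only and is not the architecture used in the paper: `toy_unit`, the weights `a` and `c`, and the dimensions `d` and `k` are hypothetical names chosen for this example. It assumes a single unit whose pre-activation depends on $u$ and $v$ only through $a\cdot(u-v)$, so swapping $u$ and $v$ flips the sign of the pre-activation.

```python
import math
import random


def seagull(x: float) -> float:
    """Even activation log(1 + x^2), the "seagull" function from the abstract."""
    return math.log1p(x * x)


def toy_unit(activation, u, v, w, a, c):
    """Hypothetical one-unit model: activation(a . (u - v)) + c . w.

    Swapping u and v only flips the sign of the pre-activation, so an even
    activation returns the same value, i.e. f(u, v, w) = f(v, u, w).
    """
    pre = sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v))
    return activation(pre) + sum(ci * wi for ci, wi in zip(c, w))


rng = random.Random(0)  # fixed seed so the illustration is reproducible
d, k = 3, 2             # illustrative dimensions, not taken from the paper
u, v, a = ([rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3))
w, c = ([rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(2))

# Change in output when u and v are swapped, for an even and an odd activation.
seagull_gap = abs(toy_unit(seagull, u, v, w, a, c) - toy_unit(seagull, v, u, w, a, c))
tanh_gap = abs(toy_unit(math.tanh, u, v, w, a, c) - toy_unit(math.tanh, v, u, w, a, c))

print("swap gap with seagull:", seagull_gap)
print("swap gap with tanh:   ", tanh_gap)
```

With the even activation the swap gap vanishes, while the odd activation `math.tanh` generally changes the output under the swap. This only illustrates the symmetry argument; it says nothing about the error reductions reported in the abstract.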

preprint, 2020, arXiv

A Use of Even Activation Functions in Neural Networks

Despite broad interest in applying deep learning techniques to scientific discovery, learning interpretable formulas that accurately describe scientific data is very challenging because of the vast landscape of possible functions and the "black box" nature of deep neural networks. The key to success is to effectively integrate existing knowledge or hypotheses about the underlying structure of the data into the architecture of deep learning models to guide machine learning. Currently, such integration is commonly done through customization of the loss functions. Here we propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions that reflect this structure. Specifically, we study a common case when the multivariate target function $f$ to be learned from the data is partially exchangeable, i.e., $f(u,v,w)=f(v,u,w)$ for $u,v\in \mathbb{R}^d$. For instance, these conditions are satisfied for the classification of images that is invariant under left-right flipping. Through theoretical proof and experimental verification, we show that using an even activation function in one of the fully connected layers improves neural network performance. In our experimental 9-dimensional regression problems, replacing one of the non-symmetric activation functions with the designated "Seagull" activation function $\log(1+x^2)$ results in substantial improvement in network performance. Surprisingly, even activation functions are seldom used in neural networks. Our results suggest that customized activation functions have great potential in neural networks.

preprint, 2014, arXiv

Upper tail probabilities of integrated Brownian motions

We obtain new upper tail probabilities of $m$-times integrated Brownian motions under the uniform norm and the $L^p$ norm. For the uniform norm, Talagrand's approach is used, while for the $L^p$ norm, Zolotarev's approach together with suitable metric entropy and the associated small ball probabilities are used. This proposed method leads to an interesting and concrete connection between small ball probabilities and upper tail probabilities (large ball probabilities) for general Gaussian random variables in Banach spaces. As applications, explicit bounds are given for the largest eigenvalue of the covariance operator, and appropriate limiting behaviors of the Laplace transforms of $m$-times integrated Brownian motions are presented as well.

preprint, 2013, arXiv

Comparison for upper tail probabilities of random series

Let $\{ξ_n\}$ be a sequence of independent and identically distributed random variables. In this paper we study the comparison of two upper tail probabilities $\mathbb{P}\{\sum_{n=1}^{\infty}a_n|ξ_n|^p\geq r\}$ and $\mathbb{P}\{\sum_{n=1}^{\infty}b_n|ξ_n|^p\geq r\}$ as $r\rightarrow\infty$ with two different real series $\{a_n\}$ and $\{b_n\}$. The first result is for Gaussian random variables $\{ξ_n\}$, and in this case these two probabilities are equivalent after suitable scaling. The second result is for more general random variables, for which a weaker form of equivalence (namely, at the logarithmic level) is proved.

preprint, 2012, arXiv

Global Rates of Convergence of the MLE for Multivariate Interval Censoring

We establish global rates of convergence of the Maximum Likelihood Estimator (MLE) of a multivariate distribution function in the case of (one type of) "interval censored" data. The main finding is that the rate of convergence of the MLE in the Hellinger metric is no worse than $n^{-1/3} (\log n)^γ$ for $γ= (5d - 4)/6$.

preprint, 2012, arXiv

Persistence of iterated partial sums

Let $S_n^{(2)}$ denote the iterated partial sums. That is, $S_n^{(2)}=S_1+S_2+\cdots+S_n$, where $S_i=X_1+X_2+\cdots+X_i$. Assuming $X_1, X_2,\ldots,X_n$ are integrable, zero-mean, i.i.d. random variables, we show that the persistence probabilities $$p_n^{(2)}:=\mathbb{P}(\max_{1\le i \le n}S_i^{(2)}< 0) \le c\sqrt{\frac{\mathbb{E}|S_{n+1}|}{(n+1)\mathbb{E}|X_1|}},$$ with $c \le 6 \sqrt{30}$ (and $c=2$ whenever $X_1$ is symmetric). The converse inequality holds whenever the non-zero $\min(-X_1,0)$ is bounded or when it has only finite third moment and in addition $X_1$ is square integrable. Furthermore, $p_n^{(2)}\asymp n^{-1/4}$ for any non-degenerate square integrable, i.i.d., zero-mean $X_i$. In contrast, we show that for any $0 < γ< 1/4$ there exist integrable, zero-mean random variables for which the rate of decay of $p_n^{(2)}$ is $n^{-γ}$.
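The definition of the persistence probability $p_n^{(2)}$ can be made concrete with a small Monte Carlo sketch. This is an illustration under stated assumptions, not the paper's method: the function name `persistence_estimate` is hypothetical, and the steps $X_i$ are taken to be fair $\pm 1$ signs, which are symmetric, zero-mean, and square integrable.

```python
import random


def persistence_estimate(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(max_{1<=i<=n} S_i^(2) < 0).

    Assumption for this sketch: the X_i are i.i.d. fair +/-1 signs.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0    # partial sum S_i = X_1 + ... + X_i
        s2 = 0   # iterated partial sum S_i^(2) = S_1 + ... + S_i
        stayed_negative = True
        for _ in range(n):
            s += rng.choice((-1, 1))
            s2 += s
            if s2 >= 0:  # the path left the negative half-line
                stayed_negative = False
                break
        hits += stayed_negative
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        est = persistence_estimate(n, trials=4000)
        # The abstract gives p_n^(2) of order n^(-1/4) in this setting,
        # so est * n^(1/4) is printed alongside for comparison.
        print(n, round(est, 3), round(est * n ** 0.25, 3))
```

For $\pm 1$ steps the small cases can be checked by hand: $p_1^{(2)} = 1/2$, and enumerating the 16 sign patterns of length 4 gives $p_4^{(2)} = 6/16 = 0.375$, which the estimates should approximate up to sampling noise.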

preprint, 2011, arXiv

Persistence of iterated partial sums

Let p_n denote the persistence probability that the first n iterated partial sums of integrable, zero-mean, i.i.d. random variables X_k are negative. We show that p_n is bounded above, up to a universal constant, by the square root of the expected absolute value of the empirical average of {X_k}. A converse bound holds whenever P(-X_1>t) is, up to a constant, exp(-b t) for some b>0, or when P(-X_1>t) decays super-exponentially in t. Consequently, for such random variables p_n decays as n^{-1/4} if X_1 has finite second moment. In contrast, we show that for any 0 < c < 1/4 there exist integrable, zero-mean random variables for which the rate of decay of p_n is n^{-c}.

preprint, 2011, arXiv

Small deviations for a family of smooth Gaussian processes

We study the small deviation probabilities of a family of very smooth self-similar Gaussian processes. The canonical process from the family has the same scaling property as standard Brownian motion and plays an important role in the study of zeros of random polynomials. Our estimates are based on the entropy method, discovered in Kuelbs and Li (1992) and developed further in Li and Linde (1999), Gao (2004), and Aurzada et al. (2009). While there are several ways to obtain the result with respect to the $L_2$ norm, the main contribution of this paper concerns the result with respect to the supremum norm. In this connection, we develop a tool that allows one to translate upper estimates for the entropy of an operator mapping into $L_2[0,1]$ into those of the operator mapping into $C[0,1]$, if the image of the operator is in fact a Hölder space. The results are further applied to the entropy of function classes, generalizing results of Gao et al. (2010).

preprint, 2010, arXiv

How many Laplace transforms of probability measures are there?

A bracketing metric entropy bound for the class of Laplace transforms of probability measures on $[0,\infty)$ is obtained through its connection with the small deviation probability of a smooth Gaussian process. Our results for the particular smooth Gaussian process seem to be of independent interest.
