Source author record

Lizhen Lin

Lizhen Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.ST; Statistics Theory; Machine Learning; Methodology; Applications; Computation; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; eess.SY; math.NA; math.OC; Numerical Analysis; stat.OT; Systems and Control

Catalog footprint

What is connected

19 works

14 topics

4 close collaborators

preprint, 2026, arXiv

Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces

We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation for KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain near-minimax posterior contraction rates. In particular, the contraction rate depends on the intrinsic anisotropic smoothness of the underlying function. Moreover, by placing a hyperprior on a single model-size parameter, the resulting posterior adapts to unknown anisotropic smoothness and still achieves the corresponding near-minimax rate. A distinctive feature of our results, compared with those for standard sparse MLP-based models, is that the KAN depth can be kept fixed: owing to the flexibility of learnable spline edge functions, the required approximation complexity is controlled through the network width, spline-grid range and size, and parameter sparsity. Our analysis develops theoretical tools tailored to sparse spline-edge architectures, including approximation and complexity bounds for Bayesian KANs. We then extend to compositional Besov spaces and show that the contraction rates depend on layerwise smoothness and effective dimension of the underlying compositional structure, thereby effectively avoiding the curse of dimensionality. Together, the developed tools and findings advance the theoretical understanding of Bayesian neural networks and provide rigorous statistical foundations for KANs.

preprint, 2022, arXiv

Extrinsic Bayesian Optimizations on Manifolds

We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold into some higher-dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. A simulation study and real data analysis are carried out to demonstrate the utility of our eBO framework by applying it to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.

preprint, 2022, arXiv

Multilevel Network Item Response Modeling for Discovering Differences Between Innovation and Regular School Systems in Korea

The innovation school system in South Korea has been developed in response to the traditional high-pressure school system in South Korea, with a view to cultivating a bottom-up and student-centered educational culture. Despite its ambitious goals, questions have been raised about the success of the innovation school system. Leveraging data from the Gyeonggi Education Panel Study (GEPS) along with advances in the statistical analysis of network data and educational data, we compare the two school systems in more depth. We find that some schools are indeed different from others, and those differences are not detected by conventional multilevel models. Having said that, we do not find much evidence that the innovation school system differs from the regular school system in terms of self-reported mental well-being, although we do detect differences among some schools that appear to be unrelated to the school system.

preprint, 2022, arXiv

Neural-PDE: A RNN based neural network for solving time dependent PDEs

Partial differential equations (PDEs) play a crucial role in studying a vast number of problems in science and engineering. Numerically solving nonlinear and/or high-dimensional PDEs is often a challenging task. Inspired by the traditional finite difference and finite element methods and emerging advancements in machine learning, we propose a sequence deep learning framework called Neural-PDE, which automatically learns the governing rules of any time-dependent PDE system from existing data by using a bidirectional LSTM encoder, and predicts the data for the next n time steps. One critical feature of our proposed framework is that the Neural-PDE is able to simultaneously learn and simulate the multiscale variables. We test the Neural-PDE on a range of examples from one-dimensional PDEs to a high-dimensional and nonlinear complex fluids model. The results show that the Neural-PDE is capable of learning the initial conditions, boundary conditions and differential operators without knowledge of the specific form of a PDE system. In our experiments the Neural-PDE can efficiently extract the dynamics within 20 epochs of training, and produces accurate predictions. Furthermore, unlike traditional machine learning approaches to learning PDEs such as CNN and MLP, which require vast numbers of parameters for model precision, Neural-PDE shares parameters across all time steps, which considerably reduces the computational complexity and leads to a fast learning algorithm.

preprint, 2022, arXiv

Optimal Bayesian estimation of Gaussian mixtures with growing number of components

We study Bayesian estimation of finite mixture models in a general setup where the number of components is unknown and allowed to grow with the sample size. An assumption of a growing number of components is a natural one, as the degree of heterogeneity present in the sample can grow and new components can arise as the sample size increases, allowing full flexibility in modeling the complexity of data. This, however, leads to a high-dimensional model which poses great challenges for estimation. In a novel approach, we employ the idea of a sample-size-dependent prior in a Bayesian model and establish a number of important theoretical results. We first show that under mild conditions on the prior, the posterior distribution concentrates around the true mixing distribution at a near optimal rate with respect to the Wasserstein distance. Under a separation condition on the true mixing distribution, we further show that a better and adaptive convergence rate can be achieved, and the number of components can be consistently estimated. Furthermore, we derive optimal convergence rates for the higher-order mixture models where the number of components diverges arbitrarily fast. In addition, we suggest a simple recipe for using a Dirichlet process (DP) mixture prior for estimating finite mixture models and provide theoretical guarantees. In particular, we provide a novel solution for adopting the number of clusters in a DP mixture model as an estimate of the number of components in a finite mixture model. A simulation study and real data applications are carried out to demonstrate the utility of our method.

preprint, 2022, arXiv

Robustness against Adversarial Attacks in Neural Networks using Incremental Dissipativity

Adversarial examples can easily degrade the classification performance in neural networks. Empirical methods for promoting robustness to such examples have been proposed, but often lack both analytical insights and formal guarantees. Recently, some robustness certificates have appeared in the literature based on system-theoretic notions. This work proposes an incremental dissipativity-based robustness certificate for neural networks in the form of a linear matrix inequality for each layer. We also propose an equivalent spectral norm bound for this certificate which is scalable to neural networks with multiple layers. We demonstrate the improved performance against adversarial attacks on a feed-forward neural network trained on MNIST and an AlexNet trained using CIFAR-10.

preprint, 2021, arXiv

Bayesian classification, anomaly detection, and survival analysis using network inputs with application to the microbiome

While the study of a single network is well-established, technological advances now allow for the collection of multiple networks with relative ease. Increasingly, anywhere from several to thousands of networks can be created from brain imaging, gene co-expression data, or microbiome measurements. And these networks, in turn, are being looked to as potentially powerful features to be used in modeling. However, with networks being non-Euclidean in nature, how best to incorporate them into standard modeling tasks is not obvious. In this paper, we propose a Bayesian modeling framework that provides a unified approach to binary classification, anomaly detection, and survival analysis with network inputs. We encode the networks in the kernel of a Gaussian process prior via their pairwise differences and we discuss several choices of provably positive definite kernel that can be plugged into our models. Although our methods are widely applicable, we are motivated here in particular by microbiome research (where network analysis is emerging as the standard approach for capturing the interconnectedness of microbial taxa across both time and space) and its potential for reducing preterm delivery and improving personalization of prenatal care.

preprint, 2020, arXiv

Bayesian High-dimensional Semi-parametric Inference beyond sub-Gaussian Errors

We consider a sparse linear regression model with unknown symmetric error under the high-dimensional setting. The true error distribution is assumed to belong to the locally $β$-Hölder class with an exponentially decreasing tail, which does not need to be sub-Gaussian. We obtain posterior convergence rates of the regression coefficient and the error density, which are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive the semi-parametric Bernstein-von Mises (BvM) theorem to characterize the asymptotic shape of the marginal posterior for regression coefficients. Under the sub-Gaussianity assumption on the true score function, strong model selection consistency for regression coefficients is also obtained, which in turn establishes the frequentist validity of credible sets.

preprint, 2020, arXiv

Maximum Pairwise Bayes Factors for Covariance Structure Testing

Hypothesis testing of structure in covariance matrices is of significant importance, but faces great challenges in high-dimensional settings. Although consistent frequentist one-sample covariance tests have been proposed, there is a lack of simple, computationally scalable, and theoretically sound Bayesian testing methods for large covariance matrices. Motivated by this gap and by the need for tests that are powerful against sparse alternatives, we propose a novel testing framework based on the maximum pairwise Bayes factor. Our initial focus is on one-sample covariance testing; the proposed test can optimally distinguish null and alternative hypotheses in a frequentist asymptotic sense. We then propose diagonal tests and a scalable covariance graph selection procedure that are shown to be consistent. A simulation study evaluates the proposed approach relative to competitors. We illustrate advantages of our graph selection method on a gene expression data set.

preprint, 2020, arXiv

Optimization of Graph Neural Networks with Natural Gradient Descent

In this work, we propose to employ information-geometric tools to optimize a graph neural network architecture such as the graph convolutional networks. More specifically, we develop optimization algorithms for graph-based semi-supervised learning by employing the natural gradient information in the optimization process. This allows us to efficiently exploit the geometry of the underlying statistical model or parameter space for optimization and inference. To the best of our knowledge, this is the first work that has utilized the natural gradient for the optimization of graph neural networks that can be extended to other semi-supervised problems. Efficient computational algorithms are developed and extensive numerical studies are conducted to demonstrate the superior performance of our algorithms over existing algorithms such as ADAM and SGD.

preprint, 2020, arXiv

Robust Optimization and Inference on Manifolds

We propose a robust and scalable procedure for general optimization and inference problems on manifolds leveraging the classical idea of 'median-of-means' estimation. This is motivated by ubiquitous examples and applications in modern data science in which a statistical learning problem can be cast as an optimization problem over manifolds. Being able to incorporate the underlying geometry for inference while addressing the need for robustness and scalability presents great challenges. We address these challenges by first proving a key lemma that characterizes some crucial properties of geometric medians on manifolds. In turn, this allows us to prove robustness and tighter concentration of our proposed final estimator in a subsequent theorem. This estimator aggregates a collection of subset estimators by taking their geometric median over the manifold. We illustrate bounds on this estimator via calculations in explicit examples. The robustness and scalability of the procedure are illustrated in numerical examples on both simulated and real data sets.
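
The aggregation step described in this abstract can be sketched in the flat case. The following is a minimal illustration, assuming Euclidean data in R^d rather than a general manifold, sample means as the subset estimators, and Weiszfeld iterations for the geometric median; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def _distance(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median in R^d with Weiszfeld iterations."""
    dim = len(points[0])
    # Start from the coordinate-wise mean of the points.
    current = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iterations):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(_distance(current, p), eps) for p in points]
        total = sum(weights)
        current = [sum(w * p[j] for w, p in zip(weights, points)) / total
                   for j in range(dim)]
    return tuple(current)

def median_of_means(data, num_blocks):
    """Average each contiguous block, then take the geometric median."""
    block_size = len(data) // num_blocks
    dim = len(data[0])
    block_means = []
    for b in range(num_blocks):
        block = data[b * block_size:(b + 1) * block_size]
        block_means.append(
            tuple(sum(p[j] for p in block) / len(block) for j in range(dim)))
    return geometric_median(block_means)

# Toy data (assumption): 27 clean points and 3 gross outliers.
data = [(1.0, 2.0)] * 27 + [(1000.0, 1000.0)] * 3
plain_mean = tuple(sum(p[j] for p in data) / len(data) for j in range(2))
robust = median_of_means(data, num_blocks=10)
```

In this toy setup the outliers contaminate only one of the ten blocks, so the geometric median of the block means stays near the clean value while the plain mean is pulled far away. On a curved manifold the Euclidean distance and averaging steps would need geometry-aware replacements.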

preprint, 2016, arXiv

Omnibus CLTs for Fréchet means and nonparametric inference on non-Euclidean spaces

Two central limit theorems for sample Fréchet means are derived, both significant for nonparametric inference on non-Euclidean spaces. The first one, Theorem 2.2, encompasses and improves upon most earlier CLTs on Fréchet means and broadens the scope of the methodology beyond manifolds to diverse new non-Euclidean data including those on certain stratified spaces which are important in the study of phylogenetic trees. It does not require that the underlying distribution $Q$ have a density, and applies to both intrinsic and extrinsic analysis. The second theorem, Theorem 3.3, focuses on intrinsic means on Riemannian manifolds of dimensions $d>2$ and breaks new ground by providing a broad CLT without any of the earlier restrictive support assumptions. It makes the statistically reasonable assumption of a somewhat smooth density of $Q$. The excluded case of dimension $d=2$ proves to be an enigma, although the first theorem does provide a CLT in this case as well under a support restriction. Theorem 3.3 immediately applies to spheres $S^d$, $d>2$, which are also of considerable importance in applications to axial spaces and to landmark-based image analysis, as these spaces are quotients of spheres under a Lie group $\mathcal{G}$ of isometries of $S^d$.

preprint, 2016, arXiv

Robust and Scalable Bayes via a Median of Subset Posterior Measures

We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.

preprint, 2016, arXiv

Scale and curvature effects in principal geodesic analysis

There is growing interest in using the close connection between differential geometry and statistics to model smooth manifold-valued data. In particular, much work has been done recently to generalize principal component analysis (PCA), the method of dimension reduction in linear spaces, to Riemannian manifolds. One such generalization is known as principal geodesic analysis (PGA). This paper, in a novel fashion, obtains Taylor expansions in scaling parameters introduced in the domain of objective functions in PGA. It is shown this technique not only leads to better closed-form approximations of PGA but also reveals the effects that scale, curvature and the distribution of data have on solutions to PGA and on their differences to first-order tangent space approximations. This approach should be able to be applied not only to PGA but also to other generalizations of PCA and more generally to other intrinsic statistics on Riemannian manifolds.

preprint, 2015, arXiv

Data augmentation for models based on rejection sampling

We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea, which seems to be missing in the literature, is a simple scheme to instantiate the rejected proposals preceding each data point. The resulting joint probability over observed and rejected variables can be much simpler than the marginal distribution over the observed variables, which often involves intractable integrals. We consider three problems, the first being the modeling of flow-cytometry measurements subject to truncation. The second is a Bayesian analysis of the matrix Langevin distribution on the Stiefel manifold, and the third, Bayesian inference for a nonparametric Gaussian process density model. The latter two are instances of problems where Markov chain Monte Carlo inference is doubly-intractable. Our experiments demonstrate superior performance over state-of-the-art sampling algorithms for such problems.
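
To make the augmented variables concrete, here is a small forward-simulation sketch. It assumes a truncated normal generated by rejection from an untruncated normal proposal, and it only records the rejected proposals that precede each accepted draw; it is not the paper's MCMC scheme, and the function name and parameters are hypothetical.

```python
import random

def sample_with_rejections(rng, lower, upper, mu=0.0, sigma=1.0):
    """Draw from N(mu, sigma^2) truncated to [lower, upper] by rejection.

    Returns the accepted draw together with the list of rejected
    proposals that preceded it (the augmented variables).
    """
    rejected = []
    while True:
        proposal = rng.gauss(mu, sigma)
        if lower <= proposal <= upper:
            return proposal, rejected
        rejected.append(proposal)

rng = random.Random(0)  # fixed seed so the run is reproducible
accepted, rejected = sample_with_rejections(rng, lower=1.0, upper=2.0)
```

In this example the marginal density of the accepted draw involves a normalizing constant (the normal probability of the interval), whereas the joint density of the accepted and rejected values is a product of proposal densities with indicator constraints, which is the kind of simplification the abstract refers to.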

preprint, 2015, arXiv

Extrinsic local regression on manifold-valued data

We propose an extrinsic regression framework for modeling data with manifold-valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie into a higher-dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting both intrinsic and extrinsic approaches have been proposed for modeling i.i.d. manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived and a large class of examples is considered, indicating the wide applicability of our approach.
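
The embed, smooth, and project steps above can be sketched for the simplest case. The example assumes responses on the unit circle embedded in R^2 by inclusion, a Gaussian-kernel (Nadaraya-Watson) local estimate, and projection by normalization; the function name, bandwidth, and data are hypothetical choices for illustration only.

```python
import math

def extrinsic_local_regression(x0, xs, ys, bandwidth):
    """Kernel-weighted average in the ambient space, projected to the sphere.

    xs: scalar predictors; ys: unit vectors (responses on the sphere,
    already embedded in R^d by inclusion).
    """
    # Gaussian kernel weights centered at the query point x0.
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    dim = len(ys[0])
    # Local (Nadaraya-Watson) estimate in the ambient Euclidean space.
    ambient = [sum(w * y[j] for w, y in zip(weights, ys)) / total
               for j in range(dim)]
    norm = math.sqrt(sum(a * a for a in ambient))
    if norm == 0.0:
        raise ValueError("projection onto the sphere is undefined at the origin")
    # Project back onto the sphere by normalizing.
    return tuple(a / norm for a in ambient)

# Noise-free toy data (assumption): the response at x is the angle x.
xs = [i / 10 for i in range(11)]
ys = [(math.cos(x), math.sin(x)) for x in xs]
fit = extrinsic_local_regression(0.5, xs, ys, bandwidth=0.2)
```

The ambient average generally falls inside the circle, which is why the projection step is needed. For other manifolds the embedding and the projection would differ, and the projection may not be unique everywhere.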

preprint, 2015, arXiv

Learning Subspaces of Different Dimension

We introduce a Bayesian model for inferring mixtures of subspaces of different dimensions. The key challenge in such a mixture model is specification of prior distributions over subspaces of different dimensions. We address this challenge by embedding subspaces or Grassmann manifolds into a sphere of relatively low dimension and specifying priors on the sphere. We provide an efficient sampling algorithm for the posterior distribution of the model parameters. We illustrate that a simple extension of our mixture of subspaces model can be applied to topic modeling. We also prove posterior consistency for the mixture of subspaces model. The utility of our approach is demonstrated with applications to real and simulated data.

preprint, 2014, arXiv

Bayesian nonparametric inference on the Stiefel manifold

The Stiefel manifold $V_{p,d}$ is the space of all $d \times p$ orthonormal matrices, with the $d-1$ hypersphere and the space of all orthogonal matrices constituting special cases. In modeling data lying on the Stiefel manifold, parametric distributions such as the matrix Langevin distribution are often used; however, model misspecification is a concern and it is desirable to have nonparametric alternatives. Current nonparametric methods are Fréchet mean based. We take a fully generative nonparametric approach, which relies on mixing parametric kernels such as the matrix Langevin. The proposed kernel mixtures can approximate a large class of distributions on the Stiefel manifold, and we develop theory showing posterior consistency. While there exists work developing general posterior consistency results, extending these results to this particular manifold requires substantial new theory. Posterior inference is illustrated on a real-world dataset of near-Earth objects.

preprint, 2013, arXiv

Bayesian Monotone Regression using Gaussian Process Projection

Shape constrained regression analysis has applications in dose-response modeling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. We propose two novel methods focusing on Bayesian monotone curve and surface estimation using Gaussian process projections. The first projects samples from an unconstrained prior, while the second projects samples from the Gaussian process posterior. Theory is developed on continuity of the projection, posterior consistency and rates of contraction. The second approach is shown to have an empirical Bayes justification and to lead to simple computation with good performance in finite samples. Our projection approach can be applied in other constrained function estimation problems including in multivariate settings.
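
As a rough illustration of projecting an unconstrained draw onto monotone functions, the sketch below applies the pool-adjacent-violators algorithm to a curve evaluated on a grid. Treating the projection as an unweighted least-squares projection onto non-decreasing sequences is an assumption made here for illustration, and the function name is hypothetical; the paper's exact projection may differ.

```python
def project_monotone(values):
    """Least-squares projection onto non-decreasing sequences (PAVA)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    projected = []
    for s, c in blocks:
        projected.extend([s / c] * c)  # each block takes its mean value
    return projected

# A non-monotone "sample path" on a grid (toy values, not a real GP draw).
draw = [1.0, 3.0, 2.0, 4.0]
monotone_draw = project_monotone(draw)  # [1.0, 2.5, 2.5, 4.0]
```

Applying such a map to each sampled curve turns draws from an unconstrained process into monotone ones, and a draw that is already monotone is left unchanged.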
