Researcher profile

Lizhen Lin

Lizhen Lin contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust 21 (Emerging) | Verification L1 | Unclaimed author
11 works | 0 followers | 11 topics | 4 close collaborators

Published work

11 published items

preprint | 2026 | arXiv

Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces

We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation for KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain near-minimax posterior contraction rates. In particular, the contraction rate depends on the intrinsic anisotropic smoothness of the underlying function. Moreover, by placing a hyperprior on a single model-size parameter, the resulting posterior adapts to unknown anisotropic smoothness and still achieves the corresponding near-minimax rate. A distinctive feature of our results, compared with those for standard sparse MLP-based models, is that the KAN depth can be kept fixed: owing to the flexibility of learnable spline edge functions, the required approximation complexity is controlled through the network width, spline-grid range and size, and parameter sparsity. Our analysis develops theoretical tools tailored to sparse spline-edge architectures, including approximation and complexity bounds for Bayesian KANs. We then extend to compositional Besov spaces and show that the contraction rates depend on the layerwise smoothness and effective dimension of the underlying compositional structure, thereby effectively avoiding the curse of dimensionality. Together, the developed tools and findings advance the theoretical understanding of Bayesian neural networks and provide rigorous statistical foundations for KANs.

preprint | 2022 | arXiv

Extrinsic Bayesian Optimizations on Manifolds

We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge in designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold into a higher-dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. A simulation study and real data analysis demonstrate the utility of our eBO framework by applying eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
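The embed-then-kernel idea in this abstract can be illustrated with a minimal sketch for the unit sphere. It assumes the embedding is the usual inclusion of the sphere into R^3 and uses a squared-exponential kernel on the embedded points; the function names (`embed_sphere`, `extrinsic_rbf`, `gram`) and the length scale are illustrative choices, not taken from the paper. Because the squared-exponential kernel is positive definite on R^3, its restriction to the embedded sphere is a valid covariance kernel.

```python
import math


def embed_sphere(theta, phi):
    """Map spherical angles (polar theta, azimuth phi) to a unit vector in R^3."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def extrinsic_rbf(p, q, length_scale=1.0):
    """Squared-exponential kernel evaluated on the embedded points (assumed kernel choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(embed_sphere(*p), embed_sphere(*q)))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def gram(points, length_scale=1.0):
    """Gram matrix of the extrinsic kernel over a list of (theta, phi) pairs."""
    return [[extrinsic_rbf(p, q, length_scale) for q in points] for p in points]


# Three points on the sphere given as (theta, phi): north pole, a point on the equator, south pole.
north = (0.0, 0.0)
equator = (math.pi / 2, 0.0)
south = (math.pi, 0.0)
K = gram([north, equator, south])
```

The kernel depends on the chordal (straight-line) distance between embedded points rather than the geodesic distance, which is what makes positive definiteness immediate. A Gaussian process surrogate and an acquisition function would be built on top of such a Gram matrix; that part is not shown here.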

preprint | 2022 | arXiv

Multilevel Network Item Response Modeling for Discovering Differences Between Innovation and Regular School Systems in Korea

The innovation school system in South Korea has been developed in response to the traditional high-pressure school system in South Korea, with a view to cultivating a bottom-up and student-centered educational culture. Despite its ambitious goals, questions have been raised about the success of the innovation school system. Leveraging data from the Gyeonggi Education Panel Study (GEPS) along with advances in the statistical analysis of network data and educational data, we compare the two school systems in more depth. We find that some schools are indeed different from others, and those differences are not detected by conventional multilevel models. Having said that, we do not find much evidence that the innovation school system differs from the regular school system in terms of self-reported mental well-being, although we do detect differences among some schools that appear to be unrelated to the school system.

preprint | 2022 | arXiv

Neural-PDE: A RNN based neural network for solving time dependent PDEs

Partial differential equations (PDEs) play a crucial role in studying a vast number of problems in science and engineering. Numerically solving nonlinear and/or high-dimensional PDEs is often a challenging task. Inspired by the traditional finite difference and finite element methods and by emerging advancements in machine learning, we propose a sequence deep learning framework called Neural-PDE, which automatically learns the governing rules of any time-dependent PDE system from existing data by using a bidirectional LSTM encoder and predicts the data for the next n time steps. One critical feature of our proposed framework is that the Neural-PDE is able to simultaneously learn and simulate the multiscale variables. We test the Neural-PDE on a range of examples, from one-dimensional PDEs to a high-dimensional and nonlinear complex fluids model. The results show that the Neural-PDE is capable of learning the initial conditions, boundary conditions and differential operators without knowledge of the specific form of a PDE system. In our experiments the Neural-PDE can efficiently extract the dynamics within 20 epochs of training and produces accurate predictions. Furthermore, unlike traditional machine learning approaches to learning PDEs, such as CNNs and MLPs, which require vast numbers of parameters for model precision, Neural-PDE shares parameters across all time steps, which considerably reduces the computational complexity and leads to a fast learning algorithm.

preprint | 2022 | arXiv

Optimal Bayesian estimation of Gaussian mixtures with growing number of components

We study Bayesian estimation of finite mixture models in a general setup where the number of components is unknown and allowed to grow with the sample size. Assuming a growing number of components is natural, as the degree of heterogeneity present in the sample can grow and new components can arise as the sample size increases, allowing full flexibility in modeling the complexity of data. This, however, leads to a high-dimensional model, which poses great challenges for estimation. We make novel use of a sample-size-dependent prior in a Bayesian model and establish a number of important theoretical results. We first show that under mild conditions on the prior, the posterior distribution concentrates around the true mixing distribution at a near-optimal rate with respect to the Wasserstein distance. Under a separation condition on the true mixing distribution, we further show that a better and adaptive convergence rate can be achieved, and the number of components can be consistently estimated. Furthermore, we derive optimal convergence rates for higher-order mixture models where the number of components diverges arbitrarily fast. In addition, we suggest a simple recipe for using a Dirichlet process (DP) mixture prior for estimating finite mixture models and provide theoretical guarantees. In particular, we provide a novel solution for adopting the number of clusters in a DP mixture model as an estimate of the number of components in a finite mixture model. A simulation study and real data applications demonstrate the utility of our method.

preprint | 2022 | arXiv

Robustness against Adversarial Attacks in Neural Networks using Incremental Dissipativity

Adversarial examples can easily degrade the classification performance of neural networks. Empirical methods for promoting robustness to such examples have been proposed, but often lack both analytical insights and formal guarantees. Recently, some robustness certificates have appeared in the literature based on system theoretic notions. This work proposes an incremental dissipativity-based robustness certificate for neural networks in the form of a linear matrix inequality for each layer. We also propose an equivalent spectral norm bound for this certificate which is scalable to neural networks with multiple layers. We demonstrate the improved performance against adversarial attacks on a feed-forward neural network trained on MNIST and an AlexNet trained on CIFAR-10.

preprint | 2021 | arXiv

Bayesian classification, anomaly detection, and survival analysis using network inputs with application to the microbiome

While the study of a single network is well-established, technological advances now allow for the collection of multiple networks with relative ease. Increasingly, anywhere from several to thousands of networks can be created from brain imaging, gene co-expression data, or microbiome measurements. And these networks, in turn, are being looked to as potentially powerful features to be used in modeling. However, with networks being non-Euclidean in nature, how best to incorporate them into standard modeling tasks is not obvious. In this paper, we propose a Bayesian modeling framework that provides a unified approach to binary classification, anomaly detection, and survival analysis with network inputs. We encode the networks in the kernel of a Gaussian process prior via their pairwise differences and we discuss several choices of provably positive definite kernel that can be plugged into our models. Although our methods are widely applicable, we are motivated here in particular by microbiome research (where network analysis is emerging as the standard approach for capturing the interconnectedness of microbial taxa across both time and space) and its potential for reducing preterm delivery and improving personalization of prenatal care.

preprint | 2020 | arXiv

Bayesian High-dimensional Semi-parametric Inference beyond sub-Gaussian Errors

We consider a sparse linear regression model with unknown symmetric error under the high-dimensional setting. The true error distribution is assumed to belong to the locally $β$-Hölder class with an exponentially decreasing tail, which does not need to be sub-Gaussian. We obtain posterior convergence rates of the regression coefficient and the error density, which are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive the semi-parametric Bernstein-von Mises (BvM) theorem to characterize the asymptotic shape of the marginal posterior for regression coefficients. Under the sub-Gaussianity assumption on the true score function, strong model selection consistency for regression coefficients is also obtained, which eventually asserts the frequentist validity of credible sets.

preprint | 2020 | arXiv

Maximum Pairwise Bayes Factors for Covariance Structure Testing

Hypothesis testing of structure in covariance matrices is of significant importance, but faces great challenges in high-dimensional settings. Although consistent frequentist one-sample covariance tests have been proposed, there is a lack of simple, computationally scalable, and theoretically sound Bayesian testing methods for large covariance matrices. Motivated by this gap and by the need for tests that are powerful against sparse alternatives, we propose a novel testing framework based on the maximum pairwise Bayes factor. Our initial focus is on one-sample covariance testing; the proposed test can optimally distinguish null and alternative hypotheses in a frequentist asymptotic sense. We then propose diagonal tests and a scalable covariance graph selection procedure that are shown to be consistent. A simulation study evaluates the proposed approach relative to competitors. We illustrate advantages of our graph selection method on a gene expression data set.

preprint | 2020 | arXiv

Optimization of Graph Neural Networks with Natural Gradient Descent

In this work, we propose to employ information-geometric tools to optimize graph neural network architectures such as graph convolutional networks. More specifically, we develop optimization algorithms for graph-based semi-supervised learning by employing the natural gradient information in the optimization process. This allows us to efficiently exploit the geometry of the underlying statistical model or parameter space for optimization and inference. To the best of our knowledge, this is the first work that has utilized the natural gradient for the optimization of graph neural networks that can be extended to other semi-supervised problems. Efficient computational algorithms are developed and extensive numerical studies are conducted to demonstrate the superior performance of our algorithms over existing algorithms such as ADAM and SGD.
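For readers unfamiliar with the natural gradient mentioned in this abstract, the basic update preconditions the ordinary gradient g with the inverse Fisher information matrix F, giving w <- w - lr * F^{-1} g. The sketch below applies it to a toy two-parameter logistic regression rather than a graph neural network; the data, the damping term, the step size, and all function names are assumptions made for illustration and do not reproduce the paper's algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll(w, xs, ys):
    """Average negative log-likelihood of a two-parameter logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def grad_and_fisher(w, xs, ys):
    """Average gradient of the negative log-likelihood and the 2x2 Fisher matrix."""
    n = len(xs)
    g = [0.0, 0.0]
    F = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        for i in range(2):
            g[i] += (p - y) * x[i] / n
            for j in range(2):
                F[i][j] += p * (1.0 - p) * x[i] * x[j] / n
    return g, F


def natural_gradient_step(w, xs, ys, lr=0.5, damping=1e-6):
    """One update w <- w - lr * (F + damping * I)^{-1} g using an explicit 2x2 inverse."""
    g, F = grad_and_fisher(w, xs, ys)
    a, b = F[0][0] + damping, F[0][1]
    c, d = F[1][0], F[1][1] + damping
    det = a * d - b * c
    step = [(d * g[0] - b * g[1]) / det, (-c * g[0] + a * g[1]) / det]
    return [w[0] - lr * step[0], w[1] - lr * step[1]]


# Toy, non-separable data: each input is (1, t) so that w[0] acts as an intercept.
xs = [(1.0, t) for t in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0)]
ys = [0, 0, 1, 1, 0, 1]

w0 = [0.0, 0.0]
w = w0
for _ in range(20):
    w = natural_gradient_step(w, xs, ys)
```

For large models the Fisher matrix cannot be formed and inverted explicitly as it is here, which is why the abstract emphasizes efficient computation.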

preprint | 2020 | arXiv

Robust Optimization and Inference on Manifolds

We propose a robust and scalable procedure for general optimization and inference problems on manifolds leveraging the classical idea of 'median-of-means' estimation. This is motivated by ubiquitous examples and applications in modern data science in which a statistical learning problem can be cast as an optimization problem over manifolds. Being able to incorporate the underlying geometry for inference while addressing the need for robustness and scalability presents great challenges. We address these challenges by first proving a key lemma that characterizes some crucial properties of geometric medians on manifolds. In turn, this allows us to prove robustness and tighter concentration of our proposed final estimator in a subsequent theorem. This estimator aggregates a collection of subset estimators by taking their geometric median over the manifold. We illustrate bounds on this estimator via calculations in explicit examples. The robustness and scalability of the procedure are illustrated in numerical examples on both simulated and real data sets.
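The aggregation step described in this abstract (compute an estimator on each subset, then take the geometric median of the subset estimators) can be sketched in the flat Euclidean case. This is only an illustration under that simplifying assumption: the paper works on manifolds, whereas the code below uses the plane, sample means as the subset estimators, and a basic Weiszfeld iteration for the geometric median. All names, the block count, and the synthetic data are hypothetical.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def geometric_median(points, iters=200, tol=1e-10):
    """Weiszfeld iteration for the Euclidean geometric median."""
    y = mean(points)
    for _ in range(iters):
        num = [0.0] * len(y)
        den = 0.0
        for p in points:
            d = math.dist(p, y)
            if d < 1e-12:
                # The iterate sits on a data point; skip it to avoid dividing by zero.
                continue
            weight = 1.0 / d
            den += weight
            for k in range(len(y)):
                num[k] += weight * p[k]
        if den == 0.0:
            break
        new_y = [v / den for v in num]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y


def median_of_means(points, n_blocks, seed=0):
    """Split the sample into blocks, average each block, take the geometric median."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    block_means = [mean(b) for b in blocks if b]
    return geometric_median(block_means)


# Synthetic sample: 300 standard normal points in the plane plus 3 gross outliers.
rng = random.Random(1)
clean = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(300)]
outliers = [[1000.0, 1000.0] for _ in range(3)]
sample = clean + outliers

plain = mean(sample)
robust = median_of_means(sample, n_blocks=15)
```

The three outliers can contaminate at most three of the fifteen block means, so the geometric median stays close to the bulk of the data while the plain mean is dragged toward the outliers.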