Researcher profile

Min-ge Xie

Min-ge Xie contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published item(s)

Preprint, 2022, arXiv

Geometric Conditions for the Discrepant Posterior Phenomenon and Connections to Simpson's Paradox

The discrepant posterior phenomenon (DPP) is a counter-intuitive phenomenon that can frequently occur in a Bayesian analysis of multivariate parameters. It refers to the situation in which a parameter estimate based on the posterior is more extreme than both the estimate inferred from the prior alone and the estimate inferred from the likelihood alone. Inferential claims that exhibit DPP defy the common intuition that the posterior is a prior-data compromise, and the phenomenon can be surprisingly ubiquitous in well-behaved Bayesian models. In this paper we revisit this phenomenon and, using point estimation as an example, derive conditions under which the DPP occurs in Bayesian models with exponential quadratic likelihoods and conjugate multivariate Gaussian priors. The family of exponential quadratic likelihood models includes Gaussian models and models with the local asymptotic normality property. We provide an intuitive geometric interpretation of the phenomenon and show that there exists a nontrivial space of marginal directions such that the DPP occurs. We further relate the phenomenon to Simpson's paradox and uncover a deep-rooted connection between the two that is associated with marginalization. We also draw connections with Bayesian computational algorithms when difficult geometry exists. Our discovery demonstrates that DPP is more prevalent than previously understood and anticipated. Theoretical results are complemented by numerical illustrations. Scenarios covered in this study have implications for parameterization, sensitivity analysis, and prior choice for Bayesian modeling.
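As a rough illustration of the phenomenon described in this abstract, the sketch below combines a bivariate Gaussian prior with a Gaussian likelihood and checks whether the posterior mean falls outside the range spanned by the prior mean and the likelihood mode along a given direction. The numbers, the function names `gaussian_posterior` and `is_discrepant`, and the choice of coordinate directions are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

# Hypothetical inputs chosen for illustration (not from the paper).
mu_prior = np.array([0.0, 0.0])
cov_prior = np.array([[1.0, 0.9], [0.9, 1.0]])    # positively correlated prior
mu_lik = np.array([1.0, 2.0])                     # likelihood mode (e.g. the MLE)
cov_lik = np.array([[1.0, -0.9], [-0.9, 1.0]])    # negatively correlated likelihood


def gaussian_posterior(mu0, cov0, mu1, cov1):
    """Conjugate update: precisions add, means are precision-weighted."""
    prec0 = np.linalg.inv(cov0)
    prec1 = np.linalg.inv(cov1)
    cov_post = np.linalg.inv(prec0 + prec1)
    mean_post = cov_post @ (prec0 @ mu0 + prec1 @ mu1)
    return mean_post, cov_post


post_mean, post_cov = gaussian_posterior(mu_prior, cov_prior, mu_lik, cov_lik)


def is_discrepant(direction):
    """True if the posterior estimate along `direction` lies outside the
    interval spanned by the prior and likelihood estimates."""
    a = direction @ mu_prior
    b = direction @ mu_lik
    p = direction @ post_mean
    return bool(p > max(a, b) or p < min(a, b))


e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
print(post_mean)            # approximately [1.4, 1.45]
print(is_discrepant(e1))    # True: 1.4 exceeds both 0 and 1
print(is_discrepant(e2))    # False: 1.45 lies between 0 and 2
```

In this toy setup the first coordinate of the posterior mean exceeds both the prior mean and the likelihood mode, while the second coordinate behaves like the usual prior-data compromise.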

Preprint, 2021, arXiv

Divide-and-conquer methods for big data analysis

In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-step process: first splitting a data set into several smaller ones; then analyzing each set separately; finally combining the results from each analysis. This approach is effective in handling large data sets that cannot be analyzed in their entirety by a single computer because of limits on either memory storage or computational time. The combined results provide a statistical inference that is similar to the one obtained from analyzing the entire data set. This article reviews some recent developments of divide-and-conquer methods in a variety of settings, including combining based on parametric, semiparametric and nonparametric models, and online sequential updating methods, among others. Theoretical development on the efficiency of the divide-and-conquer methods is also discussed.
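The split, analyze, combine process described in this abstract can be sketched for ordinary least squares, where each block reports its own estimate together with its Gram matrix and the combination step is a precision-weighted average. This is one simple instance chosen for illustration, not necessarily a method covered in the review; the helper names `analyze_block` and `combine` and the simulated data are assumptions.

```python
import numpy as np


def analyze_block(X, y):
    """Fit OLS on one block; return the estimate and its Gram matrix X'X."""
    gram = X.T @ X
    beta = np.linalg.solve(gram, X.T @ y)
    return beta, gram


def combine(results):
    """Combine block estimates, weighting each by its Gram matrix."""
    total_gram = sum(gram for _, gram in results)
    weighted = sum(gram @ beta for beta, gram in results)
    return np.linalg.solve(total_gram, weighted)


# Simulated data (hypothetical), split into k blocks of rows.
rng = np.random.default_rng(0)
n, k = 10_000, 8
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, beta_true.size))
y = X @ beta_true + rng.normal(size=n)

blocks = zip(np.array_split(X, k), np.array_split(y, k))
results = [analyze_block(Xb, yb) for Xb, yb in blocks]

beta_dc = combine(results)                       # divide-and-conquer estimate
beta_full = np.linalg.solve(X.T @ X, X.T @ y)    # full-data estimate
print(np.allclose(beta_dc, beta_full))
```

For linear regression this combination reproduces the full-data estimate up to floating-point error; for nonlinear models the combined estimate is generally only an approximation of the full-data one.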

Preprint, 2020, arXiv

Homeostasis phenomenon in predictive inference when using a wrong learning model: a tale of random split of data into training and test sets

This note uses a conformal prediction procedure to provide further support for several points discussed by Professor Efron (Efron, 2020) concerning prediction, estimation and the IID assumption. It aims to convey the following messages: (1) Under the IID assumption (e.g., a random split into training and testing data sets), prediction is indeed an easier task than estimation, since prediction has a 'homeostasis property' in this case: even if the model used for learning is completely wrong, the prediction results remain valid. (2) If the IID assumption is violated (e.g., a targeted prediction on specific individuals), the homeostasis property is often disrupted and the prediction results under a wrong model are usually invalid. (3) Better model estimation typically leads to more accurate prediction in both IID and non-IID cases. Good modeling and estimation practices are important and, in many cases, crucial for obtaining good prediction results. The discussion also provides one explanation of why the deep learning method works so well in academic exercises (with experiments set up by randomly splitting the entire data into training and testing data sets), but fails to deliver many 'killer applications' in real-world applications.
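Points (1) and (2) of this abstract can be illustrated with split conformal prediction, used here as a stand-in since the abstract does not say which conformal procedure the note adopts. The sketch deliberately fits a straight line to quadratic data, then compares coverage on a random test split with coverage on a targeted subgroup. The data-generating process, the 1.9 cutoff, and all variable names are assumptions for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(1)


def draw(n):
    """Simulate data whose true regression function is quadratic."""
    x = rng.uniform(-2.0, 2.0, size=n)
    y = x**2 + rng.normal(scale=0.3, size=n)
    return x, y


alpha = 0.1                         # target miscoverage level
x_train, y_train = draw(1000)
x_cal, y_cal = draw(1000)
x_test, y_test = draw(5000)

# Deliberately wrong learning model: a straight line.
coef = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    return np.polyval(coef, x)


# Split conformal: the interval half-width is the ceil((n+1)(1-alpha))-th
# smallest absolute residual on the calibration set.
scores = np.sort(np.abs(y_cal - predict(x_cal)))
rank = math.ceil((len(scores) + 1) * (1 - alpha))
half_width = scores[rank - 1]

covered = np.abs(y_test - predict(x_test)) <= half_width
coverage_iid = covered.mean()                  # random split: close to 1 - alpha
targeted = np.abs(x_test) > 1.9                # targeted subgroup at the edges
coverage_targeted = covered[targeted].mean()   # far below 1 - alpha

print(coverage_iid, coverage_targeted)
```

The marginal coverage stays near the nominal level despite the wrong model, while coverage for the targeted subgroup collapses, which is the contrast the abstract describes.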

Preprint, 2020, arXiv

p-Value as the Strength of Evidence Measured by Confidence Distribution

The notion of p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, the p-value is often misinterpreted, misused or miscommunicated in practice. Part of the issue is that existing definitions of p-value are often derived from constructions under specific settings, and a general definition that directly reflects the evidence of the null hypothesis is not yet available. In this article, we first propose a general and rigorous definition of p-value that fulfills two performance-based characteristics. The performance-based definition subsumes all existing construction-based definitions of the p-value, and justifies their interpretations. The paper further presents a specific approach based on confidence distribution to formulate and calculate p-values. This specific way of computing p-values has two main advantages. First, it is applicable to a wide range of hypothesis testing problems, including the standard one- and two-sided tests, tests with interval-type null, intersection-union tests, multivariate tests and so on. Second, it can naturally lead to a coherent interpretation of p-value as evidence in support of the null hypothesis, as well as a meaningful measure of the degree of such support. In particular, it gives meaning to a large p-value: for example, a p-value of 0.8 indicates more support than a p-value of 0.5. Numerical examples are used to illustrate the wide applicability and computational feasibility of our approach. We show that our proposal is effective and can be applied broadly, without further consideration of the form or size of the null space. With existing testing methods, such solutions have been unavailable or cannot be easily obtained.
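A minimal sketch of the confidence-distribution idea, under the assumption of a normal sample with known standard deviation: the confidence distribution for the mean is taken to be H(theta) = Phi(sqrt(n) (theta - xbar) / sigma), the p-value for the one-sided null theta <= theta0 is H(theta0), and the p-value for the point null theta = theta0 is 2 min(H(theta0), 1 - H(theta0)). The function names and the numbers are illustrative assumptions; the paper's general definition covers far more settings than this textbook case.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def confidence_distribution(theta, xbar, sigma, n):
    """CD for a normal mean with known sigma, evaluated at theta."""
    return STD_NORMAL.cdf(sqrt(n) * (theta - xbar) / sigma)


def p_value_one_sided(theta0, xbar, sigma, n):
    """Support for H0: theta <= theta0, read directly off the CD."""
    return confidence_distribution(theta0, xbar, sigma, n)


def p_value_point_null(theta0, xbar, sigma, n):
    """Support for H0: theta = theta0, using twice the smaller CD tail."""
    h = confidence_distribution(theta0, xbar, sigma, n)
    return 2 * min(h, 1 - h)


# Hypothetical summary statistics: sample mean 0.3, sigma 1, n = 25.
p_one = p_value_one_sided(0.0, xbar=0.3, sigma=1.0, n=25)
p_two = p_value_point_null(0.0, xbar=0.3, sigma=1.0, n=25)
print(round(p_one, 4), round(p_two, 4))   # 0.0668 0.1336
```

In this setting both values coincide with the classical one- and two-sided z-test p-values, consistent with the abstract's claim that the approach recovers the standard tests.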