Source author record

Xiao Fang

Xiao Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

math.PR, math.ST, Statistics Theory, Machine Learning, astro-ph.CO, Social and Information Networks, Artificial Intelligence, astro-ph.GA, astro-ph.IM, physics.soc-ph, astro-ph.SR, math.CO, nucl-ex, physics.ins-det

Catalog footprint

What is connected

30 works

14 topics

4 close collaborators

preprint, 2024, arXiv

Data Valuation for Vertical Federated Learning: A Model-free and Privacy-preserving Method

Vertical federated learning (VFL) is a promising paradigm for predictive analytics, empowering an organization (i.e., task party) to enhance its predictive models through collaborations with multiple data suppliers (i.e., data parties) in a decentralized and privacy-preserving way. Despite the fast-growing interest in VFL, the lack of effective and secure tools for assessing the value of data owned by data parties hinders the application of VFL in business contexts. In response, we propose FedValue, a privacy-preserving, task-specific but model-free data valuation method for VFL, which consists of a data valuation metric and a federated computation method. Specifically, we first introduce a novel data valuation metric, namely MShapley-CMI. The metric evaluates a data party's contribution to a predictive analytics task without the need to execute a machine learning model, making it well-suited for real-world applications of VFL. Next, we develop an innovative federated computation method that calculates the MShapley-CMI value for each data party in a privacy-preserving manner. Extensive experiments conducted on six public datasets validate the efficacy of FedValue for data valuation in the context of VFL. In addition, we illustrate the practical utility of FedValue with a case study involving federated movie recommendations.

preprint, 2024, arXiv

Second-order Approximation of Exponential Random Graph Models

Exponential random graph models (ERGMs) are flexible probability models allowing edge dependency. However, it is known that, to a first-order approximation, many ERGMs behave like Erdős-Rényi random graphs, where edges are independent. In this paper, to distinguish ERGMs from Erdős-Rényi random graphs, we consider second-order approximations of ERGMs using two-stars and triangles. We prove that the second-order approximation indeed achieves second-order accuracy in the triangle-free case. The new approximation is formally obtained by Hoeffding decomposition and rigorously justified using Stein's method.

preprint, 2022, arXiv

Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou

Customer Life Time Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem, due to its complex and mutable data distribution. Existing approaches either directly learn from posterior feature distributions or leverage statistical models that make strong assumptions on prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the Divide-and-Conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems and hence greatly reduces the modeling complexity. In addition, a novel evaluation metric, Mutual Gini, is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz Curve. The ODMN framework has been successfully deployed in many business scenarios of Kuaishou, and achieved great performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines including ZILN and Two-Stage XGBoost models.

preprint, 2022, arXiv

Cosmology with the Roman Space Telescope -- Synergies with CMB lensing

We explore synergies between the Nancy Grace Roman Space Telescope and CMB lensing data to constrain dark energy and modified gravity scenarios. A simulated likelihood analysis of the galaxy clustering and weak lensing data from the Roman Space Telescope High Latitude Survey combined with CMB lensing data from the Simons Observatory is undertaken, marginalizing over important astrophysical effects and calibration uncertainties. Included in the modeling are the effects of baryons on small-scale clustering, scale-dependent growth suppression by neutrinos, as well as uncertainties in the galaxy clustering biases, in the intrinsic alignment contributions to the lensing signal, in the redshift distributions, and in the galaxy shape calibration. The addition of CMB lensing roughly doubles the dark energy figure-of-merit from Roman photometric survey data alone, varying from a factor of 1.7 to 2.4 improvement depending on the particular Roman survey configuration. Alternatively, the inclusion of CMB lensing information can compensate for uncertainties in the Roman galaxy shape calibration if it falls below the design goals. Furthermore, we report the first forecast of Roman constraints on a model-independent structure growth, parameterized by $σ_8 (z)$, and on the Hu-Sawicki f(R) gravity as well as an improved forecast of the phenomenological $(Σ_0,μ_0)$ model. We find that CMB lensing plays a crucial role in constraining $σ_8(z)$ at z>2, with percent-level constraints forecasted out to z=4. CMB lensing information does not improve constraints on the f(R) models substantially. It does, however, increase the $(Σ_0,μ_0)$ figure-of-merit by a factor of about 1.5.

preprint, 2022, arXiv

Exploiting Expert Knowledge for Assigning Firms to Industries: A Novel Deep Learning Method

Industry assignment, which assigns firms to industries according to a predefined Industry Classification System (ICS), is fundamental to a large number of critical business practices, ranging from operations and strategic decision making by firms to economic analyses by government agencies. Three types of expert knowledge are essential to effective industry assignment: definition-based knowledge (i.e., expert definitions of each industry), structure-based knowledge (i.e., structural relationships among industries as specified in an ICS), and assignment-based knowledge (i.e., prior firm-industry assignments performed by domain experts). Existing industry assignment methods utilize only assignment-based knowledge to learn a model that classifies unassigned firms to industries, and overlook definition-based and structure-based knowledge. Moreover, these methods only consider which industry a firm has been assigned to, but ignore the time-specificity of assignment-based knowledge, i.e., when the assignment occurs. To address the limitations of existing methods, we propose a novel deep learning-based method that not only seamlessly integrates the three types of knowledge for industry assignment but also takes the time-specificity of assignment-based knowledge into account. Methodologically, our method features two innovations: dynamic industry representation and hierarchical assignment. The former represents an industry as a sequence of time-specific vectors by integrating the three types of knowledge through our proposed temporal and spatial aggregation mechanisms. The latter takes industry and firm representations as inputs, computes the probability of assigning a firm to different industries, and assigns the firm to the industry with the highest probability.

preprint, 2022, arXiv

From $p$-Wasserstein Bounds to Moderate Deviations

We use a new method via $p$-Wasserstein bounds to prove Cramér-type moderate deviations in (multivariate) normal approximations. In the classical setting that $W$ is a standardized sum of $n$ independent and identically distributed (i.i.d.) random variables with sub-exponential tails, our method recovers the optimal range of $0\leq x=o(n^{1/6})$ and the near optimal error rate $O(1)(1+x)(\log n+x^2)/\sqrt{n}$ for $P(W>x)/(1-Φ(x))\to 1$, where $Φ$ is the standard normal distribution function. Our method also works for dependent random variables (vectors) and we give applications to the combinatorial central limit theorem, Wiener chaos, homogeneous sums and local dependence. The key step of our method is to show that the $p$-Wasserstein distance between the distribution of the random variable (vector) of interest and a normal distribution grows like $O(p^αΔ)$, $1\leq p\leq p_0$, for some constants $α, Δ$ and $p_0$. In the above i.i.d. setting, $α=1, Δ=1/\sqrt{n}, p_0=n^{1/3}$. For this purpose, we obtain general $p$-Wasserstein bounds in (multivariate) normal approximations using Stein's method.
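As a rough numerical illustration of the ratio $P(W>x)/(1-Φ(x))$ mentioned in the abstract, the sketch below evaluates it exactly for a standardized sum of Rademacher (±1) variables. The Rademacher choice is an assumption made here only because the binomial tail is available in closed form; the function names are hypothetical and do not come from the paper.

```python
import math

def normal_tail(x: float) -> float:
    """Return 1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def rademacher_tail(n: int, x: float) -> float:
    """Exact P(W > x), where W = (2S - n) / sqrt(n) and S ~ Binomial(n, 1/2)."""
    # W > x is equivalent to S > (n + x * sqrt(n)) / 2.
    threshold = (n + x * math.sqrt(n)) / 2.0
    first = max(math.floor(threshold) + 1, 0)  # smallest integer strictly above threshold
    total = sum(math.comb(n, s) for s in range(first, n + 1))
    return total / 2 ** n

def tail_ratio(n: int, x: float) -> float:
    """Ratio of the exact tail of W to the standard normal tail at x."""
    return rademacher_tail(n, x) / normal_tail(x)

if __name__ == "__main__":
    # Print the ratio for a few n at a fixed x to see how it behaves as n grows.
    for n in (100, 1000, 10000):
        print(n, tail_ratio(n, 2.0))
```

Because the Rademacher sum is a lattice variable, the ratio fluctuates with $n$; the sketch is meant only to make the quantity concrete, not to reproduce the error rates stated in the abstract.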

preprint, 2022, arXiv

High order steady-state diffusion approximations

We derive and analyze new diffusion approximations of stationary distributions of Markov chains that are based on second- and higher-order terms in the expansion of the Markov chain generator. Our approximations achieve a higher degree of accuracy compared to diffusion approximations widely used for the past fifty years, while retaining a similar computational complexity. To support our approximations, we present a combination of theoretical and numerical results across three different models. Our approximations are derived recursively through Stein/Poisson equations, and the theoretical results are proved using Stein's method.

preprint, 2022, arXiv

High-dimensional properties for empirical priors in linear regression with unknown error variance

We study full Bayesian procedures for high-dimensional linear regression. We adopt data-dependent empirical priors introduced in [1]. In their paper, these priors have nice posterior contraction properties and are easy to compute. Our paper extends their theoretical results to the case of unknown error variance. Under a proper sparsity assumption, we achieve model selection consistency, posterior contraction rates, and a Bernstein-von Mises theorem by analyzing the multivariate t-distribution.

preprint, 2022, arXiv

Posterior Consistency for Bayesian Relevance Vector Machines

Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. Chakraborty et al. (2012) did a full hierarchical Bayesian analysis of nonlinear regression in such situations using relevance vector machines based on reproducing kernel Hilbert space (RKHS). But they did not provide any theoretical properties associated with their procedure. The present paper revisits their problem, introduces a new class of global-local priors different from theirs, and provides results on posterior consistency as well as posterior contraction rates.

preprint, 2021, arXiv

2D-FFTLog: Efficient computation of real space covariance matrices for galaxy clustering and weak lensing

Accurate covariance matrices for two-point functions are critical for inferring cosmological parameters in likelihood analyses of large-scale structure surveys. Among various approaches to obtaining the covariance, analytic computation is much faster and less noisy than estimation from data or simulations. However, the transform of covariances from Fourier space to real space involves integrals with two Bessel integrals, which are numerically slow and easily affected by numerical uncertainties. Inaccurate covariances may lead to significant errors in the inference of the cosmological parameters. In this paper, we introduce a 2D-FFTLog algorithm for efficient, accurate and numerically stable computation of non-Gaussian real space covariances for both 3D and projected statistics. The 2D-FFTLog algorithm is easily extended to perform real space bin-averaging. We apply the algorithm to the covariances for galaxy clustering and weak lensing for a Dark Energy Survey Year 3-like and a Rubin Observatory's Legacy Survey of Space and Time Year 1-like survey, and demonstrate that for both surveys, our algorithm can produce numerically stable angular bin-averaged covariances with the flat sky approximation, which are sufficiently accurate for inferring cosmological parameters. The code CosmoCov for computing the real space covariances with or without the flat sky approximation is released along with this paper.

preprint, 2021, arXiv

Large-dimensional Central Limit Theorem with Fourth-moment Error Bounds on Convex Sets and Balls

We prove the large-dimensional Gaussian approximation of a sum of $n$ independent random vectors in $\mathbb{R}^d$ together with fourth-moment error bounds on convex sets and Euclidean balls. We show that compared with classical third-moment bounds, our bounds have near-optimal dependence on $n$ and can achieve improved dependence on the dimension $d$. For centered balls, we obtain an additional error bound that has a sub-optimal dependence on $n$, but recovers the known result of the validity of the Gaussian approximation if and only if $d=o(n)$. We discuss an application to the bootstrap. We prove our main results using Stein's method.

preprint, 2020, arXiv

Arcsine laws for random walks generated from random permutations with applications to genomics

A classical result for the simple symmetric random walk with $2n$ steps is that the number of steps above the origin, the time of the last visit to the origin, and the time of the maximum height all have exactly the same distribution and converge when scaled to the arcsine law. Motivated by applications in genomics, we study the distributions of these statistics for the non-Markovian random walk generated from the ascents and descents of a uniform random permutation and a Mallows($q$) permutation and show that they have the same asymptotic distributions as for the simple random walk. We also give an unexpected conjecture, along with numerical evidence and a partial proof in special cases, for the result that the number of steps above the origin by step $2n$ for the uniform permutation generated walk has exactly the same discrete arcsine distribution as for the simple random walk, even though the other statistics for these walks have very different laws. We also give explicit error bounds to the limit theorems using Stein's method for the arcsine distribution, as well as functional central limit theorems and a strong embedding of the Mallows$(q)$ permutation which is of independent interest.
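The permutation-generated walk described in the abstract can be sketched as follows: each ascent of the permutation contributes a +1 step and each descent a -1 step. The convention used here for "steps above the origin" (a step counts if either endpoint of its segment is positive) is an assumption borrowed from the classical simple random walk setting, and all function names are hypothetical.

```python
import random

def permutation_walk(perm):
    """Partial sums of +1 for each ascent and -1 for each descent of perm."""
    walk, height = [], 0
    for a, b in zip(perm, perm[1:]):
        height += 1 if b > a else -1
        walk.append(height)
    return walk

def steps_above_origin(walk):
    """Count steps whose segment lies above the axis (assumed convention)."""
    count, prev = 0, 0
    for h in walk:
        if h > 0 or prev > 0:
            count += 1
        prev = h
    return count

def simulate(num_steps, trials, seed=0):
    """Sample the number of steps above the origin for uniform random permutations."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        perm = list(range(num_steps + 1))  # num_steps + 1 values give num_steps steps
        rng.shuffle(perm)
        counts.append(steps_above_origin(permutation_walk(perm)))
    return counts

if __name__ == "__main__":
    samples = simulate(num_steps=20, trials=10000)
    # A histogram of `samples` can be compared against the discrete arcsine law.
    print(sum(samples) / len(samples))
```

A histogram of the simulated counts gives a quick visual check of the U-shaped arcsine behavior; it is a simulation aid only and says nothing about the error bounds proved in the paper.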

preprint, 2020, arXiv

Beyond Limber: Efficient computation of angular power spectra for galaxy clustering and weak lensing

Angular two-point statistics of large-scale structure observables are important cosmological probes. To reach the high accuracy required by the statistical precision of future surveys, some of these statistics may need to be computed without the commonly employed Limber approximation; the exact computation however requires integration over Bessel functions, and a brute-force evaluation is slow to converge. We present a new method based on our generalized FFTLog algorithm for the efficient computation of angular power spectra beyond the Limber approximation. The new method significantly simplifies the calculation and improves the numerical speed and stability. It is easily extended to handle integrals involving derivatives of Bessel functions, making it equally applicable to numerically more challenging cases such as contributions from redshift-space distortions and Doppler effects. We implement our method for galaxy clustering and galaxy-galaxy lensing power spectra. We find that using the Limber approximation for galaxy clustering in future analyses like LSST Year 1 and DES Year 6 may cause significant biases in cosmological parameters, indicating that going beyond the Limber approximation is necessary for these analyses.

preprint, 2020, arXiv

High-dimensional Central Limit Theorems by Stein's Method

We obtain explicit error bounds for the $d$-dimensional normal approximation on hyperrectangles for a random vector that has a Stein kernel, or admits an exchangeable pair coupling, or is a non-linear statistic of independent random variables or a sum of $n$ locally dependent random vectors. We assume the approximating normal distribution has a non-singular covariance matrix. The error bounds vanish even when the dimension $d$ is much larger than the sample size $n$. We prove our main results using the approach of Götze (1991) in Stein's method, together with modifications of an estimate of Anderson, Hall and Titterington (1998) and a smoothing inequality of Bhattacharya and Rao (1976). For sums of $n$ independent and identically distributed isotropic random vectors having a log-concave density, we obtain an error bound that is optimal up to a $\log n$ factor. We also discuss an application to multiple Wiener-Itô integrals.

preprint, 2020, arXiv

New error bounds in multivariate normal approximations via exchangeable pairs with applications to Wishart matrices and fourth moment theorems

We extend Stein's celebrated Wasserstein bound for normal approximation via exchangeable pairs to the multi-dimensional setting. As an intermediate step, we exploit the symmetry of exchangeable pairs to obtain an error bound for smooth test functions. We also obtain a continuous version of the multi-dimensional Wasserstein bound in terms of fourth moments. We apply the main results to multivariate normal approximations to Wishart matrices of size $n$ and degree $d$, where we obtain the optimal convergence rate $\sqrt{n^3/d}$ under only moment assumptions, and to quadratic forms and Poisson functionals, where we strengthen a few of the fourth moment bounds in the literature on the Wasserstein distance.

preprint, 2020, arXiv

Normal Approximation and Fourth Moment Theorems for Monochromatic Triangles

Given a graph sequence $\{G_n\}_{n \geq 1}$ denote by $T_3(G_n)$ the number of monochromatic triangles in a uniformly random coloring of the vertices of $G_n$ with $c \geq 2$ colors. This arises as a generalization of the birthday paradox, where $G_n$ corresponds to a friendship network and $T_3(G_n)$ counts the number of triples of friends with matching birthdays. In this paper we prove a central limit theorem (CLT) for $T_3(G_n)$ with explicit error rates. The proof involves constructing a martingale difference sequence by carefully ordering the vertices of $G_n$, based on a certain combinatorial score function, and using a quantitative version of the martingale CLT. We then relate this error term to the well-known fourth moment phenomenon, which, interestingly, holds only when the number of colors $c \geq 5$. We also show that the convergence of the fourth moment is necessary to obtain a Gaussian limit for any $c \geq 2$, which, together with the above result, implies that the fourth-moment condition characterizes the limiting normal distribution of $T_3(G_n)$, whenever $c \geq 5$. Finally, to illustrate the promise of our approach, we include an alternative proof of the CLT for the number of monochromatic edges, which provides quantitative rates for the results obtained in Bhattacharya et al. (2017).

preprint, 2019, arXiv

An efficient method for mapping the 12C+12C molecular resonances at low energies

The 12C+12C fusion reaction is well known for its complicated molecular resonances, and plays an important role in both nuclear structure and astrophysics. It is extremely difficult to measure the cross sections of 12C+12C fusion at energies of astrophysical relevance due to very low reaction yields. To measure the complicated resonant structure existing in this important reaction, an efficient thick target method has been developed and applied for the first time at energies $E_{\mathrm{c.m.}}<5.3$ MeV. A scan of the cross sections over a relatively wide range of energies can be carried out using only a single beam energy. The result of the measurement at $E_{\mathrm{c.m.}}=4.1$ MeV is compared with other results from previous work. This method would be useful for searching for potentially existing resonances of 12C+12C in the energy range 1 MeV $<E_{\mathrm{c.m.}}<$ 3 MeV.

preprint, 2016, arXiv

Utility-based Link Recommendation for Online Social Networks

Link recommendation, which suggests links to connect currently unlinked users, is a key functionality offered by major online social networks. Salient examples of link recommendation include "People You May Know" on Facebook and LinkedIn as well as "You May Know" on Google+. The main stakeholders of an online social network include users (e.g., Facebook users) who use the network to socialize with other users and an operator (e.g., Facebook Inc.) that establishes and operates the network for its own benefit (e.g., revenue). Existing link recommendation methods recommend links that are likely to be established by users but overlook the benefit a recommended link could bring to an operator. To address this gap, we define the utility of recommending a link and formulate a new research problem - the utility-based link recommendation problem. We then propose a novel utility-based link recommendation method that recommends links based on the value, cost, and linkage likelihood of a link, in contrast to existing link recommendation methods which focus solely on linkage likelihood. Specifically, our method models the dependency relationship between value, cost, linkage likelihood and utility-based link recommendation decision using a Bayesian network, predicts the probability of recommending a link with the Bayesian network, and recommends links with the highest probabilities. Using data obtained from a major U.S. online social network, we demonstrate significant performance improvement achieved by our method compared to prevalent link recommendation methods from representative prior research.

preprint, 2015, arXiv

A multivariate CLT for bounded decomposable random vectors with the best known rate

We prove a multivariate central limit theorem with explicit error bound on a non-smooth function distance for sums of bounded decomposable $d$-dimensional random vectors. The decomposition structure is similar to that of Barbour, Karoński and Ruciński (1989) and is more general than the local dependence structure considered in Chen and Shao (2004). The error bound is of the order $d^{\frac{1}{4}} n^{-\frac{1}{2}}$, where $d$ is the dimension and $n$ is the number of summands. The dependence on $d$, namely $d^{\frac{1}{4}}$, is the best known dependence even for sums of independent and identically distributed random vectors, and the dependence on $n$, namely $n^{-\frac{1}{2}}$, is optimal. We apply our main result to a random graph example.

preprint, 2015, arXiv

A Survey of Link Recommendation for Social Networks: Methods, Theoretical Foundations, and Future Research Directions

Link recommendation has attracted significant attention from both industry practitioners and academic researchers. In industry, link recommendation has become a standard and highly important feature in online social networks, prominent examples of which include "People You May Know" on LinkedIn and "You May Know" on Google+. In academia, link recommendation has been and remains a highly active research area. This paper surveys state-of-the-art link recommendation methods, which can be broadly categorized into learning-based methods and proximity-based methods. We further identify social and economic theories, such as social interaction theory, that underlie these methods and explain from a theoretical perspective why a link recommendation method works. Finally, we propose to extend link recommendation research in several directions that include utility-based link recommendation, diversity of link recommendation, link recommendation from incomplete data, and experimental study of link recommendation.

preprint, 2015, arXiv

Multivariate Normal Approximation by Stein's Method: The Concentration Inequality Approach

The concentration inequality approach for normal approximation by Stein's method is generalized to the multivariate setting. We use this approach to prove a non-smooth function distance for multivariate normal approximation for standardized sums of $k$-dimensional independent random vectors $W=\sum_{i=1}^n X_i$ with an error bound of order $k^{1/2}γ$ where $γ=\sum_{i=1}^n E|X_i|^3$. For sums of locally dependent (unbounded) random vectors, we obtain a fourth moment bound which is typically of order $O_k(1/\sqrt{n})$, as well as a third moment bound which is typically of order $O_k(\log n/\sqrt{n})$.

preprint, 2015, arXiv

On the error bound in a combinatorial central limit theorem

Let $\mathbb{X}=\{X_{ij}: 1\le i,j\le n\}$ be an $n\times n$ array of independent random variables where $n\ge2$. Let $π$ be a uniform random permutation of $\{1,2,\dots,n\}$, independent of $\mathbb{X}$, and let $W=\sum_{i=1}^nX_{iπ(i)}$. Suppose $\mathbb{X}$ is standardized so that ${\mathbb{E}}W=0,\operatorname {Var}(W)=1$. We prove that the Kolmogorov distance between the distribution of $W$ and the standard normal distribution is bounded by $451\sum_{i,j=1}^n{\mathbb{E}}|X_{ij}|^3/n$. Our approach is by Stein's method of exchangeable pairs and the use of a concentration inequality.
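To make the statistic $W=\sum_{i=1}^n X_{iπ(i)}$ concrete, the sketch below treats the special case of a fixed (deterministic) array, which is an assumption made here for simplicity; the abstract allows random entries. It computes the mean and variance of $W$ from the classical formula for a deterministic array and checks them against brute-force enumeration over all permutations. The function names are hypothetical.

```python
import itertools

def hoeffding_moments(a):
    """Mean and variance of W = sum_i a[i][pi(i)] for a uniform random permutation pi."""
    n = len(a)
    row = [sum(a[i]) / n for i in range(n)]                        # row averages
    col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]   # column averages
    grand = sum(row) / n                                           # overall average
    mean = n * grand
    var = sum((a[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(n) for j in range(n)) / (n - 1)
    return mean, var

def exact_values(a):
    """Values of W over all n! permutations (feasible only for small n)."""
    n = len(a)
    return [sum(a[i][p[i]] for i in range(n))
            for p in itertools.permutations(range(n))]

if __name__ == "__main__":
    a = [[1, 2, 4], [0, 3, 1], [5, 1, 2]]
    values = exact_values(a)
    brute_mean = sum(values) / len(values)
    brute_var = sum((v - brute_mean) ** 2 for v in values) / len(values)
    print(hoeffding_moments(a), (brute_mean, brute_var))
```

Subtracting the mean and dividing by the square root of the variance gives the standardization ${\mathbb{E}}W=0$, $\operatorname{Var}(W)=1$ assumed in the abstract.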

preprint, 2015, arXiv

Poisson approximation for two scan statistics with rates of convergence

As an application of Stein's method for Poisson approximation, we prove rates of convergence for the tail probabilities of two scan statistics that have been suggested for detecting local signals in sequences of independent random variables subject to possible change-points. Our formulation deals simultaneously with ordinary and with large deviations.

preprint, 2015, arXiv

Rates of convergence for multivariate normal approximation with applications to dense graphs and doubly indexed permutation statistics

We provide a new general theorem for multivariate normal approximation on convex sets. The theorem is formulated in terms of a multivariate extension of Stein couplings. We apply the results to a homogeneity test in dense random graphs and to prove multivariate asymptotic normality for certain doubly indexed permutation statistics.

preprint, 2015, arXiv

Scintillation efficiency measurement of Na recoils in NaI(Tl) below the DAMA/LIBRA energy threshold

The dark matter interpretation of the DAMA modulation signal depends on the NaI(Tl) scintillation efficiency of nuclear recoils. Previous measurements for Na recoils have large discrepancies, especially in the DAMA/LIBRA modulation energy region. We report a quenching effect measurement of Na recoils in NaI(Tl) from 3 keV$_{\text{nr}}$ to 52 keV$_{\text{nr}}$, covering the whole DAMA/LIBRA energy region for light WIMP interpretations. By using a low-energy, pulsed neutron beam, a double time-of-flight technique, and pulse-shape discrimination methods, we obtained the most accurate measurement of this kind for NaI(Tl) to date. The results differ significantly from the DAMA reported values at low energies, but fall between the other previous measurements. We present the implications of the new quenching results for the dark matter interpretation of the DAMA modulation signal.

preprint, 2014, arXiv

A universal error bound in the CLT for counting monochromatic edges in uniformly colored graphs

Let $\{G_n: n\geq 1\}$ be a sequence of simple graphs. Suppose $G_n$ has $m_n$ edges and each vertex of $G_n$ is colored independently and uniformly at random with $c_n$ colors. Recently, Bhattacharya, Diaconis and Mukherjee (2013) proved universal limit theorems for the number of monochromatic edges in $G_n$. Their proof was by the method of moments, and therefore was not able to produce rates of convergence. By a non-trivial application of Stein's method, we prove that there exists a universal error bound for their central limit theorem. The error bound depends only on $m_n$ and $c_n$, regardless of the graph structure.
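A minimal sketch of the statistic in this abstract: counting monochromatic edges under a uniform random vertex coloring. The mean and variance helper relies on the standard fact that, in a simple graph, the edge indicators are pairwise independent, so both moments depend only on the number of edges and colors. The function names and the example graph are hypothetical.

```python
import random

def monochromatic_edges(edges, colors):
    """Number of edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])

def mean_and_variance(num_edges, num_colors):
    """Mean and variance of the count; they depend only on m and c."""
    p = 1.0 / num_colors  # probability that a given edge is monochromatic
    return num_edges * p, num_edges * p * (1.0 - p)

def sample_count(edges, num_vertices, num_colors, rng):
    """Color each vertex independently and uniformly, then count monochromatic edges."""
    colors = [rng.randrange(num_colors) for _ in range(num_vertices)]
    return monochromatic_edges(edges, colors)

if __name__ == "__main__":
    # Example graph (an assumption for illustration): a cycle on 50 vertices.
    n = 50
    edges = [(i, (i + 1) % n) for i in range(n)]
    rng = random.Random(0)
    samples = [sample_count(edges, n, 3, rng) for _ in range(20000)]
    print(sum(samples) / len(samples), mean_and_variance(len(edges), 3))
```

Standardizing the sampled counts with these two moments gives the quantity whose distance to the normal distribution the abstract's error bound controls.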

preprint, 2014, arXiv

Discretized normal approximation by Stein's method

We prove a general theorem to bound the total variation distance between the distribution of an integer valued random variable of interest and an appropriate discretized normal distribution. We apply the theorem to 2-runs in a sequence of i.i.d. Bernoulli random variables, the number of vertices with a given degree in the Erdős-Rényi random graph, and the uniform multinomial occupancy model.

preprint, 2013, arXiv

From Stein identities to moderate deviations

Stein's method is applied to obtain a general Cramér-type moderate deviation result for dependent random variables whose dependence is defined in terms of a Stein identity. A corollary for zero-bias coupling is deduced. The result is also applied to a combinatorial central limit theorem, a general system of binary codes, the anti-voter model on a complete graph, and the Curie-Weiss model. A general moderate deviation result for independent random variables is also proved.

preprint, 2013, arXiv

Moderate deviations in Poisson approximation: a first attempt

Poisson approximation using Stein's method has been extensively studied in the literature. The main focus has been on bounding the total variation distance. This paper is a first attempt at moderate deviations in Poisson approximation for right-tail probabilities of sums of dependent indicators. We obtain results under certain general conditions for local dependence as well as for size-bias coupling. These results are then applied to independent indicators, 2-runs, and the matching problem.
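For the matching problem named in the abstract, the right-tail ratio can be computed exactly for small $n$. The sketch below assumes the usual formulation (the number of fixed points of a uniform random permutation, approximated by a Poisson law with mean 1) and uses the standard derangement recursion; the function names are hypothetical.

```python
import math

def fixed_point_pmf(n):
    """Exact distribution of the number of fixed points of a uniform random permutation."""
    # Derangement numbers D_0..D_n via D_k = (k - 1) * (D_{k-1} + D_{k-2}).
    d = [1, 0]
    for k in range(2, n + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    fact = math.factorial(n)
    # Choose the k fixed points, then derange the remaining n - k items.
    return [math.comb(n, k) * d[n - k] / fact for k in range(n + 1)]

def poisson_tail(k, lam=1.0):
    """P(Poisson(lam) >= k)."""
    return 1.0 - sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k))

def tail_ratio(n, k):
    """Ratio of P(fixed points >= k) to the Poisson(1) right tail at k."""
    pmf = fixed_point_pmf(n)
    return sum(pmf[k:]) / poisson_tail(k)

if __name__ == "__main__":
    for k in range(0, 6):
        print(k, tail_ratio(20, k))
```

Moderate deviation results of the kind described in the abstract concern how far into the tail such a ratio stays close to 1; the script only makes that ratio computable for inspection.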

preprint, 2013, arXiv

Predicting Adoption Probabilities in Social Networks

In a social network, adoption probability refers to the probability that a social entity will adopt a product, service, or opinion in the foreseeable future. Such probabilities are central to fundamental issues in social network analysis, including the influence maximization problem. In practice, adoption probabilities have significant implications for applications ranging from social network-based target marketing to political campaigns; yet, predicting adoption probabilities has not received sufficient research attention. Building on relevant social network theories, we identify and operationalize key factors that affect adoption decisions: social influence, structural equivalence, entity similarity, and confounding factors. We then develop the locally-weighted expectation-maximization method for Naïve Bayesian learning to predict adoption probabilities on the basis of these factors. The principal challenge addressed in this study is how to predict adoption probabilities in the presence of confounding factors that are generally unobserved. Using data from two large-scale social networks, we demonstrate the effectiveness of the proposed method. The empirical results also suggest that cascade methods primarily using social influence to predict adoption probabilities offer limited predictive power, and that confounding factors are critical to adoption probability predictions.
