Source author record

Linglong Kong

Linglong Kong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Machine Learning math.ST Statistics Theory Applications Artificial Intelligence Computation

Catalog footprint

What is connected

14works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Prediction-powered Inference by Mixture of Experts

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.

preprint2022arXiv

An adaptive model checking test for functional linear model

Numerous studies have been devoted to the estimation and inference problems for functional linear models (FLM). However, few works focus on model checking problem that ensures the reliability of results. Limited tests in this area do not have tractable null distributions or asymptotic analysis under alternatives. Also, the functional predictor is usually assumed to be fully observed, which is impractical. To address these problems, we propose an adaptive model checking test for FLM. It combines regular moment-based and conditional moment-based tests, and achieves model adaptivity via the dimension of a residual-based subspace. The advantages of our test are manifold. First, it has a tractable chi-squared null distribution and higher powers under the alternatives than its components. Second, asymptotic properties under different underlying models are developed, including the unvisited local alternatives. Third, the test statistic is constructed upon finite grid points, which incorporates the discrete nature of collected data. We develop the desirable relationship between sample size and number of grid points to maintain the asymptotic properties. Besides, we provide a data-driven approach to estimate the dimension leading to model adaptivity, which is promising in sufficient dimension reduction. We conduct comprehensive numerical experiments to demonstrate the advantages the test inherits from its two simple components.

preprint2022arXiv

TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept phrases to characterize the main idea of a text snippet on the fast-growing social media. This is partly attributed to the fact that most knowledge bases contain general terms of the world, such as trees and cars, which do not have the defining power or are not interesting enough to social media app users. Another reason is that the intricacy of natural language allows the use of tense, negation and grammar to change the logic or emphasis of language, thus conveying completely different meanings. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from the open-domain social media. The concepts we consider represent the trending interests of online users. Associated with TAG is a concept graph of these fine-grained concepts and entities to provide the structural context information. We evaluate a wide range of popular neural text matching models as well as pre-trained language models on TAG, and point out their insufficiency to tag social media content with the most appropriate concept. We further propose a novel graph-graph matching method that demonstrates superior abstraction and generalization performance by better utilizing both the structural context in the concept graph and logic interactions between semantic units in the sentence via syntactic dependency parsing. We open-source both the TAG dataset and the proposed methods to facilitate further research.

preprint2021arXiv

A reproducing kernel Hilbert space framework for functional data classification

We encounter a bottleneck when we try to borrow the strength of classical classifiers to classify functional data. The major issue is that functional data are intrinsically infinite dimensional, thus classical classifiers cannot be applied directly or have poor performance due to the curse of dimensionality. To address this concern, we propose to project functional data onto one specific direction, and then a distance-weighted discrimination DWD classifier is built upon the projection score. The projection direction is identified through minimizing an empirical risk function that contains the particular loss function in a DWD classifier, over a reproducing kernel Hilbert space. Hence our proposed classifier can avoid overfitting and enjoy appealing properties of DWD classifiers. This framework is further extended to accommodate functional data classification problems where scalar covariates are involved. In contrast to previous work, we establish a non-asymptotic estimation error bound on the relative misclassification rate. In finite sample case, we demonstrate that the proposed classifiers compare favorably with some commonly used functional classifiers in terms of prediction accuracy through simulation studies and a real-world application.

preprint2020arXiv

Advanced Algorithms for Penalized Quantile and Composite Quantile Regression

In this paper, we discuss a family of robust, high-dimensional regression models for quantile and composite quantile regression, both with and without an adaptive lasso penalty for variable selection. We reformulate these quantile regression problems and obtain estimators by applying the alternating direction method of multipliers (ADMM), majorize-minimization (MM), and coordinate descent (CD) algorithms. Our new approaches address the lack of publicly available methods for (composite) quantile regression, especially for high-dimensional data, both with and without regularization. Through simulation studies, we demonstrate the need for different algorithms applicable to a variety of data settings, which we implement in the cqrReg package for R. For comparison, we also introduce the widely used interior point (IP) formulation and test our methods against the IP algorithms in the existing quantreg package. Our simulation studies show that each of our methods, particularly MM and CD, excel in different settings such as with large or high-dimensional data sets, respectively, and outperform the methods currently implemented in quantreg. The ADMM approach offers specific promise for future developments in its amenability to parallelization and scalability.

preprint2016arXiv

Local Region Sparse Learning for Image-on-Scalar Regression

Identification of regions of interest (ROI) associated with certain disease has a great impact on public health. Imposing sparsity of pixel values and extracting active regions simultaneously greatly complicate the image analysis. We address these challenges by introducing a novel region-selection penalty in the framework of image-on-scalar regression. Our penalty combines the Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, and the SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against zero-valued background. Efficient algorithm is based on the alternative direction method of multipliers (ADMM) which decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is that a divide and conquer learning algorithm is developed, thereby allowing scaling to large images. Several examples are presented and the experimental results are compared with other state-of-the-art approaches.

preprint2015arXiv

Estimation for bivariate quantile varying coefficient model

We propose a bivariate quantile regression method for the bivariate varying coefficient model through a directional approach. The varying coefficients are approximated by the B-spline basis and an $L_{2}$ type penalty is imposed to achieve desired smoothness. We develop a multistage estimation procedure based the Propagation-Separation~(PS) approach to borrow information from nearby directions. The PS method is capable of handling the computational complexity raised by simultaneously considering multiple directions to efficiently estimate varying coefficients while guaranteeing certain smoothness along directions. We reformulate the optimization problem and solve it by the Alternating Direction Method of Multipliers~(ADMM), which is implemented using R while the core is written in C to speed it up. Simulation studies are conducted to confirm the finite sample performance of our proposed method. A real data on Diffusion Tensor Imaging~(DTI) properties from a clinical study on neurodevelopment is analyzed.

preprint2015arXiv

Partial Functional Linear Quantile Regression for Neuroimaging Data Analysis

We propose a prediction procedure for the functional linear quantile regression model by using partial quantile covariance techniques and develop a simple partial quantile regression (SIMPQR) algorithm to efficiently extract partial quantile regression (PQR) basis for estimating functional coefficients. We further extend our partial quantile covariance techniques to functional composite quantile regression (CQR) defining partial composite quantile covariance. There are three major contributions. (1) We define partial quantile covariance between two scalar variables through linear quantile regression. We compute PQR basis by sequentially maximizing the partial quantile covariance between the response and projections of functional covariates. (2) In order to efficiently extract PQR basis, we develop a SIMPQR algorithm analogous to simple partial least squares (SIMPLS). (3) Under the homoscedasticity assumption, we extend our techniques to partial composite quantile covariance and use it to find the partial composite quantile regression (PCQR) basis. The SIMPQR algorithm is then modified to obtain the SIMPCQR algorithm. Two simulation studies show the superiority of our proposed methods. Two real data from ADHD-200 sample and ADNI are analyzed using our proposed methods.

preprint2015arXiv

Regularized quantile regression under heterogeneous sparsity with application to quantitative genetic traits

Genetic studies often involve quantitative traits. Identifying genetic features that influence quantitative traits can help to uncover the etiology of diseases. Quantile regression method considers the conditional quantiles of the response variable, and is able to characterize the underlying regression structure in a more comprehensive manner. On the other hand, genetic studies often involve high dimensional genomic features, and the underlying regression structure may be heterogeneous in terms of both effect sizes and sparsity. To account for the potential genetic heterogeneity, including the heterogeneous sparsity, a regularized quantile regression method is introduced. The theoretical property of the proposed method is investigated, and its performance is examined through a series of simulation studies. A real dataset is analyzed to demonstrate the application of the proposed method.

preprint2014arXiv

Model-Robust Designs for Quantile Regression

We give methods for the construction of designs for linear models, when the purpose of the investigation is the estimation of the conditional quantile function and the estimation method is quantile regression. The designs are robust against misspecified response functions, and against unanticipated heteroscedasticity. The methods are illustrated by example, and in a case study in which they are applied to growth charts.

preprint2013arXiv

Multivariate varying coefficient model for functional responses

Motivated by recent work studying massive imaging data in the neuroimaging literature, we propose multivariate varying coefficient models (MVCM) for modeling the relation between multiple functional responses and a set of covariates. We develop several statistical inference procedures for MVCM and systematically study their theoretical properties. We first establish the weak convergence of the local linear estimate of coefficient functions, as well as its asymptotic bias and variance, and then we derive asymptotic bias and mean integrated squared error of smoothed individual functions and their uniform convergence rate. We establish the uniform convergence rate of the estimated covariance function of the individual functions and its associated eigenvalue and eigenfunctions. We propose a global test for linear hypotheses of varying coefficient functions, and derive its asymptotic distribution under the null hypothesis. We also propose a simultaneous confidence band for each individual effect curve. We conduct Monte Carlo simulation to examine the finite-sample performance of the proposed procedures. We apply MVCM to investigate the development of white matter diffusivities along the genu tract of the corpus callosum in a clinical study of neurodevelopment.

preprint2013arXiv

Quantile tomography: using quantiles with multivariate data

The use of quantiles to obtain insights about multivariate data is addressed. It is argued that incisive insights can be obtained by considering directional quantiles, the quantiles of projections. Directional quantile envelopes are proposed as a way to condense this kind of information; it is demonstrated that they are essentially halfspace (Tukey) depth levels sets, coinciding for elliptic distributions (in particular multivariate normal) with density contours. Relevant questions concerning their indexing, the possibility of the reverse retrieval of directional quantile information, invariance with respect to affine transformations, and approximation/asymptotic properties are studied. It is argued that the analysis in terms of directional quantiles and their envelopes offers a straightforward probabilistic interpretation and thus conveys a concrete quantitative meaning; the directional definition can be adapted to elaborate frameworks, like estimation of extreme quantiles and directional quantile regression, the regression of depth contours on covariates. The latter facilitates the construction of multivariate growth charts---the question that motivated all the development.

preprint2013arXiv

Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities

Motivated by recent work on studying massive imaging data in various neuroimaging studies, we propose a novel spatially varying coefficient model (SVCM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) with a set of covariates. Two key features of most neuorimaging data are the presence of multiple piecewise smooth regions with unknown edges and jumps and substantial spatial correlations. To specifically account for these two features, SVCM includes a measurement model with multiple varying coefficient functions, a jumping surface model for each varying coefficient function, and a functional principal component model. We develop a three-stage estimation procedure to simultaneously estimate the varying coefficient functions and the spatial correlations. The estimation procedure includes a fast multiscale adaptive estimation and testing procedure to independently estimate each varying coefficient function, while preserving its edges among different piecewise-smooth regions. We systematically investigate the asymptotic properties (e.g., consistency and asymptotic normality) of the multiscale adaptive parameter estimates. We also establish the uniform convergence rate of the estimated spatial covariance function and its associated eigenvalue and eigenfunctions. Our Monte Carlo simulation and real data analysis have confirmed the excellent performance of SVCM.

preprint2010arXiv

Discussion of "Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth"

Discussion of "Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth" by M. Hallin, D. Paindaveine and M. Siman [arXiv:1002.4486]

Linglong Kong

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Prediction-powered Inference by Mixture of Experts

An adaptive model checking test for functional linear model

TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

A reproducing kernel Hilbert space framework for functional data classification

Advanced Algorithms for Penalized Quantile and Composite Quantile Regression

Local Region Sparse Learning for Image-on-Scalar Regression

Estimation for bivariate quantile varying coefficient model

Partial Functional Linear Quantile Regression for Neuroimaging Data Analysis

Regularized quantile regression under heterogeneous sparsity with application to quantitative genetic traits

Model-Robust Designs for Quantile Regression

Multivariate varying coefficient model for functional responses

Quantile tomography: using quantiles with multivariate data

Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities

Discussion of "Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth"