Source author record

Lu Lin

Lu Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Methodology; Artificial Intelligence; Machine Learning; math.ST; Statistics Theory; Distributed, Parallel, and Cluster Computing; math.OC

Catalog footprint

What is connected

14 works

7 topics

4 close collaborators


Preprint, 2026, arXiv

FROG: Fair Removal on Graphs

With growing emphasis on privacy regulations, machine unlearning has become increasingly critical in real-world applications such as social networks and recommender systems, many of which are naturally represented as graphs. However, existing graph unlearning methods often modify nodes or edges indiscriminately, overlooking their impact on fairness. For instance, forgetting links between users of different genders may inadvertently exacerbate group disparities. To address this issue, we propose a novel framework that jointly optimizes both the graph structure and the model to achieve fair unlearning. Our method rewires the graph by removing redundant edges that hinder forgetting while preserving fairness through targeted edge augmentation. We further introduce a worst-case evaluation mechanism to assess robustness under challenging scenarios. Experiments on real-world datasets show that our approach achieves more effective and fair unlearning than existing baselines.

Preprint, 2022, arXiv

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has proven successful in reducing communication costs with stochastic gradient descent (SGD), there have been far fewer attempts to build communication-efficient adaptive gradient methods with provable guarantees, even though adaptive gradient methods are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problems, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to a first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.
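The error feedback idea mentioned in this abstract can be sketched in a few lines. The example below is a generic illustration with a top-k sparsifier, not the compressed AMSGrad method from the paper; the function names `top_k` and `compress_with_error_feedback` and the choice of compressor are assumptions made for illustration only.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def compress_with_error_feedback(grad, residual, k):
    """One worker-side round (hypothetical helper).

    The residual left over from earlier rounds is added back to the fresh
    gradient before compression, so dropped coordinates are delayed
    rather than lost.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    sent = top_k(corrected, k)  # what would be communicated to the server
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


grad = [0.5, -2.0, 0.1, 1.5]
residual = [0.0, 0.0, 0.0, 0.0]

# Round 1: only the two largest-magnitude coordinates are sent.
sent, residual = compress_with_error_feedback(grad, residual, k=2)

# Round 2 with the same gradient: the unsent coordinates keep accumulating.
sent2, residual2 = compress_with_error_feedback(grad, residual, k=2)
```

By construction, `sent` plus the new residual always equals the corrected gradient, which is the invariant that error feedback analyses typically rely on.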

Preprint, 2022, arXiv

Feature screening for multi-response linear models by empirical likelihood

This paper proposes a new feature screening method for the multi-response ultrahigh dimensional linear model by empirical likelihood. Through a multivariate moment condition, the empirical likelihood induced ranking statistics can exploit the joint effect among responses, and thus result in a much better performance than the methods considering responses individually. More importantly, by the use of empirical likelihood, the new method adapts to the heterogeneity in the conditional variance of random error. The sure screening property of the newly proposed method is proved with the model size controlled within a reasonable scale. Additionally, the new screening method is also extended to a conditional version so that it can recover the hidden predictors which are easily missed by the unconditional method. The corresponding theoretical properties are also provided. Finally, both numerical studies and real data analysis are provided to illustrate the effectiveness of the proposed methods.

Preprint, 2022, arXiv

Global Bias-Corrected Divide-and-Conquer by Quantile-Matched Composite for General Nonparametric Regressions

The issues of bias correction and robustness are crucial in the divide-and-conquer (DC) strategy, especially for asymmetric nonparametric models with massive data. It is known that quantile-based methods can achieve robustness, but quantile estimation for nonparametric regression has non-ignorable bias when the error distribution is asymmetric. This paper explores a global bias-corrected DC by quantile-matched composite for nonparametric regressions with general error distributions. The proposed strategies achieve bias correction and robustness simultaneously. Unlike common DC quantile estimations, which use an identical quantile level to construct the local estimator on each local machine, the new methodologies obtain the local estimators at various quantile levels for different data batches, and then construct the global estimator as a weighted sum of the local estimators. In the weighted sum, the weights and quantile levels are matched such that the bias of the global estimator is corrected significantly, especially when the error distribution is asymmetric. Based on the asymptotic properties of the global estimator, the optimal weights are attained, and the corresponding algorithms are then suggested. The behaviors of the new methods are further illustrated by various numerical examples from simulation experiments and real data analyses. Compared with the competitors, the new methods have favorable estimation accuracy, robustness, applicability and computational efficiency.

Preprint, 2022, arXiv

Unbiased Graph Embedding with Biased Graph Observations

Graph embedding techniques are pivotal in real-world machine learning tasks that operate on graph-structured data, such as social recommendation and protein structure modeling. Embeddings are mostly performed on the node level for learning representations of each node. Since the formation of a graph is inevitably affected by certain sensitive node attributes, the node embeddings can inherit such sensitive information and introduce undesirable biases in downstream tasks. Most existing works impose ad-hoc constraints on the node embeddings to restrict their distributions for unbiasedness/fairness, which however compromise the utility of the resulting embeddings. In this paper, we propose a principled new way for unbiased graph embedding by learning node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes. Motivated by this new perspective, we propose two complementary methods for uncovering such an underlying graph, with the goal of introducing minimum impact on the utility of the embeddings. Both our theoretical justification and extensive experimental comparisons against state-of-the-art solutions demonstrate the effectiveness of our proposed methods.

Preprint, 2021, arXiv

A General Framework of Online Updating Variable Selection for Generalized Linear Models with Streaming Datasets

In the research field of big data, one of the important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming datasets. The framework is a type of online updating penalized likelihood with a differentiable or non-differentiable penalty function. An online updating coordinate descent algorithm is proposed to solve the online updating optimization problem. Moreover, a tuning parameter selection is suggested in an online updating way. The selection and estimation consistencies and the oracle property are established theoretically. Our methods are further examined and illustrated by various numerical examples from both simulation experiments and a real data analysis.

Preprint, 2020, arXiv

Optimally estimating the sample standard deviation from the five-number summary

When reporting the results of clinical studies, some researchers may choose the five-number summary (including the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation, particularly for skewed data. For these studies, when included in a meta-analysis, it is often desired to convert the five-number summary back to the sample mean and standard deviation. For this purpose, several methods have been proposed in the recent literature and they are increasingly used nowadays. In this paper, we propose to further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample standard deviation. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods have dramatically improved the existing methods for data transformation, and they are capable of serving as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are also provided for implementing our optimal estimators.
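The general shape of such an estimator, a weighted combination of a range-based and an interquartile-range-based estimate, can be sketched as follows. This is a minimal sketch under stated assumptions: the data are roughly normal, the scaling constants use Blom's approximation for expected normal order statistics, and the default `weight=0.5` is a placeholder. The optimal sample-size-dependent weight derived in the paper is not reproduced here, and all function names are hypothetical.

```python
from statistics import NormalDist


def range_scale(n):
    """Approximate expected range of n standard normal draws (Blom's approximation)."""
    return 2.0 * NormalDist().inv_cdf((n - 0.375) / (n + 0.25))


def iqr_scale(n):
    """Approximate expected interquartile range of n standard normal draws."""
    return 2.0 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))


def sd_from_five_number(minimum, q1, q3, maximum, n, weight=0.5):
    """Weighted SD estimate from a five-number summary.

    `weight` is the share given to the range-based estimate; 0.5 is an
    illustrative placeholder, not the paper's optimal weight. The median
    is part of the summary but does not enter this formula.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    range_part = (maximum - minimum) / range_scale(n)
    iqr_part = (q3 - q1) / iqr_scale(n)
    return weight * range_part + (1.0 - weight) * iqr_part


# Hypothetical study: n = 101, min = 1.0, Q1 = 4.2, Q3 = 6.9, max = 10.4
estimate = sd_from_five_number(1.0, 4.2, 6.9, 10.4, n=101)
```

Making the weight an explicit parameter keeps the sketch honest about what is missing: the estimator is only as good as the weight supplied, and choosing it well is the contribution of the paper.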

Preprint, 2020, arXiv

Unified Rules of Renewable Weighted Sums for Various Online Updating Estimations

This paper establishes unified frameworks of renewable weighted sums (RWS) for various online updating estimations in models with streaming data sets. The newly defined RWS lays the foundation of online updating likelihood, online updating loss function, online updating estimating equation and so on. The idea of RWS is intuitive and heuristic, and the algorithm is computationally simple. This paper chooses the nonparametric model as an exemplary setting. The RWS applies to various types of nonparametric estimators, which include but are not limited to nonparametric likelihood, quasi-likelihood and least squares. Furthermore, the method and the theory can be extended to models with both a parameter and a nonparametric function. The estimation consistency and asymptotic normality of the proposed renewable estimator are established, and the oracle property is obtained. Moreover, these properties are always satisfied, without any constraint on the number of data batches, which means that the new method is adaptive to the situation where streaming data sets arrive perpetually. The behavior of the method is further illustrated by various numerical examples from simulation experiments and real data analysis.
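The general principle behind online updating, in which each new batch refreshes a small summary and the raw data can then be discarded, can be illustrated with ordinary least squares for a single predictor. This is a simplified stand-in chosen because its summaries are exact running sums; it is not the RWS construction from the paper, and the class name `RunningLeastSquares` is hypothetical.

```python
class RunningLeastSquares:
    """Simple linear regression y = a + b*x maintained from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0   # sum of x
        self.sy = 0.0   # sum of y
        self.sxx = 0.0  # sum of x*x
        self.sxy = 0.0  # sum of x*y

    def update(self, xs, ys):
        """Fold one batch into the summaries; the batch is not stored."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) based on all batches seen so far."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


model = RunningLeastSquares()
model.update([0.0, 1.0], [1.0, 3.0])  # first batch arrives
model.update([2.0, 3.0], [5.0, 7.0])  # second batch arrives later
intercept, slope = model.coefficients()
```

Because the sums are additive across batches, the result matches a fit on the pooled data regardless of how many batches arrive. For nonparametric estimators no such exact finite summary exists in general, which is the gap a construction like RWS addresses.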

Preprint, 2014, arXiv

Inference for biased models: a quasi-instrumental variable approach

For linear regression models that are not exactly sparse, in the sense that the coefficients of the insignificant variables are not exactly zero, the working models obtained by variable selection are often biased. Even in sparse cases, when some significant variables are missing after variable selection, the working models are biased as well. Thus, in such situations, root-n consistent estimation and accurate prediction cannot be expected. In this paper, a novel remodelling method is proposed to produce an unbiased model when quasi-instrumental variables are introduced. The root-n estimation consistency and the asymptotic normality can be achieved, and the prediction accuracy can be improved as well. The performance of the new method is examined through simulation studies.

Preprint, 2014, arXiv

Upper expectation parametric regression

Every observation may follow a distribution that is randomly selected from a class of distributions; this is called distribution uncertainty. It is a fact acknowledged in some research fields, such as financial risk measurement. Thus, the classical expectation is not identifiable in general. In this paper, a distribution uncertainty is defined, and then an upper expectation regression is proposed, which can describe the relationship between extreme events and relevant covariates under the framework of distribution uncertainty. As there are no classical methods available to estimate the parameters in the upper expectation regression, a two-step penalized maximum least squares procedure is proposed to estimate the mean function and the upper expectation of the error. The resulting estimators are consistent and asymptotically normal in a certain sense. Simulation studies and a real data example are conducted to show that the classical least squares estimation does not work and the penalized maximum least squares performs well.

Preprint, 2013, arXiv

Asymptotic Composite Estimation

Composition methodologies in the current literature mainly aim to improve estimation efficiency via direct composition, either of initial estimators or of objective functions. In this paper, composite estimation is investigated for both estimation efficiency and bias reduction. To this end, a novel method is proposed that utilizes a regression relationship between initial estimators and values of a model-independent parameter in an asymptotic sense. The resulting estimators could have smaller limiting variances than those of the initial estimators and, for nonparametric regression estimation, could also have a faster convergence rate than the classical optimal rate that the corresponding initial estimators can achieve. Simulations are carried out to examine its performance in finite sample situations.

Preprint, 2013, arXiv

Sublinear expectation linear regression

Nonlinear expectation, including sublinear expectation as its special case, is a new and original framework of probability theory and has potential applications in some scientific fields, especially in financial risk measurement and management. Under the nonlinear expectation framework, however, the related statistical models and statistical inferences have not yet been well established. The goal of this paper is to construct the sublinear expectation regression and investigate its statistical inference. First, a sublinear expectation linear regression is defined and its identifiability is given. Then, based on the representation theorem of sublinear expectation and the newly defined model, several parameter estimations and model predictions are suggested, and the asymptotic normality of the estimations and the mini-max property of the predictions are obtained. Furthermore, new methods are developed to realize variable selection for high-dimensional models. Finally, simulation studies and a real-life example are carried out to illustrate the new models and methodologies. All notions and methodologies developed are essentially different from classical ones and can be thought of as a foundation for general nonlinear expectation statistics.

Preprint, 2011, arXiv

Estimation and inference for high-dimensional non-sparse models

To successfully work on variable selection, sparse model structure has become a basic assumption for all existing methods. However, this assumption is questionable as it is hard to hold in most cases, and none of the existing methods may provide consistent estimation and accurate model prediction in non-sparse scenarios. In this paper, we propose semiparametric re-modeling and inference when the linear regression model under study is possibly non-sparse. After an initial working model is selected by a method such as the Dantzig selector adopted in this paper, we re-construct a globally unbiased semiparametric model by use of suitable instrumental variables and nonparametric adjustment. The newly defined model is identifiable, and the estimator of the parameter vector is asymptotically normal. The consistency, together with the re-built model, improves model prediction. This method naturally works when the model is indeed sparse and is thus robust against non-sparseness in a certain sense. Simulation studies show that the new approach yields, particularly when $p$ is much larger than $n$, significant improvements in estimation and prediction accuracy over the Gaussian Dantzig selector and other classical methods. Even when the model under study is sparse, our method is comparable to the existing methods designed for sparse models.

Preprint, 2010, arXiv

Adaptive post-Dantzig estimation and prediction for non-sparse "large $p$ and small $n$" models

For consistency (even oracle properties) of estimation and model prediction, almost all existing methods of variable/feature selection critically depend on sparsity of models. However, for "large $p$ and small $n$" models, the sparsity assumption is hard to check and, in particular, when this assumption is violated, the consistency of all existing estimations is usually impossible because working models selected by existing methods such as the LASSO and the Dantzig selector are usually biased. To attack this problem, we propose in this paper adaptive post-Dantzig estimation and model prediction. Here adaptability means that the consistency based on the newly proposed method is adaptive to non-sparsity of the model, the choice of shrinkage tuning parameter and the dimension of the predictor vector. The idea is that after a sub-model is determined as a working model by the Dantzig selector, we construct a globally unbiased sub-model by choosing suitable instrumental variables and nonparametric adjustment. The new estimator of the parameters in the sub-model can be asymptotically normal. The consistent estimator, together with the selected sub-model and adjusted model, improves model predictions. Simulation studies show that the new approach significantly improves estimation and prediction accuracy over the Gaussian Dantzig selector and other classical methods.
