Researcher profile

Debo Cheng

Debo Cheng contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 19 (Unverified), Verification L1, Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

preprint, 2023, arXiv

Matching Using Sufficient Dimension Reduction for Heterogeneity Causal Effect Estimation

Causal inference plays an important role in understanding the underlying mechanisms of the data generation process across various domains. Estimating the average causal effect and individual causal effects from observational data with high-dimensional covariates is challenging because of the curse of dimensionality and the problem of data sufficiency. Existing matching methods cannot effectively estimate individual causal effects or overcome the curse of dimensionality in causal inference. To address this challenge, we prove that the reduced set obtained by sufficient dimension reduction (SDR) is a balancing score for confounding adjustment. Based on this theorem, we propose using an SDR method to obtain a reduced representation of the original covariates, which is then used for matching. Specifically, a non-parametric model is used to learn the reduced set and to avoid model specification errors. Experimental results on real-world datasets show that the proposed method outperforms the compared matching methods. Moreover, a further experimental analysis demonstrates that the reduced representation is sufficient to correct the covariate imbalance between the treatment and control groups.

preprint, 2022, arXiv

Assessing Classifier Fairness with Collider Bias

The increasing application of machine learning techniques in everyday decision-making processes has raised concerns about the fairness of algorithmic decision-making. This paper addresses the problem of collider bias, which produces spurious associations in fairness assessment, and develops theorems to guide fairness assessment that avoids collider bias. We consider a real-world application in which an audit agency audits a trained classifier. We propose an unbiased assessment algorithm that uses the developed theorems to reduce collider bias in the assessment. Experiments and simulations show that the proposed algorithm significantly reduces collider bias in the assessment and is promising for auditing trained classifiers.
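To make the notion of collider bias concrete, here is a minimal simulation sketch. It is not the assessment algorithm from the paper; it only illustrates how conditioning on a common effect (a collider) creates a spurious association. The variable names, the selection rule and the threshold are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_collider(n=20000, seed=0):
    """Return (correlation in full sample, correlation in selected sample).

    `a` stands for a protected attribute and `y` for a decision score.
    They are generated independently, so any association that appears
    after selection is spurious.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: a unit enters the audited sample only
    # if a + y + noise exceeds a threshold, so selection is a collider.
    selected = [
        (ai, yi) for ai, yi in zip(a, y)
        if ai + yi + rng.gauss(0, 0.5) > 1.0
    ]
    sel_a = [pair[0] for pair in selected]
    sel_y = [pair[1] for pair in selected]
    return pearson(a, y), pearson(sel_a, sel_y)


if __name__ == "__main__":
    full_corr, selected_corr = simulate_collider()
    print("correlation in full sample:    ", round(full_corr, 3))
    print("correlation in selected sample:", round(selected_corr, 3))
```

In the full sample the two variables are close to uncorrelated, while in the selected sample they become clearly negatively correlated, even though neither causes the other.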

preprint, 2022, arXiv

Discovering Ancestral Instrumental Variables for Causal Inference from Observational Data

The instrumental variable (IV) approach is a powerful way to infer the causal effect of a treatment on an outcome of interest from observational data, even when there are latent confounders between the treatment and the outcome. However, existing IV methods require that an IV be selected and justified with domain knowledge, and an invalid IV may lead to biased estimates. Hence, discovering a valid IV is critical to the application of IV methods. In this paper, we study and design a data-driven algorithm to discover valid IVs from data under mild assumptions. We develop theory based on partial ancestral graphs (PAGs) to support the search for a set of candidate Ancestral IVs (AIVs) and, for each possible AIV, the identification of its conditioning set. Based on the theory, we propose a data-driven algorithm to discover a pair of IVs from data. Experiments on synthetic and real-world datasets show that the developed IV discovery algorithm produces accurate estimates of causal effects in comparison with state-of-the-art IV-based causal effect estimators.
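As background for why a valid IV matters, the following sketch compares a naive regression slope with the standard ratio (Wald) IV estimator on simulated data with a latent confounder. It assumes a valid instrument is already known, so it does not implement the AIV discovery algorithm described above. The data-generating process and the true effect of 2.0 are assumptions made for the illustration.

```python
import random


def covariance(xs, ys):
    """Sample covariance (divided by n) of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def simulate_iv(n=20000, seed=0, true_effect=2.0):
    """Return (naive slope, IV estimate) on data with a latent confounder."""
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)          # latent confounder, never used below
        zi = rng.gauss(0, 1)         # instrument: affects y only through t
        ti = zi + u + rng.gauss(0, 1)
        yi = true_effect * ti + 3.0 * u + rng.gauss(0, 1)
        z.append(zi)
        t.append(ti)
        y.append(yi)
    # Naive estimate: slope of y on t, biased because u drives both.
    naive = covariance(t, y) / covariance(t, t)
    # Ratio (Wald) IV estimate: cov(z, y) / cov(z, t).
    iv = covariance(z, y) / covariance(z, t)
    return naive, iv


if __name__ == "__main__":
    naive_estimate, iv_estimate = simulate_iv()
    print("naive slope:", round(naive_estimate, 3))
    print("IV estimate:", round(iv_estimate, 3))
```

Under these assumptions the naive slope overstates the effect, while the IV estimate stays close to the true value. If the instrument also affected the outcome directly, the ratio estimator would be biased as well, which is the risk of an invalid IV noted in the abstract.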

preprint, 2022, arXiv

Local search for efficient causal effect estimation

Causal effect estimation from observational data is a challenging problem, especially with high-dimensional data and in the presence of unobserved variables. The available data-driven methods for tackling the problem either provide only bounds on a causal effect (i.e., a non-unique estimate) or have low efficiency. The major hurdle to achieving high efficiency while obtaining a unique and unbiased causal effect estimate is finding a proper adjustment set for confounding control quickly, given the huge covariate space and the presence of unobserved variables. In this paper, we approach the problem as a local search task for finding valid adjustment sets in data. We establish theorems to support the local search for adjustment sets, and we show that unique and unbiased estimation can be achieved from observational data even when there are unobserved variables. We then propose a data-driven algorithm that is fast and consistent under mild assumptions. We also use a frequent pattern mining method to further speed up the search for minimal adjustment sets for causal effect estimation. Experiments conducted on extensive synthetic and real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art criteria/estimators in both accuracy and time efficiency.

preprint, 2020, arXiv

Sufficient Dimension Reduction for Average Causal Effect Estimation

Having a large number of covariates can reduce the quality of causal effect estimation, since confounding adjustment becomes unreliable when the number of covariates is large relative to the number of samples available. The propensity score is a common way to deal with a large covariate set, but the accuracy of propensity score estimation (normally done by logistic regression) is also challenged by a large number of covariates. In this paper, we prove that a large covariate set can be reduced to a lower-dimensional representation that captures the complete information needed for adjustment in causal effect estimation. This theoretical result enables effective data-driven algorithms for causal effect estimation. We develop an algorithm that employs a supervised kernel dimension reduction method to search for a lower-dimensional representation of the original covariates, and then uses nearest neighbor matching in the reduced covariate space to impute the counterfactual outcomes, avoiding the problems caused by a large covariate set. The proposed algorithm is evaluated on two semi-synthetic and three real-world datasets, and the results demonstrate its effectiveness.
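The matching step described in this abstract can be sketched as follows. This is an illustration under strong assumptions, not the authors' implementation: the supervised kernel dimension reduction is replaced by a hypothetical `reduce_covariates` function that is simply assumed to return the true low-dimensional index, and the simulated data, the true effect of 2.0 and all function names are invented for the example.

```python
import math
import random


def simulate(n=1000, seed=0):
    """Simulate 5 covariates, of which only x[0] + x[1] confounds."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        s = x[0] + x[1]
        t = 1 if rng.random() < 1 / (1 + math.exp(-s)) else 0
        y = 2.0 * t + s + rng.gauss(0, 0.5)   # true effect is 2.0
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def reduce_covariates(x):
    # Hypothetical stand-in for a learned dimension reduction: the true
    # one-dimensional index is assumed to be known here.
    return (x[0] + x[1],)


def naive_difference(treatment, outcome):
    """Unadjusted difference in mean outcomes between the two groups."""
    y1 = [y for t, y in zip(treatment, outcome) if t == 1]
    y0 = [y for t, y in zip(treatment, outcome) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def matching_ate(reduced, treatment, outcome):
    """Average effect via 1-nearest-neighbor matching with replacement."""
    groups = {0: [], 1: []}
    for i, t in enumerate(treatment):
        groups[t].append(i)
    effects = []
    for i, t in enumerate(treatment):
        # Impute the counterfactual from the closest unit in the other group.
        j = min(
            groups[1 - t],
            key=lambda k: sum((a - b) ** 2 for a, b in zip(reduced[i], reduced[k])),
        )
        y1, y0 = (outcome[i], outcome[j]) if t == 1 else (outcome[j], outcome[i])
        effects.append(y1 - y0)
    return sum(effects) / len(effects)


if __name__ == "__main__":
    xs, ts, ys = simulate()
    reduced = [reduce_covariates(x) for x in xs]
    print("naive difference: ", round(naive_difference(ts, ys), 3))
    print("matching estimate:", round(matching_ate(reduced, ts, ys), 3))
```

Matching in the one-dimensional reduced space avoids searching for neighbors across all five covariates, three of which carry no confounding information in this simulated setup.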