Researcher profile

Fan Li

Fan Li contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
35 works
0 followers
17 topics
4 close collaborators

Published work

35 published item(s)

preprint 2026 arXiv

YOSE: You Only Select Essential Tokens for Efficient DiT-based Video Object Removal

Recent advances in Diffusion Transformer (DiT)-based video generation technologies have shown impressive results for video object removal. However, these methods still suffer from substantial inference latency. For instance, although MiniMax Remover achieves state-of-the-art visual quality, it operates at only around 10FPS, primarily due to dense computations over the entire spatiotemporal token space, even when only a small masked region actually requires processing. In this paper, we present YOSE, You Only Select Essential Tokens, an efficient fine-tuning framework. YOSE introduces two key components: Batch Variable-length Indexing (BVI) and Diffusion Process Simulator (DiffSim) Module. BVI is a differentiable dynamic indexing operator that adaptively selects essential tokens based on mask information, enabling variable-length token processing across samples. DiffSim provides a diffusion process approximation mechanism for unmasked tokens, which simulates the influence of unmasked regions within DiT self-attention to maintain semantic consistency for masked tokens. With these designs, YOSE achieves mask-aware acceleration, where the inference time scales approximately linearly with the masked regions, in contrast to full-token diffusion methods whose computation remains constant regardless of the mask size. Extensive experiments demonstrate that YOSE achieves up to 2.5X speedup in 70% of cases while maintaining visual quality comparable to the baseline. Code is available at: https://github.com/Wucy0519/YOSE-CVPR26.
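
The mask-driven token selection described in this abstract can be illustrated with a small NumPy sketch. This is not the paper's BVI operator (which the abstract describes as differentiable and batched); it is a simplified, non-differentiable illustration of the idea that only masked tokens are gathered, processed, and scattered back, so the work per sample grows with the mask size. The function names `select_masked_tokens` and `scatter_back`, the array shapes, and the stand-in "processing" step are all assumptions made for the example.

```python
import numpy as np

def select_masked_tokens(tokens, mask):
    """Gather, per sample, only the tokens whose mask entry is True.

    tokens: array of shape (batch, num_tokens, dim)
    mask:   boolean array of shape (batch, num_tokens)
    Returns variable-length token arrays and the indices used to gather them.
    """
    selected, indices = [], []
    for sample_tokens, sample_mask in zip(tokens, mask):
        idx = np.flatnonzero(sample_mask)      # positions of masked tokens
        indices.append(idx)
        selected.append(sample_tokens[idx])    # shape (k_i, dim), k_i varies per sample
    return selected, indices

def scatter_back(tokens, processed, indices):
    """Write processed tokens back into a copy of the full token grid."""
    out = tokens.copy()
    for b, (proc, idx) in enumerate(zip(processed, indices)):
        out[b, idx] = proc
    return out

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 8, 4))
mask = np.zeros((2, 8), dtype=bool)
mask[0, :2] = True     # small masked region in sample 0
mask[1, 3:8] = True    # larger masked region in sample 1

selected, indices = select_masked_tokens(tokens, mask)
# Hypothetical stand-in for the expensive transformer blocks.
processed = [s * 2.0 for s in selected]
output = scatter_back(tokens, processed, indices)
```

In this toy setup sample 0 processes 2 tokens and sample 1 processes 5, which mirrors the abstract's claim that inference cost scales with the masked region rather than with the full token grid.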

preprint 2023 arXiv

Principal Stratification with Time-to-Event Outcomes

Post-randomization events, also known as intercurrent events, such as treatment noncompliance and censoring due to a terminal event, are common in clinical trials. Principal stratification is a framework for causal inference in the presence of intercurrent events. Despite the extensive existing literature, generally applicable and accessible methods for principal stratification analysis with time-to-event outcomes are lacking. In this paper, we specify two causal estimands for time-to-event outcomes in principal stratification. For estimation, we adopt the general strategy of latent mixture modeling and derive the corresponding likelihood function. For computational convenience, we illustrate the general strategy with a mixture of Bayesian parametric Weibull-Cox proportional model for the outcome. We utilize the Stan programming language to obtain automatic posterior sampling of the model parameters via Hamiltonian Monte Carlo. We provide the analytical forms of the causal estimands as functions of the model parameters and an alternative numerical method when analytical forms are not available. We apply the proposed method to the ADAPTABLE trial to evaluate the causal effect of taking 81 mg versus 325 mg aspirin on the risk of major adverse cardiovascular events.

preprint 2022 arXiv

%CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs

Multi-period cluster randomized trials (CRTs) are increasingly used for the evaluation of interventions delivered at the group level. While generalized estimating equations (GEE) are commonly used to provide population-averaged inference in CRTs, there is a lack of general methods and statistical software tools for power calculation based on multi-parameter, within-cluster correlation structures suitable for multi-period CRTs that can accommodate both complete and incomplete designs. A computationally fast, non-simulation procedure for determining statistical power is described for the GEE analysis of complete and incomplete multi-period cluster randomized trials. The procedure is implemented via a SAS macro, %CRTFASTGEEPWR, which is applicable to binary, count and continuous responses and several correlation structures in multi-period CRTs. The SAS macro is illustrated in the power calculation of two complete and two incomplete stepped wedge cluster randomized trial scenarios under different specifications of marginal mean model and within-cluster correlation structure. The proposed GEE power method is quite general as demonstrated in the SAS macro with numerous input options. The power procedure and macro can also be used in the planning of parallel and crossover CRTs in addition to cross-sectional and closed cohort stepped wedge trials.

preprint 2022 arXiv

A Causal Mediation Model for Longitudinal Mediators and Survival Outcomes with an Application to Animal Behavior

In animal behavior studies, a common goal is to investigate the causal pathways between an exposure and outcome, and a mediator that lies in between. Causal mediation analysis provides a principled approach for such studies. Although many applications involve longitudinal data, the existing causal mediation models are not directly applicable to settings where the mediators are measured on irregular time grids. In this paper, we propose a causal mediation model that accommodates longitudinal mediators on arbitrary time grids and survival outcomes simultaneously. We take a functional data analysis perspective and view longitudinal mediators as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. We employ a functional principal component analysis approach to estimate the mediator process, and propose a Cox hazard model for the survival outcome that flexibly adjusts for the mediator process. We then derive a g-computation formula to express the causal estimands using the model coefficients. The proposed method is applied to a longitudinal data set from the Amboseli Baboon Research Project to investigate the causal relationships between early adversity, adult physiological stress responses, and survival among wild female baboons. We find that adversity experienced in early life has a significant direct effect on females' life expectancy and survival probability, but find little evidence that these effects were mediated by markers of the stress response in adulthood. We further developed a sensitivity analysis method to assess the impact of potential violations of the key assumption of sequential ignorability.

preprint 2022 arXiv

Addressing Extreme Propensity Scores in Estimating Counterfactual Survival Functions via the Overlap Weights

The inverse probability weighting approach is popular for evaluating treatment effects in observational studies, but extreme propensity scores could bias the estimator and induce excessive variance. Recently, the overlap weighting approach has been proposed to alleviate this problem, which smoothly down-weights the subjects with extreme propensity scores. Although advantages of overlap weighting have been extensively demonstrated in the literature with continuous and binary outcomes, research on its performance with time-to-event or survival outcomes is limited. In this article, we propose two weighting estimators that combine propensity score weighting and inverse probability of censoring weighting to estimate the counterfactual survival functions. These estimators are applicable to the general class of balancing weights, which includes inverse probability weighting, trimming, and overlap weighting as special cases. We conduct simulations to examine the empirical performance of these estimators with different weighting schemes in terms of bias, variance, and 95% confidence interval coverage, under various degrees of covariate overlap between treatment groups and censoring rates. We demonstrate that overlap weighting consistently outperforms inverse probability weighting and associated trimming methods in bias, variance, and coverage for time-to-event outcomes, and the advantages increase as the degree of covariate overlap between the treatment groups decreases.
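
To make the contrast between weighting schemes concrete, the sketch below computes propensity score weights for the three special cases named in this abstract. It assumes the standard balancing-weights form in which a treated unit receives h(x)/e(x) and a control unit receives h(x)/(1 - e(x)), with h = 1 for inverse probability weighting, an indicator for trimming, and h = e(1 - e) for overlap weighting. The function name `balancing_weights`, the `trim` threshold, and the example values are illustrative assumptions; the censoring weights that the paper combines with these are not shown.

```python
import numpy as np

def balancing_weights(propensity, treated, scheme="overlap", trim=0.1):
    """Propensity score weights for a binary treatment (illustrative only).

    propensity: estimated P(treatment = 1 | covariates), strictly inside (0, 1)
    treated:    boolean indicator of treatment receipt
    scheme:     "ipw", "trimming", or "overlap"
    """
    e = np.asarray(propensity, dtype=float)
    z = np.asarray(treated, dtype=bool)
    if scheme == "ipw":
        h = np.ones_like(e)                              # h(x) = 1
    elif scheme == "trimming":
        h = ((e > trim) & (e < 1 - trim)).astype(float)  # drop extreme scores
    elif scheme == "overlap":
        h = e * (1 - e)                                  # smooth down-weighting
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Treated units get h/e, control units get h/(1 - e).
    return np.where(z, h / e, h / (1 - e))

e_hat = np.array([0.02, 0.5, 0.98])
z = np.array([True, True, False])
w_ipw = balancing_weights(e_hat, z, "ipw")          # approximately [50, 2, 50]
w_trim = balancing_weights(e_hat, z, "trimming")    # [0, 2, 0]
w_overlap = balancing_weights(e_hat, z, "overlap")  # approximately [0.98, 0.5, 0.98]
```

With extreme scores of 0.02 and 0.98, the inverse probability weights reach 50 while the overlap weights stay below 1, which is the bounded behavior the abstract refers to as smooth down-weighting.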

preprint 2022 arXiv

Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison

Multiple imputation (MI) is a popular approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is one of the most widely used MI algorithms for multivariate data, but it lacks theoretical foundation and is computationally intensive. Recently, missing data imputation methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on evaluating their performance in realistic settings compared to MICE, particularly in big surveys. We conduct extensive simulation studies based on a subsample of the American Community Survey to compare the repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation networks, and multiple imputation using denoising autoencoders. We find the deep learning imputation methods are superior to MICE in terms of computational time. However, with the default choice of hyperparameters in the common software packages, MICE with classification trees consistently outperforms, often by a large margin, the deep learning imputation methods in terms of bias, mean squared error, and coverage under a range of realistic settings.

preprint 2022 arXiv

Causal Inference with Spatio-temporal Data: Estimating the Effects of Airstrikes on Insurgent Violence in Iraq

Many causal processes have spatial and temporal dimensions. Yet the classic causal inference framework is not directly applicable when the treatment and outcome variables are generated by spatio-temporal point processes. We extend the potential outcomes framework to these settings by formulating the treatment point process as a stochastic intervention. Our causal estimands include the expected number of outcome events in a specified area under a particular stochastic treatment assignment strategy. Our methodology allows for arbitrary patterns of spatial spillover and temporal carryover effects. Using martingale theory, we show that the proposed estimator is consistent and asymptotically normal as the number of time periods increases. We propose a sensitivity analysis for the possible existence of unmeasured confounders, and extend it to the Hajek estimator. Simulation studies are conducted to examine the estimators' finite sample performance. Finally, we illustrate the proposed methods by estimating the effects of American airstrikes on insurgent violence in Iraq from February 2007 to July 2008. Our analysis suggests that increasing the average number of daily airstrikes for up to one month may result in more insurgent attacks. We also find some evidence that airstrikes can displace attacks from Baghdad to new locations up to 400 kilometers away.

preprint 2022 arXiv

Constrained randomization and statistical inference for multi-arm parallel cluster randomized controlled trials

A practical limitation of cluster randomized controlled trials (cRCTs) is that the number of available clusters may be small, resulting in an increased risk of baseline imbalance under simple randomization. Constrained randomization overcomes this issue by restricting the allocation to a subset of randomization schemes where sufficient overall covariate balance across comparison arms is achieved. However, for multi-arm cRCTs, several design and analysis issues pertaining to constrained randomization have not been fully investigated. Motivated by an ongoing multi-arm cRCT, we elaborate the method of constrained randomization and provide a comprehensive evaluation of the statistical properties of model-based and randomization-based tests under both simple and constrained randomization designs in multi-arm cRCTs, with varying combinations of design and analysis-based covariate adjustment strategies. In particular, as randomization-based tests have not been extensively studied in multi-arm cRCTs, we additionally develop most-powerful randomization tests under the linear mixed model framework for our comparisons. Our results indicate that under constrained randomization, both model-based and randomization-based analyses could gain power while preserving nominal type I error rate, given proper analysis-based adjustment for the baseline covariates. Randomization-based analyses, however, are more robust against violations of distributional assumptions. The choice of balance metrics and candidate set sizes and their implications on the testing of the pairwise and global hypotheses are also discussed. Finally, we caution against the design and analysis of multi-arm cRCTs with an extremely small number of clusters, due to insufficient degrees of freedom and the tendency to obtain an overly restricted randomization space.
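
The constrained randomization procedure summarized in this abstract (enumerate allocation schemes, score their covariate balance, keep a well-balanced subset, then randomize within it) can be sketched as follows. The balance metric used here, the sum of squared differences between arm means and the overall mean of a single cluster-level covariate, is an assumption chosen for simplicity and is not claimed to be the metric studied in the paper. The names `balanced_allocations`, `imbalance`, `constrained_randomize`, and `keep_fraction` are hypothetical, and full enumeration is only practical for a small number of clusters.

```python
import itertools
import random

def balanced_allocations(n_clusters, n_arms):
    """Yield every assignment with an equal number of clusters per arm."""
    per_arm = n_clusters // n_arms
    for assignment in itertools.product(range(n_arms), repeat=n_clusters):
        if all(assignment.count(arm) == per_arm for arm in range(n_arms)):
            yield assignment

def imbalance(assignment, covariate, n_arms):
    """Assumed balance metric: squared deviations of arm means from the overall mean."""
    overall = sum(covariate) / len(covariate)
    score = 0.0
    for arm in range(n_arms):
        values = [x for x, a in zip(covariate, assignment) if a == arm]
        score += (sum(values) / len(values) - overall) ** 2
    return score

def constrained_randomize(covariate, n_arms, keep_fraction=0.1, seed=None):
    """Randomly pick one scheme from the best-balanced fraction of all schemes."""
    schemes = sorted(
        balanced_allocations(len(covariate), n_arms),
        key=lambda s: imbalance(s, covariate, n_arms),
    )
    n_keep = max(1, int(len(schemes) * keep_fraction))
    candidate_set = schemes[:n_keep]
    chosen = random.Random(seed).choice(candidate_set)
    return chosen, candidate_set, schemes

# Six clusters, three arms, one hypothetical cluster-level covariate.
cluster_covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chosen, candidate_set, all_schemes = constrained_randomize(
    cluster_covariate, n_arms=3, keep_fraction=0.1, seed=42
)
```

The `keep_fraction` argument plays the role of the candidate set size discussed in the abstract: a very small value with few clusters leaves only a handful of schemes to randomize over, which is the overly restricted randomization space the authors caution against.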

preprint 2022 arXiv

Effects of isoscalar- and isovector-scalar meson mixing on neutron star structure

Based on the accurately calibrated interaction FSUGold, we show that including the isovector scalar $δ$ meson and its coupling to the isoscalar scalar $σ$ meson in the relativistic mean field (RMF) model can soften the symmetry energy $E_{\rm{sym}}(n)$ at intermediate densities while stiffening $E_{\rm{sym}}(n)$ at high densities. We find this new RMF model can be simultaneously compatible with (1) the constraints on the equation of state of symmetric nuclear matter at suprasaturation densities from flow data in heavy-ion collisions, (2) the neutron skin thickness of $^{208}$Pb from the PREX-II experiment, (3) the largest neutron star (NS) mass reported so far, from PSR J0740+6620, (4) the limit of $Λ_{1.4}\leq580$ for the dimensionless tidal deformability of the canonical 1.4$M_{\odot}$ NS from the gravitational wave signal GW170817, (5) the mass-radius relation of PSR J0030+0451 and PSR J0740+6620 measured by NICER, and thus removes the tension between PREX-II and GW170817 observed in the conventional RMF model.

preprint 2022 arXiv

Envelope imbalanced ensemble model with deep sample learning and local-global structure consistency

The class imbalance problem is important and challenging. Ensemble approaches are widely used to tackle this problem because of their effectiveness. However, existing ensemble methods are always applied to the original samples, without considering the structure information among them. This limitation hinders further improvement in imbalanced learning. Besides, research shows that the structure information among samples includes local and global structure information. Based on the analysis above, an imbalanced ensemble algorithm with the deep sample pre-envelope network (DSEN) and local-global structure consistency mechanism (LGSCM) is proposed here to solve the problem. This algorithm can guarantee high-quality deep envelope samples by considering the local manifold and global structure information, which is helpful for imbalanced learning. First, the deep sample envelope pre-network (DSEN) is designed to mine structure information among samples. Then, the local manifold structure metric (LMSM) and global structure distribution metric (GSDM) are designed to construct LGSCM to enhance distribution consistency of interlayer samples. Next, the DSEN and LGSCM are put together to form the final deep sample envelope network (DSEN-LG). After that, base classifiers are applied on the layers of deep samples respectively. Finally, the predictive results from base classifiers are fused through the bagging ensemble learning mechanism. To demonstrate the effectiveness of the proposed method, forty-four public datasets and more than ten representative relevant algorithms are chosen for verification. The experimental results show that the algorithm is significantly better than other imbalanced ensemble algorithms.

preprint 2022 arXiv

Generalizing trial evidence to target populations in non-nested designs: Applications to AIDS clinical trials

Comparative effectiveness evidence from randomized trials may not be directly generalizable to a target population of substantive interest when, as in most cases, trial participants are not randomly sampled from the target population. Motivated by the need to generalize evidence from two trials conducted in the AIDS Clinical Trials Group (ACTG), we consider weighting, regression and doubly robust estimators to estimate the causal effects of HIV interventions in a specified population of people living with HIV in the USA. We focus on a non-nested trial design and discuss strategies for both point and variance estimation of the target population average treatment effect. Specifically in the generalizability context, we demonstrate both analytically and empirically that estimating the known propensity score in trials does not increase the variance for each of the weighting, regression and doubly robust estimators. We apply these methods to generalize the average treatment effects from two ACTG trials to specified target populations and operationalize key practical considerations. Finally, we report on a simulation study that investigates the finite-sample operating characteristics of the generalizability estimators and their sandwich variance estimators.

preprint 2022 arXiv

Improving sandwich variance estimation for marginal Cox analysis of cluster randomized trials

Cluster randomized trials (CRTs) frequently recruit a small number of clusters, therefore necessitating the application of small-sample corrections for valid inference. A recent systematic review indicated that CRTs reporting right-censored, time-to-event outcomes are not uncommon, and that the marginal Cox proportional hazards model is one of the common approaches used for primary analysis. While small-sample corrections have been studied under marginal models with continuous, binary and count outcomes, no prior research has been devoted to the development and evaluation of bias-corrected sandwich variance estimators when clustered time-to-event outcomes are analyzed by the marginal Cox model. To improve current practice, we propose 9 bias-corrected sandwich variance estimators for the analysis of CRTs using the marginal Cox model, and report on a simulation study to evaluate their small-sample properties. Our results indicate that the optimal choice of bias-corrected sandwich variance estimator for CRTs with survival outcomes can depend on the variability of cluster sizes, and can also slightly differ whether it is evaluated according to relative bias or type I error rate. Finally, we illustrate the new variance estimators in a real-world CRT where the conclusion about intervention effectiveness differs depending on the use of small-sample bias corrections. The proposed sandwich variance estimators are implemented in an R package CoxBcv.

preprint 2022 arXiv

Novel Distributed Algorithms Design for Nonsmooth Resource Allocation on Weight-Balanced Digraphs

In this paper, the distributed resource allocation problem on strongly connected and weight-balanced digraphs is investigated, where the decisions of each agent are restricted to satisfy the coupled network resource constraints and heterogeneous general convex sets. Moreover, the local cost function can be non-smooth. In order to achieve the exact optimum of the nonsmooth resource allocation problem, a novel continuous-time distributed algorithm based on the gradient descent scheme and differentiated projection operators is proposed. With the help of the set-valued LaSalle invariance principle and nonsmooth analysis, it is demonstrated that the algorithm converges asymptotically to the global optimal allocation. Moreover, for the situation where local constraints are not involved and the cost functions are differentiable with Lipschitz gradients, the convergence of the algorithm to the exact optimal solution is exponentially fast. Finally, the effectiveness of the proposed algorithms is illustrated by simulation examples.

preprint 2022 arXiv

Power analysis for cluster randomized trials with continuous co-primary endpoints

Pragmatic trials evaluating health care interventions often adopt cluster randomization due to scientific or logistical considerations. Previous reviews have shown that co-primary endpoints are common in pragmatic trials but infrequently recognized in sample size or power calculations. While methods for power analysis based on $K$ ($K\geq 2$) binary co-primary endpoints are available for CRTs, to our knowledge, methods for continuous co-primary endpoints are not yet available. Assuming a multivariate linear mixed model that accounts for multiple types of intraclass correlation coefficients (endpoint-specific ICCs, intra-subject ICCs and inter-subject between-endpoint ICCs) among the observations in each cluster, we derive the closed-form joint distribution of $K$ treatment effect estimators to facilitate sample size and power determination with different types of null hypotheses under equal cluster sizes. We characterize the relationship between the power of each test and different types of correlation parameters. We further relax the equal cluster size assumption and approximate the joint distribution of the $K$ treatment effect estimators through the mean and coefficient of variation of cluster sizes. Our simulation studies with a finite number of clusters indicate that the predicted power by our method agrees well with the empirical power, when the parameters in the multivariate linear mixed model are estimated via the expectation-maximization algorithm. An application to a real CRT is presented to illustrate the proposed method.

preprint 2022 arXiv

Power considerations for generalized estimating equations analyses of four-level cluster randomized trials

In this article, we develop methods for sample size and power calculations in four-level intervention studies when intervention assignment is carried out at any level, with a particular focus on cluster randomized trials (CRTs). CRTs involving four levels are becoming popular in health care research, where the effects are measured, for example, from evaluations (level 1) within participants (level 2) in divisions (level 3) that are nested in clusters (level 4). In such multi-level CRTs, we consider three types of intraclass correlations between different evaluations to account for such clustering: that of the same participant, that of different participants from the same division, and that of different participants from different divisions in the same cluster. Assuming arbitrary link and variance functions, with the proposed correlation structure as the true correlation structure, closed-form sample size formulas for randomization carried out at any level (including individually randomized trials within a four-level clustered structure) are derived based on the generalized estimating equations approach using the model-based variance and using the sandwich variance with an independence working correlation matrix. We demonstrate that empirical power corresponds well with that predicted by the proposed method for as few as 8 clusters, when data are analyzed using the matrix-adjusted estimating equations for the correlation parameters with a bias-corrected sandwich variance estimator, under both balanced and unbalanced designs.
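
The three-correlation structure described in this abstract can be written out as an explicit within-cluster correlation matrix. The sketch below assumes a balanced cluster (the same number of participants per division and evaluations per participant) and uses the hypothetical symbols `alpha0`, `alpha1`, and `alpha2` for the correlations between evaluations of the same participant, of different participants in the same division, and of participants in different divisions of the same cluster. These names and the function `four_level_correlation` are assumptions for illustration, not the paper's notation, and the sample size formulas themselves are not reproduced.

```python
import numpy as np

def four_level_correlation(n_divisions, n_participants, n_evaluations,
                           alpha0, alpha1, alpha2):
    """Within-cluster correlation matrix for a balanced four-level design.

    alpha0: two evaluations from the same participant
    alpha1: different participants within the same division
    alpha2: participants from different divisions of the same cluster
    """
    # One (division, participant) label per evaluation in the cluster.
    labels = [
        (d, p)
        for d in range(n_divisions)
        for p in range(n_participants)
        for _ in range(n_evaluations)
    ]
    n = len(labels)
    corr = np.empty((n, n))
    for i, (div_i, part_i) in enumerate(labels):
        for j, (div_j, part_j) in enumerate(labels):
            if i == j:
                corr[i, j] = 1.0
            elif div_i == div_j and part_i == part_j:
                corr[i, j] = alpha0
            elif div_i == div_j:
                corr[i, j] = alpha1
            else:
                corr[i, j] = alpha2
    return corr

# Two divisions, two participants each, two evaluations each: an 8 x 8 matrix.
R = four_level_correlation(2, 2, 2, alpha0=0.3, alpha1=0.1, alpha2=0.05)
```

A matrix of this form is what a model-based variance calculation would take as the assumed true correlation structure; checking that its smallest eigenvalue is positive is a quick way to confirm that a chosen set of correlation values is admissible.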

preprint 2022 arXiv

Solving Nonsmooth Resource Allocation Problems with Feasibility Constraints through Novel Distributed Algorithms

The distributed non-smooth resource allocation problem over multi-agent networks is studied in this paper, where each agent is subject to globally coupled network resource constraints and local feasibility constraints described in terms of general convex sets. To solve such a problem, two classes of novel distributed continuous-time algorithms via differential inclusions and projection operators are proposed. Moreover, the convergence of the algorithms is analyzed by the Lyapunov functional theory and nonsmooth analysis. We illustrate that the first algorithm can globally converge to the exact optimum of the problem when the interaction digraph is weight-balanced and the local cost functions are strongly convex. Furthermore, the fully distributed implementation of the algorithm is studied over connected undirected graphs with strictly convex local cost functions. In addition, to overcome the drawback of the first algorithm that requires initialization, we design the second algorithm which can be implemented without initialization to achieve global convergence to the optimal solution over connected undirected graphs with strongly convex cost functions. Finally, several numerical simulations verify the results.

preprint 2022 arXiv

Using propensity scores for racial disparities analysis

Propensity score plays a central role in causal inference, but its use is not limited to causal comparisons. As a covariate balancing tool, propensity score can be used for controlled descriptive comparisons between groups whose memberships are not manipulable. A prominent example is racial disparities in health care. However, conceptual confusion and hesitation persist for using propensity score in racial disparities studies. In this commentary, we argue that propensity score, possibly combined with other methods, is an effective tool for racial disparities analysis. We describe relevant estimands, target population, and assumptions. In particular, we clarify that a controlled descriptive comparison requires weaker assumptions than a causal comparison. We discuss three common propensity score weighting strategies: overlap weighting, inverse probability weighting and average treatment effect for treated weighting. We further describe how to combine weighting with the rank-and-replace adjustment method to produce racial disparity estimates concordant with the Institute of Medicine's definition. The method is illustrated by a re-analysis of the Medical Expenditure Panel Survey data.

preprint 2021 arXiv

Causal Mediation Analysis for Sparse and Irregular Longitudinal Data

Causal mediation analysis seeks to investigate how the treatment effect of an exposure on outcomes is mediated through intermediate variables. Although many applications involve longitudinal data, the existing methods are not directly applicable to settings where the mediator and outcome are measured on sparse and irregular time grids. We extend the existing causal mediation framework from a functional data analysis perspective, viewing the sparse and irregular longitudinal data as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. For estimation and inference, we employ a functional principal component analysis approach for dimension reduction and use the first few functional principal components instead of the whole trajectories in the structural equation models. We adopt the Bayesian paradigm to accurately quantify the uncertainties. The operating characteristics of the proposed methods are examined via simulations. We apply the proposed methods to a longitudinal data set from a wild baboon population in Kenya to investigate the causal relationships between early adversity, strength of social bonds between animals, and adult glucocorticoid hormone concentrations. We find that early adversity has a significant direct effect (a 9-14% increase) on females' glucocorticoid concentrations across adulthood, but find little evidence that these effects were mediated by weak social bonds.

preprint 2021 arXiv

Counterfactual Representation Learning with Balancing Weights

A key to causal inference with observational data is achieving balance in predictive features associated with each treatment type. Recent literature has explored representation learning to achieve this goal. In this work, we discuss the pitfalls of these strategies - such as a steep trade-off between achieving balance and predictive power - and present a remedy via the integration of balancing weights in causal learning. Specifically, we theoretically link balance to the quality of propensity estimation, emphasize the importance of identifying a proper target population, and elaborate on the complementary roles of feature balancing and weight adjustments. Using these concepts, we then develop an algorithm for flexible, scalable and accurate estimation of causal effects. Finally, we show how the learned weighted representations may serve to facilitate alternative causal learning procedures with appealing statistical features. We conduct an extensive set of experiments on both synthetic examples and standard benchmarks, and report encouraging results relative to state-of-the-art baselines.

preprint 2021 arXiv

Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure

A stepped wedge cluster randomized trial is a type of longitudinal cluster design that sequentially switches clusters to intervention over time until all clusters are treated. While the traditional posttest-only parallel design requires adjustment for a single intraclass correlation coefficient, the stepped wedge design allows multiple outcome measurements from the same cluster and so additional correlation parameters are necessary to characterize the within-cluster correlation structure. Although a number of studies have differentiated between the concepts of within-period and between-period correlations, only a few studies have allowed the between-period correlation to decay over time. In this article, we consider the proportional decay correlation structure for a cohort stepped wedge design, and provide a matrix-adjusted quasi-least squares (MAQLS) approach to accurately estimate the correlation parameters along with the marginal intervention effect. We further develop the sample size and power procedures accounting for the correlation decay, and investigate the accuracy of the power procedure with continuous outcomes in a simulation study. We show that the empirical power agrees well with the prediction even with as few as 9 clusters, when data are analyzed with MAQLS concurrently with a suitable bias-corrected sandwich variance. Two trial examples are provided to illustrate the new sample size procedure.

preprint 2021 arXiv

Marginal modeling of cluster-period means and intraclass correlations in stepped wedge designs with binary outcomes

Stepped wedge cluster randomized trials (SW-CRTs) with binary outcomes are increasingly used in prevention and implementation studies. Marginal models represent a flexible tool for analyzing SW-CRTs with population-averaged interpretations, but the joint estimation of the mean and intraclass correlation coefficients (ICCs) can be computationally intensive due to large cluster-period sizes. Motivated by the need for marginal inference in SW-CRTs, we propose a simple and efficient estimating equations approach to analyze cluster-period means. We show that the quasi-score for the marginal mean defined from individual-level observations can be reformulated as the quasi-score for the same marginal mean defined from the cluster-period means. An additional mapping of the individual-level ICCs into correlations for the cluster-period means further provides a rigorous justification for the cluster-period approach. The proposed approach addresses a long-recognized computational burden associated with estimating equations defined based on individual-level observations, and enables fast point and interval estimation of the intervention effect and correlations. We further propose matrix-adjusted estimating equations to improve the finite-sample inference for ICCs. By providing a valid approach to estimate ICCs within the class of generalized linear models for correlated binary outcomes, this article operationalizes key recommendations from the CONSORT extension to SW-CRTs, including the reporting of ICCs.

preprint 2021 arXiv

PSweight: An R Package for Propensity Score Weighting Analysis

Propensity score weighting is an important tool for comparative effectiveness research. Besides the inverse probability of treatment weights (IPW), recent developments have introduced a general class of balancing weights, corresponding to alternative target populations and estimands. In particular, the overlap weights (OW) lead to optimal covariate balance and estimation efficiency, and a target population of scientific and policy interest. We develop the R package PSweight to provide a comprehensive design and analysis platform for causal inference based on propensity score weighting. PSweight supports (i) a variety of balancing weights, (ii) binary and multiple treatments, (iii) simple and augmented weighting estimators, (iv) nuisance-adjusted sandwich variances, and (v) ratio estimands. PSweight also provides diagnostic tables and graphs for covariate balance assessment. We demonstrate the functionality of the package using a data example from the National Child Development Survey (NCDS), where we evaluate the causal effect of educational attainment on income.
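
The class of balancing weights can be illustrated with a short sketch. The Python code below is not the PSweight API (PSweight is an R package); the function names are hypothetical. It assumes the usual definition in which a treated unit receives weight h(x)/e(x) and a control unit receives h(x)/(1 - e(x)), where e(x) is the propensity score and the tilting function h selects the target population.

```python
import numpy as np

def balancing_weights(e, z, kind="overlap"):
    """Balancing weights for propensity scores e and binary treatment z."""
    e = np.asarray(e, dtype=float)
    z = np.asarray(z, dtype=int)
    if kind == "ipw":        # h = 1: combined population
        h = np.ones_like(e)
    elif kind == "overlap":  # h = e(1 - e): overlap population
        h = e * (1.0 - e)
    elif kind == "att":      # h = e: treated population
        h = e
    else:
        raise ValueError(f"unknown weight type: {kind}")
    return np.where(z == 1, h / e, h / (1.0 - e))

def weighted_difference(y, z, w):
    """Normalized (Hajek-type) weighted difference in mean outcomes."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    w = np.asarray(w, dtype=float)
    treated = z == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))

# Toy inputs: three units with known propensity scores.
e = [0.2, 0.5, 0.8]
z = [1, 0, 1]
print(balancing_weights(e, z, kind="overlap"))
```

With overlap weights the expression simplifies to 1 - e(x) for treated units and e(x) for control units, so the weights stay bounded between 0 and 1 even when propensity scores are extreme.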

preprint 2020 arXiv

A Regression Discontinuity Design for Ordinal Running Variables: Evaluating Central Bank Purchases of Corporate Bonds

Regression discontinuity (RD) is a widely used quasi-experimental design for causal inference. In the standard RD, the assignment to treatment is determined by a continuous pretreatment variable (i.e., running variable) falling above or below a pre-fixed threshold. In the case of the corporate sector purchase programme (CSPP) of the European Central Bank, which involves large-scale purchases of securities issued by corporations in the euro area, such a threshold can be defined in terms of an ordinal running variable. This feature poses challenges to RD estimation due to the lack of a meaningful measure of distance. To evaluate such a program, this paper proposes an RD approach for ordinal running variables under the local randomization framework. The proposal first estimates an ordered probit model for the ordinal running variable. The estimated probability of being assigned to treatment is then adopted as a latent continuous running variable and used to identify a covariate-balanced subsample around the threshold. Assuming local unconfoundedness of the treatment in the subsample, an estimate of the effect of the program is obtained by employing a weighted estimator of the average treatment effect. Two weighting estimators (overlap weights and ATT weights), as well as their augmented versions, are considered. We apply the method to evaluate the causal effect of the CSPP and find a statistically significant and negative effect on corporate bond spreads at issuance.

preprint 2020 arXiv

Accurate nuclear symmetry energy at finite temperature within a BHF approach

We compute the free energy of asymmetric nuclear matter in a Brueckner-Hartree-Fock approach at finite temperature, paying particular attention to the dependence on isospin asymmetry. The first- and second-order symmetry energies are determined as functions of density and temperature and useful parametrizations are provided. We find small deviations from the quadratic isospin dependence and very small corresponding effects on (proto)neutron star structure.

preprint 2020 arXiv

Deep Double-Side Learning Ensemble Model for Few-Shot Parkinson Speech Recognition

Diagnosis and therapeutic effect assessment of Parkinson's disease based on voice data are very important, but the associated few-shot learning problem is challenging. Although deep learning is good at automatic feature extraction, it suffers from the few-shot learning problem. Therefore, the generally effective method is to first conduct feature extraction based on prior knowledge, and then carry out feature reduction for subsequent classification. However, there are two major problems: 1) structural information among speech features has not been mined, and new features of higher quality have not been reconstructed; 2) structural information between data samples has not been mined, and new samples of higher quality have not been reconstructed. To solve these two problems, based on the existing Parkinson speech feature data set, this paper designs a deep double-side learning ensemble model that can reconstruct speech features and samples deeply and simultaneously. For feature reconstruction, an embedded deep stacked group sparse auto-encoder is designed to conduct nonlinear feature transformation and acquire new high-level deep features, which are then fused with the original speech features by an L1 regularization feature selection method. For speech sample reconstruction, a deep sample learning algorithm based on iterative mean clustering is designed to conduct sample transformation and obtain new high-level deep samples. Finally, the bagging ensemble learning mode is adopted to fuse the deep feature learning algorithm and the deep sample learning algorithm, thereby constructing a deep double-side learning ensemble model. Two representative speech datasets of Parkinson's disease were used for verification. The experimental results show that the proposed algorithm is effective.

preprint 2020 arXiv

Distributed Equivalent Substitution Training for Large-Scale Recommender Systems

We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. DES introduces fully synchronous training to large-scale recommender systems for the first time by reducing communication, thus making the training of commercial recommender systems converge faster and reach better CTR. DES requires much less communication by substituting the weights-rich operators with computationally equivalent sub-operators and aggregating partial results instead of transmitting the huge sparse weights directly through the network. Due to the use of synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves higher AUC (area under the ROC curve). We successfully apply DES training to multiple popular DLRMs in industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art PS-based training framework, achieving up to 68.7% communication savings and higher throughput compared to other PS-based recommender systems.

preprint 2020 arXiv

In situ modification of delafossite type PdCoO2 bulk single crystal for reversible hydrogen sorption and fast hydrogen evolution

The observation of extraordinarily high conductivity in delafossite-type PdCoO2 is of great current interest, and there is some evidence that electrons behave like a fluid when flowing in bulk crystals of PdCoO2. Thus, this material is an ideal platform for the study of the electron transfer processes in heterogeneous reactions. Here, we report the use of bulk single crystal PdCoO2 as a promising electrocatalyst for hydrogen evolution reactions (HERs). An overpotential of only 31 mV results in a current density of 10 mA cm^(-2), accompanied by high long-term stability. We have precisely determined that the crystal surface structure is modified after electrochemical activation with the formation of strained Pd nanoclusters in the surface layer. These nanoclusters exhibit reversible hydrogen sorption and desorption, creating more active sites for hydrogen access. The bulk PdCoO2 single crystal with ultra-high conductivity, which acts as a natural substrate for the Pd nanoclusters, provides a high-speed channel for electron transfer.

preprint 2020 arXiv

Is being an only child harmful to psychological health?: Evidence from an instrumental variable analysis of China's One-Child Policy

This paper evaluates the effects of being an only child in a family on psychological health, leveraging data on the One-Child Policy in China. We use an instrumental variable approach to address the potential unmeasured confounding between the fertility decision and psychological health, where the instrumental variable is an index on the intensity of the implementation of the One-Child Policy. We establish an analytical link between the local instrumental variable approach and principal stratification to accommodate the continuous instrumental variable. Within the principal stratification framework, we postulate a Bayesian hierarchical model to infer various causal estimands of policy interest while adjusting for the clustering data structure. We apply the method to the data from the China Family Panel Studies and find small but statistically significant negative effects of being an only child on self-reported psychological health for some subpopulations. Our analysis reveals treatment effect heterogeneity with respect to both observed and unobserved characteristics. In particular, urban males suffer the most from being only children, and the negative effect has larger magnitude if the families were more resistant to the One-Child Policy. We also conduct sensitivity analysis to assess the key instrumental variable assumption.

preprint 2020 arXiv

Learning Consistency Pursued Correlation Filters for Real-Time UAV Tracking

Correlation filter (CF)-based methods have demonstrated exceptional performance in visual object tracking for unmanned aerial vehicle (UAV) applications, but suffer from the undesirable boundary effect. To solve this issue, spatially regularized correlation filters (SRDCF) introduce spatial regularization to penalize filter coefficients, thereby significantly improving the tracking performance. However, the temporal information hidden in the response maps is not considered in SRDCF, which limits the discriminative power and the robustness for accurate tracking. This work proposes a novel approach with dynamic consistency pursued correlation filters, i.e., the CPCF tracker. Specifically, through a correlation operation between adjacent response maps, a practical consistency map is generated to represent the consistency level across frames. By minimizing the difference between the practical and the scheduled ideal consistency map, the consistency level is constrained to maintain temporal smoothness, and rich temporal information contained in response maps is introduced. Besides, a dynamic constraint strategy is proposed to further improve the adaptability of the proposed tracker in complex situations. Comprehensive experiments are conducted on three challenging UAV benchmarks, i.e., UAV123@10FPS, UAVDT, and DTB70. Based on the experimental results, the proposed tracker favorably surpasses the other 25 state-of-the-art trackers with real-time running speed (~43 FPS) on a single CPU.
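
The correlation operation between adjacent response maps can be sketched as follows. The abstract does not give the exact CPCF formulation, so this Python example assumes a plain circular cross-correlation computed with the FFT. The function name `consistency_map` and the toy Gaussian response maps are hypothetical.

```python
import numpy as np

def consistency_map(prev_response, curr_response):
    """Circular cross-correlation of two response maps, computed with the FFT."""
    f_prev = np.fft.fft2(prev_response)
    f_curr = np.fft.fft2(curr_response)
    return np.real(np.fft.ifft2(np.conj(f_prev) * f_curr))

# Toy response maps: a Gaussian peak, then the same peak shifted by (2, 3) pixels.
size = 32
yy, xx = np.mgrid[0:size, 0:size]
prev_map = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
curr_map = np.roll(prev_map, shift=(2, 3), axis=(0, 1))

cmap = consistency_map(prev_map, curr_map)
peak = np.unravel_index(np.argmax(cmap), cmap.shape)
print(peak)
```

In this toy setting the location of the peak in the correlation output recovers the shift between the two frames, and a sharp single peak indicates that the two response maps have a similar shape.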

preprint 2020 arXiv

Propensity Score Weighting for Covariate Adjustment in Randomized Clinical Trials

Chance imbalance in baseline characteristics is common in randomized clinical trials. Regression adjustment such as the analysis of covariance (ANCOVA) is often used to account for imbalance and increase precision of the treatment effect estimate. An objective alternative is through inverse probability weighting (IPW) of the propensity scores. Although IPW and ANCOVA are asymptotically equivalent, the former may demonstrate inferior performance in finite samples. In this article, we point out that IPW is a special case of the general class of balancing weights, and advocate the use of overlap weighting (OW) for covariate adjustment. The OW method has a unique advantage of completely removing chance imbalance when the propensity score is estimated by logistic regression. We show that the OW estimator attains the same semiparametric variance lower bound as the most efficient ANCOVA estimator and the IPW estimator for a continuous outcome, and derive closed-form variance estimators for OW when estimating additive and ratio estimands. Through extensive simulations, we demonstrate that OW consistently outperforms IPW in finite samples and improves the efficiency over ANCOVA and augmented IPW when the degree of treatment effect heterogeneity is moderate or when the outcome model is incorrectly specified. We apply the proposed OW estimator to the Best Apnea Interventions for Research (BestAIR) randomized trial to evaluate the effect of continuous positive airway pressure on patient health outcomes. All the discussed propensity score weighting methods are implemented in the R package PSweight.
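
The exact-balance property of OW under a logistic propensity model can be checked numerically. The Python sketch below is an illustration with simulated data, not the authors' code. It fits a logistic regression with an intercept by Newton's method, then compares weighted covariate means between arms. All names and simulation settings are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, iters=50):
    """Maximum-likelihood logistic regression with intercept, via Newton's method."""
    A = np.column_stack([np.ones(len(z)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (z - p)                        # score vector
        hess = (A * (p * (1 - p))[:, None]).T @ A   # observed information
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-A @ beta))

# Simulated trial-like data with two baseline covariates.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_e = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
z = rng.binomial(1, true_e)

e_hat = fit_logistic(X, z)
w = np.where(z == 1, 1.0 - e_hat, e_hat)  # overlap weights
treated = z == 1

raw_gap = X[treated].mean(axis=0) - X[~treated].mean(axis=0)
ow_gap = (np.average(X[treated], axis=0, weights=w[treated])
          - np.average(X[~treated], axis=0, weights=w[~treated]))
print("unweighted gap:", raw_gap)
print("overlap-weighted gap:", ow_gap)
```

The weighted gap vanishes up to numerical error because the logistic score equations force the sum of x(z - e) over all units to be zero, which is the same as equating the OW-weighted covariate totals in the two arms.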

preprint 2020 arXiv

Towards 5G: Joint Optimization of Video Segment Cache, Transcoding and Resource Allocation for Adaptive Video Streaming in a Muti-access Edge Computing Network

The cache and transcoding of the multi-access edge computing (MEC) server and wireless resource allocation in eNodeB interact and determine the quality of experience (QoE) of dynamic adaptive streaming over HTTP (DASH) clients in MEC networks. However, the relationship among the three factors has not been explored, which has led to limited improvement in clients' QoE. Therefore, we propose a joint optimization framework of video segment cache and transcoding in MEC servers and resource allocation to improve the QoE of DASH clients. Based on the established framework, we develop a MEC cache management mechanism that consists of the MEC cache partition, video segment deletion, and MEC cache space transfer. Then, a joint optimization algorithm that combines video segment cache and transcoding in the MEC server and resource allocation is proposed. In the algorithm, the clients' channel state and the playback status and cooperation among MEC servers are employed to estimate the client's priority, video segment presentation switch and continuous playback time. Considering the above four factors, we develop a utility function model of clients' QoE. Then, we formulate a mixed-integer nonlinear programming mathematical model to maximize the total utility of DASH clients, where the video segment cache and transcoding strategy and resource allocation strategy are jointly optimized. To solve this problem, we propose a low-complexity heuristic algorithm that decomposes the original problem into multiple subproblems. The simulation results show that our proposed algorithms efficiently improve clients' throughput, received video quality and hit ratio of video segments while decreasing the playback rebuffering time, video segment presentation switch and system backhaul traffic.

preprint 2020 arXiv

Training-Set Distillation for Real-Time UAV Object Tracking

The correlation filter (CF) has recently exhibited promising performance in visual object tracking for unmanned aerial vehicles (UAVs). Such an online learning method depends heavily on the quality of the training-set, yet complicated aerial scenarios like occlusion or out of view can reduce its reliability. In this work, a novel time slot-based distillation approach is proposed to efficiently and effectively optimize the training-set's quality on the fly. A cooperative energy minimization function is established to score the historical samples adaptively. To accelerate the scoring process, frames with high-confidence tracking results are employed as the keyframes to divide the tracking process into multiple time slots. After the establishment of a new slot, the weighted fusion of the previous samples generates one key-sample, in order to reduce the number of samples to be scored. Besides, when the current time slot exceeds the maximum number of frames that can be scored, the sample with the lowest score will be discarded. Consequently, the training-set can be efficiently and reliably distilled. Comprehensive tests on two well-known UAV benchmarks prove the effectiveness of our method with real-time speed on a single CPU.

preprint 2020 arXiv

WasteNet: Waste Classification at the Edge for Smart Bins

Smart Bins have become popular in smart cities and campuses around the world. These bins have a compaction mechanism that increases the bins' capacity as well as automated real-time collection notifications. In this paper, we propose WasteNet, a waste classification model based on convolutional neural networks that can be deployed on a low power device at the edge of the network, such as a Jetson Nano. The problem of segregating waste is a big challenge for many countries around the world. Automated waste classification at the edge allows for fast intelligent decisions in smart bins without needing access to the cloud. Waste is classified into six categories: paper, cardboard, glass, metal, plastic and other. Our model achieves a 97% prediction accuracy on the test dataset. This level of classification accuracy will help to alleviate some common smart bin problems, such as recycling contamination, where different types of waste become mixed with recycling waste, causing the bin to be contaminated. It also makes the bins more user-friendly, as citizens do not have to worry about disposing of their rubbish in the correct bin because the smart bin will be able to make the decision for them.

preprint 2019 arXiv

BGD-based Adam algorithm for time-domain equalizer in PAM-based optical interconnects

To the best of our knowledge, for the first time, we propose an adaptive moment estimation (Adam) algorithm based on batch gradient descent (BGD) to design a time-domain equalizer (TDE) for PAM-based optical interconnects. The Adam algorithm has been widely applied in the field of artificial intelligence. For the TDE, the BGD-based Adam algorithm can obtain globally optimal tap coefficients without being trapped in locally optimal tap coefficients. Therefore, fast and stable convergence can be achieved by the BGD-based Adam algorithm with low mean square error. Meanwhile, the BGD-based Adam algorithm is implemented by parallel processing, which is more efficient than conventional serial algorithms, such as the least mean square and recursive least square algorithms. The experimental results demonstrate that the BGD-based Adam feed-forward equalizer works well in 120-Gbit/s PAM8 optical interconnects. In conclusion, the BGD-based Adam algorithm shows great potential for converging the tap coefficients of the TDE in future optical interconnects.
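
The idea of updating equalizer taps with Adam on a full-batch gradient can be sketched in a few lines. The Python example below is a toy illustration under stated assumptions, not the authors' implementation. It uses simulated PAM4 symbols rather than the PAM8 experiment, a made-up three-tap channel, and hypothetical names and hyperparameters. The Adam update itself follows the standard bias-corrected form.

```python
import numpy as np

def adam_ffe(rx, tx, n_taps=5, lr=0.01, epochs=500,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit feed-forward equalizer taps by Adam on the full-batch MSE gradient."""
    n = len(rx)
    pad = n_taps // 2
    padded = np.concatenate([np.zeros(pad), rx, np.zeros(pad)])
    # Row k holds the received samples in a window centred on symbol k.
    windows = np.stack([padded[i:i + n] for i in range(n_taps)], axis=1)
    taps = np.zeros(n_taps)
    taps[pad] = 1.0  # start from a pass-through equalizer
    m = np.zeros(n_taps)
    v = np.zeros(n_taps)
    history = []
    for step in range(1, epochs + 1):
        err = windows @ taps - tx
        history.append(np.mean(err ** 2))
        grad = 2.0 * windows.T @ err / n  # gradient over the whole batch
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** step)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** step)   # bias-corrected second moment
        taps -= lr * m_hat / (np.sqrt(v_hat) + eps)
    history.append(np.mean((windows @ taps - tx) ** 2))
    return taps, history

rng = np.random.default_rng(0)
tx = rng.choice([-3.0, -1.0, 1.0, 3.0], size=4000)  # toy PAM4 symbols
channel = np.array([0.1, 1.0, 0.3])                 # hypothetical ISI channel
rx = np.convolve(tx, channel, mode="same") + 0.05 * rng.normal(size=tx.size)

taps, history = adam_ffe(rx, tx)
print(f"MSE before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Because the mean square error of a linear equalizer is a convex quadratic function of the taps, there are no spurious local minima in this toy problem, and the batch gradient reduces to a single matrix-vector product that parallelizes naturally.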

preprint 2019 arXiv

Double-Robust Estimation in Difference-in-Differences with an Application to Traffic Safety Evaluation

Difference-in-differences (DID) is a widely used approach for drawing causal inference from observational panel data. Two common estimation strategies for DID are outcome regression and propensity score weighting. In this paper, motivated by a real application in traffic safety research, we propose a new double-robust DID estimator that hybridizes regression and propensity score weighting. We particularly focus on the case of discrete outcomes. We show that the proposed double-robust estimator possesses the desirable large-sample robustness property. We conduct a simulation study to examine its finite-sample performance and compare it with alternative methods. Our empirical results from Pennsylvania Department of Transportation data suggest that rumble strips are marginally effective in reducing vehicle crashes.