Researcher profile

Yong Cai

Yong Cai contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 21 (Emerging)
Verification: L1
Unclaimed author
7 works
0 followers
12 topics
4 close collaborators

Published work

7 published items

Preprint · 2022 · arXiv

A Modified Randomization Test for the Level of Clustering

Suppose a researcher observes individuals within a county within a state. Given concerns about correlation across individuals, it is common to group observations into clusters and conduct inference treating observations across clusters as roughly independent. However, a researcher who has chosen to cluster at the county level may be unsure of this choice, while knowing that observations are independent across states. This paper proposes a modified randomization test as a robustness check for the chosen level of clustering in a linear regression setting. Existing tests require either the number of states or the number of counties to be large. Our method is designed for settings with few states and few counties. While the method is conservative, it has competitive power in settings that may be relevant to empirical work.

Preprint · 2022 · arXiv

FD-GATDR: A Federated-Decentralized-Learning Graph Attention Network for Doctor Recommendation Using EHR

In the past decade, with the development of big data technology, an increasing amount of patient information has been stored as electronic health records (EHRs). Leveraging these data, various doctor recommendation systems have been proposed. Typically, such studies process EHR data in a flat-structured manner, where each encounter is treated as an unordered set of features. Nevertheless, heterogeneous structured information, such as the service sequences stored in claims, should not be ignored. This paper presents a doctor recommendation system with time embedding that reconstructs the potential connections between patients and doctors using a heterogeneous graph attention network. In addition, to address the privacy issue of sharing patient data across hospitals, a federated decentralized learning method based on a minimization optimization model is proposed. The graph-based recommendation system has been validated on an EHR dataset. Compared to baseline models, the proposed method improves the AUC by up to 6.2%. Moreover, the proposed federated algorithm not only matches the performance of a fictitious fusion center but also achieves a convergence rate of O(1/T).

Preprint · 2022 · arXiv

On the implementation of Approximate Randomization Tests in Linear Models with a Small Number of Clusters

This paper provides a user's guide to the general theory of approximate randomization tests developed in Canay, Romano, and Shaikh (2017) when specialized to linear regressions with clustered data. An important feature of the methodology is that it applies to settings in which the number of clusters is small, even as small as five. We provide a step-by-step algorithmic description of how to implement the test and construct confidence intervals for the parameter of interest. In doing so, we additionally present three novel results concerning the methodology: we show that the method admits an equivalent implementation based on weighted scores; we show the test and confidence intervals are invariant to whether the test statistic is studentized or not; and we prove convexity of the confidence intervals for scalar parameters. We also articulate the main requirements underlying the test, emphasizing in particular common pitfalls that researchers may encounter. Finally, we illustrate the use of the methodology with two applications that further illuminate these points. The companion R and Stata packages facilitate the implementation of the methodology and the replication of the empirical exercises.
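The core sign-change step of an approximate randomization test can be sketched as follows. This is a minimal illustration under stated assumptions, not the companion R or Stata packages: it assumes one estimate of the scalar coefficient has already been computed per cluster (for example, by running the regression separately within each cluster), uses the non-studentized statistic |√q · mean|, and applies a non-randomized rejection rule over all 2^q sign vectors. The function name `art_sign_change_test` is hypothetical.

```python
import math
from itertools import product


def art_sign_change_test(cluster_estimates, beta0=0.0, alpha=0.05):
    """Sign-change randomization test of H0: beta == beta0 (illustrative sketch).

    cluster_estimates: one estimate of the scalar coefficient per cluster
    (assumption: computed beforehand, e.g. by a separate regression in each cluster).
    Returns (observed statistic, critical value, reject flag).
    """
    q = len(cluster_estimates)
    centered = [b - beta0 for b in cluster_estimates]

    def statistic(values):
        # Non-studentized statistic |sqrt(q) * mean| = |sum| / sqrt(q)
        return abs(sum(values)) / math.sqrt(q)

    t_obs = statistic(centered)
    # Recompute the statistic under every sign change g in {1, -1}^q (2**q draws)
    draws = sorted(
        statistic([g * v for g, v in zip(signs, centered)])
        for signs in product((1, -1), repeat=q)
    )
    m = len(draws)
    k = m - math.floor(m * alpha)  # 1-based position of the critical value
    critical = draws[k - 1]
    # Non-randomized rule: reject only if the observed value strictly exceeds it
    return t_obs, critical, t_obs > critical
```

Because a sign vector and its negation give the same statistic, the largest value of the randomization distribution always appears at least twice. With q = 5 clusters there are only 32 sign vectors, so this non-randomized version can never reject at α = 0.05 but can at α = 0.10; for example, `art_sign_change_test([1.0, 1.2, 0.9, 1.1, 1.05], alpha=0.10)` rejects.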

Preprint · 2022 · arXiv

Panel Data with Unknown Clusters

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence among observations within clusters. However, they require researchers to know the cluster structure ex ante. We propose a procedure to help researchers discover clusters in panel data. Our method is based on thresholding an estimated long-run variance-covariance matrix and requires the panel to be large in the time dimension, but imposes no lower bound on the number of units. We show that our procedure recovers the true clusters with high probability without any assumptions on the cluster structure. The estimated clusters are of independent interest, but they can also be used in approximate randomization tests or with conventional cluster-robust covariance estimators. The resulting procedures control size and have good power.
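The idea of discovering clusters by thresholding a long-run covariance matrix can be illustrated with a rough sketch. The details are assumptions, not the authors' procedure: the long-run covariance is estimated with a Bartlett (Newey-West style) kernel and a fixed bandwidth, entries are normalized to correlations before thresholding at a user-chosen level `tau`, and clusters are taken to be the connected components of the resulting graph. The functions `long_run_covariance` and `threshold_clusters` are hypothetical names.

```python
import math


def long_run_covariance(panel, bandwidth):
    """Bartlett-kernel long-run covariance across units (assumed estimator).

    panel: list of T time periods, each a list of N unit observations.
    """
    T, N = len(panel), len(panel[0])
    means = [sum(row[i] for row in panel) / T for i in range(N)]
    x = [[row[i] - means[i] for i in range(N)] for row in panel]

    def autocov(lag):
        # Gamma_lag[i][j] = (1/T) * sum_t x[t][i] * x[t - lag][j]
        return [[sum(x[t][i] * x[t - lag][j] for t in range(lag, T)) / T
                 for j in range(N)] for i in range(N)]

    omega = autocov(0)
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1)  # Bartlett weight
        g = autocov(lag)
        for i in range(N):
            for j in range(N):
                omega[i][j] += w * (g[i][j] + g[j][i])
    return omega


def threshold_clusters(panel, tau, bandwidth=2):
    """Group units whose long-run correlation exceeds tau in absolute value.

    Assumption: clusters = connected components of the thresholded graph.
    """
    omega = long_run_covariance(panel, bandwidth)
    N = len(omega)
    parent = list(range(N))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(N):
        for j in range(i + 1, N):
            denom = math.sqrt(max(omega[i][i] * omega[j][j], 0.0))
            if denom > 0 and abs(omega[i][j]) / denom > tau:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(N):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Normalizing to correlations lets a single threshold apply to units with different variances. The values of `tau` and `bandwidth` here are illustrative only; the returned groups (lists of unit indices) could then be passed to a cluster-robust method.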

Preprint · 2020 · arXiv

Pre-inflation and Trans-Planckian Censorship

We investigate the implications of the Trans-Planckian Censorship Conjecture (TCC) for the initial state of primordial perturbations. The state of the perturbation modes in the infinite past can be set to the Minkowski vacuum only if the pre-inflationary era is past-complete. We calculate the evolution of the perturbation modes in such a pre-inflationary era and show that, at the beginning of inflation, modes with wavelengths much shorter than the Hubble scale (but still longer than the Planck length) behave as if they were in the Bunch-Davies state. Therefore, a past-complete pre-inflationary evolution may automatically prepare the initial state required for the inflationary perturbations in the CMB window while obeying the TCC.

Preprint · 2020 · arXiv

The 2020 Sturgis Motorcycle Rally and COVID-19

The Sturgis Motorcycle Rally, held from August 7 to 16, 2020, was one of the largest public gatherings since the start of the COVID-19 outbreak. Over 460,000 visitors from across the United States traveled to Sturgis, South Dakota, to attend the ten-day event. Using anonymous cell phone tracking data, we identify the home counties of visitors to the rally and examine the rally's impact on the spread of COVID-19. Our baseline estimate suggests that a one-standard-deviation increase in Sturgis attendance increased COVID-19 case growth by 1.1 percentage points in the weeks after the rally.

Preprint · 2020 · arXiv

Trans-Planckian censorship of multistage inflation and dark energy

We explore the bound that the trans-Planckian censorship conjecture places on an inflation model with multiple stages. We show that if the first inflationary stage is responsible for the primordial perturbations in the cosmic microwave background window, the e-folding number of each subsequent stage is bounded by the energy scale of the first stage. This seems to imply that the lifetime of the current era of accelerated expansion (regarded as one of the multiple inflationary stages) might serve as a probe for distinguishing inflation from its alternatives. We also present a multistage inflation model in a landscape consisting of anti-de Sitter vacua separated by potential barriers.