Source author record

Krista J. Gile

Krista J. Gile appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Applications, Methodology, Computation

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

preprint, 2012, arXiv

Diagnostics for Respondent-driven Sampling

Respondent-driven sampling (RDS) is a widely used method for sampling from hard-to-reach human populations, especially groups most at risk for HIV/AIDS. Data are collected through a peer-referral process in which current sample members harness existing social networks to recruit additional sample members. RDS has proven to be a practical method of data collection in many difficult settings and has been adopted by leading public health organizations around the world. Unfortunately, inference from RDS data requires many strong assumptions because the sampling design is not fully known and is partially beyond the control of the researcher. In this paper, we introduce diagnostic tools for most of the assumptions underlying RDS inference. We also apply these diagnostics in a case study of 12 populations at increased risk for HIV/AIDS. We developed these diagnostics to enable RDS researchers to better understand their data and to encourage future statistical research on RDS.

preprint, 2012, arXiv

Estimating Hidden Population Size using Respondent-Driven Sampling Data

Respondent-Driven Sampling (RDS) is an approach to sampling design and inference in hard-to-reach human populations. Typically, a sampling frame is not available, and population members are difficult to identify or recruit from broader sampling frames. Common examples include injecting drug users, men who have sex with men, and female sex workers. Most analysis of RDS data has focused on estimating aggregate characteristics, such as disease prevalence. However, RDS is often conducted in settings where the population size is unknown and of great independent interest. This paper presents an approach to estimating the size of a target population based on data collected through RDS. The proposed approach uses a successive sampling approximation to RDS to leverage information in the ordered sequence of observed personal network sizes. The inference uses the Bayesian framework, allowing for the incorporation of prior knowledge. A flexible class of priors for the population size is proposed that aids elicitation. An extensive simulation study provides insight into the performance of the method for estimating population size under a broad range of conditions. A further study shows the approach also improves estimation of aggregate characteristics. A particular choice of the prior produces interval estimates with good frequentist properties. Finally, the method demonstrates sensible results when used to estimate the numbers of sub-populations most at risk for HIV in two cities in El Salvador.
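
The successive sampling approximation mentioned in this abstract can be illustrated with a minimal sketch. It assumes successive sampling means drawing units one at a time, without replacement, with probability proportional to personal network size (degree) among the units not yet sampled. The function name, the toy degrees, and the seed are hypothetical and are not taken from the paper; the Bayesian size estimation itself is not shown.

```python
import random

def successive_sample(degrees, n, seed=0):
    """Return n distinct unit indices, drawn one at a time with probability
    proportional to degree among the units that remain (assumed definition)."""
    rng = random.Random(seed)
    remaining = list(range(len(degrees)))
    sample = []
    for _ in range(n):
        # Selection weights are the degrees of the units not yet sampled.
        weights = [degrees[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)  # without replacement
        sample.append(pick)
    return sample

# Hypothetical population of eight units with positive degrees.
degrees = [1, 2, 3, 5, 8, 13, 21, 34]
order = successive_sample(degrees, n=5, seed=42)
print([degrees[i] for i in order])  # degrees in the order they were sampled
```

Under this assumed process, high-degree units tend to be drawn early, so the ordered sequence of observed degrees carries information about how much of the population remains unsampled.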

preprint, 2011, arXiv

Network Model-Assisted Inference from Respondent-Driven Sampling Data

Respondent-Driven Sampling is a method to sample hard-to-reach human populations by link-tracing over their social networks. Beginning with a convenience sample, each person sampled is given a small number of uniquely identified coupons to distribute to other members of the target population, making them eligible for enrollment in the study. This can be an effective means to collect large diverse samples from many populations. Inference from such data requires specialized techniques for two reasons. Unlike in standard sampling designs, the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights necessary for traditional design-based inference. Any likelihood-based inference requires the modeling of the complex sampling process often beginning with a convenience sample. We introduce a model-assisted approach, resulting in a design-based estimator leveraging a working model for the structure of the population over which sampling is conducted. We demonstrate that the new estimator has improved performance compared to existing estimators and is able to adjust for the bias induced by the selection of the initial sample. We present sensitivity analyses for unknown population sizes and the misspecification of the working network model. We develop a bootstrap procedure to compute measures of uncertainty. We apply the method to the estimation of HIV prevalence in a population of injecting drug users (IDU) in the Ukraine, and show how it can be extended to include application-specific information.

preprint, 2011, arXiv

On the Concept of Snowball Sampling

This brief comment reflects on the historical and current uses of the term "snowball sampling."

preprint, 2010, arXiv

Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation

Respondent-driven sampling is a form of link-tracing network sampling, which is widely used to study hard-to-reach populations, often to estimate population proportions. Previous treatments of this process have used a with-replacement approximation, which we show induces bias in estimates for large sample fractions and differential network connectedness by characteristic of interest. We present a treatment of respondent-driven sampling as a successive sampling process. Unlike existing representations, our approach respects the essential without-replacement feature of the process, while converging to an existing with-replacement representation for small sample fractions, and to the sample mean for a full-population sample. We present a successive-sampling based estimator for population means based on respondent-driven sampling data, and demonstrate its superior performance when the size of the hidden population is known. We present sensitivity analyses for unknown population sizes. In addition, we note that like other existing estimators, our new estimator is subject to bias induced by the selection of the initial sample. Using data collected among three populations in two countries, we illustrate the application of this approach to populations with varying characteristics. We conclude that the successive sampling estimator improves on existing estimators, and can also be used as a diagnostic tool when population size is not known.
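
As a rough illustration of the with-replacement representation this abstract contrasts against, the sketch below assumes that representation weights each respondent by the inverse of their reported network size (degree). The function name and the toy data are hypothetical; this is not the successive-sampling estimator proposed in the paper, which would replace the inverse-degree weights with estimated without-replacement inclusion probabilities.

```python
def inverse_degree_mean(values, degrees):
    """Weighted mean of values with weights 1/degree (assumed with-replacement form)."""
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, values))
    return weighted_sum / sum(weights)

# Hypothetical sample: 1 = has the trait, 0 = does not, with reported degrees.
values = [1, 0, 1, 0]
degrees = [10, 2, 5, 1]
print(inverse_degree_mean(values, degrees))
```

When all degrees are equal, the weights cancel and the function returns the plain sample mean, which is the limiting behavior the abstract describes for a full-population sample.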

preprint, 2010, arXiv

Modeling social networks from sampled data

Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors. Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data). In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this frame. We then develop inference for social network models based on information from adaptive network designs. We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network.

preprint, 2010, arXiv

The Effect of Differential Recruitment, Non-response and Non-recruitment on Estimators for Respondent-Driven Sampling

Respondent-driven sampling is a widely used network sampling technique, designed to sample from hard-to-reach populations. Estimation from the resulting samples is an area of active research, with software available to compute at least four estimators of a population proportion. Each estimator is claimed to address deficiencies in previous estimators; however, those claims are often unsubstantiated. In this study we provide a simulation-based comparison of five existing estimators, focussing on sampling conditions which a recent estimator is designed to address. We find no estimator consistently out-performs all others, and highlight sampling conditions in which each is to be preferred.
