What should be considered when choosing sample populations for remote data collection?

There are four main options for identifying potential sample populations, shown in the table below. They arise from two main questions:

Is the sample representative of the entire population of interest?
Will you be using an existing sample or will a new sample be created?

A brief description of each of these four options resulting from the combinations of these two considerations is given below.

Table 3: Main options for remote data collection samples

What are the advantages and disadvantages of pre-existing samples?

There are several benefits to using pre-existing samples, such as the following:

Existing information about people in the sample - Since it’s hard to hold people’s attention for long periods in remote surveys, it is a major advantage if you already have demographic information about your sample population and do not need to ask for it again.

Consent processes may be shorter - If participants have previously completed a more extensive informed consent process, you may only need to explain how the general consent process is being updated for the current circumstances, which saves time. If the initial consent process was conducted face-to-face, participants may also have greater confidence in the research process.

Reduction in (some kinds of) bias - Previous relationships may lead to less bias in self-reported behaviors, attitudes, or beliefs, particularly if these opinions or practices run contrary to existing norms or government guidelines.

Higher response rates - Existing relationships can make a major difference to response rates. For example, one random digit dialing study in India had a response rate of only about 25%, while similar studies among pre-existing cohorts in Kenya found that, of the 74% of potential participants who answered the call, only 1% declined to participate. Low rates of participation may introduce biases that are challenging to correct.
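The arithmetic behind the Kenya figures quoted above can be sketched as follows. This is a minimal illustration, assuming that the overall response rate is simply the share who answered multiplied by the share of those who agreed to take part. The function name `overall_response_rate` is hypothetical, and the India figure is only comparable if the 25% quoted there is also an overall rate.

```python
def overall_response_rate(contact_rate: float, refusal_rate: float) -> float:
    """Share of everyone called who ends up participating.

    contact_rate: share of potential participants who answered the call.
    refusal_rate: share of those who answered but declined to participate.
    """
    return contact_rate * (1.0 - refusal_rate)


# Figures quoted in the text for the pre-existing cohorts in Kenya.
kenya_rate = overall_response_rate(contact_rate=0.74, refusal_rate=0.01)
print(f"Pre-existing cohorts (Kenya): {kenya_rate:.1%} of those called took part")
```

Under this assumption, roughly 73% of everyone called in the Kenyan cohorts participated, and almost all of the loss came from calls that were not answered rather than from refusals.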

Some potential disadvantages to using pre-existing samples include:

Missing sub-groups - The original sample may not be fully representative of the target population of the new study, for example, if the original sample was drawn from a subgroup of the population (e.g. mothers of adolescents).
Social desirability bias - Knowledge of the objectives of a prior study (e.g., a study on healthy behaviors) may bias responses if participants know the views of those conducting the study.
Respondent fatigue - Respondents who have already taken part in earlier data collection may be less interested in continuing to answer questions or may pay less attention to their responses.

What are the advantages of representative samples?

Where possible, you should aim for a representative sample. A representative sample is one in which the people sampled are not systematically different from the population you want to learn about, which is key to ensuring that your conclusions are as accurate as possible. Non-representativeness can be introduced in several ways, each with significant drawbacks:

Non-representative samples may exclude vulnerable or marginalised individuals and people who live in inaccessible locations. These people may be less likely to be present in many convenience samples (e.g. sampling visitors to a clinic).
Non-representativeness introduced by remote data collection itself may exclude vulnerable or marginalised individuals because they cannot be reached remotely. For example, most remote data collection requires access to mobile phones and about 30% of people globally do not have phones.
Even a random digit dialing approach can be biased, because some families have more than one phone number and their household is therefore more likely to be selected. Furthermore, certain age groups or genders may be more or less likely to answer the phone or to agree to respond to a survey by phone.
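The multiple-phone-number problem described above is commonly handled by weighting each responding household by the inverse of its number of phone numbers. The sketch below illustrates the idea only: the respondent records, the `phones` and `answer` fields, and the helper names are all hypothetical, and it assumes each household's chance of selection is proportional to how many numbers it has.

```python
# Hypothetical respondents reached by random digit dialing.
# "phones" is the number of phone numbers in the household;
# "answer" is 1 if the household reports some behavior of interest, else 0.
respondents = [
    {"household": "A", "phones": 1, "answer": 1},
    {"household": "B", "phones": 2, "answer": 0},
    {"household": "C", "phones": 4, "answer": 0},
    {"household": "D", "phones": 1, "answer": 1},
]


def design_weight(phones: int) -> float:
    """Inverse of the relative selection probability of a household."""
    if phones < 1:
        raise ValueError("a household reached by phone has at least one number")
    return 1.0 / phones


def weighted_mean(rows) -> float:
    """Mean of 'answer', down-weighting households with several numbers."""
    total_weight = sum(design_weight(r["phones"]) for r in rows)
    weighted_sum = sum(design_weight(r["phones"]) * r["answer"] for r in rows)
    return weighted_sum / total_weight


unweighted = sum(r["answer"] for r in respondents) / len(respondents)
weighted = weighted_mean(respondents)
print(f"Unweighted estimate: {unweighted:.2f}")
print(f"Weighted estimate:   {weighted:.2f}")
```

In this made-up data the households with several numbers all answer 0, so the unweighted estimate understates the behavior. This correction only addresses unequal selection probabilities; it does not fix differences in who answers the phone or agrees to take part.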

A representative sample might also be representative of just a portion of the population, rather than the entire population. For example, understanding the impact of COVID-19 on women, young people, or people with physical impairments may benefit from focusing just on that population. In this case, a pre-existing sample could be reduced to focus on the sub-group of interest, or a newly constructed sample could use screening questions to establish that the respondent is a part of the sub-group of interest.

How should you account and adjust for low response rates in newly created samples or non-representativeness in your sampling frame?

Achieving a representative sample is a challenge for all research work. However, it is especially important to be aware of the biases that constructing a sample or collecting data remotely can create, and to report and account for these biases when drawing conclusions.

Several approaches are available to adjust for non-response bias and non-representative sampling.

When the response rate is low, those who respond to a survey might differ systematically from those who do not. The data may be adjusted for missingness using a number of techniques (including multiple imputation).
When those in the sample frame are not representative of the target population, post-survey weighting may be employed using a variety of techniques, including raking and matching.
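Raking, one of the weighting techniques mentioned above, can be sketched as iterative proportional fitting: weights are rescaled one variable at a time until the weighted sample shares match known population shares. The sample records and the target shares below are invented for illustration, and the `rake` and `weighted_share` helpers are hypothetical names, not part of any survey package. In practice the targets would come from a census or another trusted source.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-8):
    """Return one weight per row so weighted shares match each target margin."""
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for variable, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[variable]] += w
            grand_total = sum(weights)
            # Rescale so each category holds its target share of the total.
            for i, row in enumerate(rows):
                category = row[variable]
                factor = shares[category] * grand_total / totals[category]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # every margin already matches its target
            break
    return weights


def weighted_share(rows, weights, variable, category):
    """Weighted share of rows that fall in the given category."""
    in_category = sum(w for r, w in zip(rows, weights) if r[variable] == category)
    return in_category / sum(weights)


# Hypothetical phone sample that over-represents men and urban residents.
rows = (
    [{"gender": "female", "area": "urban"}] * 2
    + [{"gender": "female", "area": "rural"}] * 1
    + [{"gender": "male", "area": "urban"}] * 3
    + [{"gender": "male", "area": "rural"}] * 2
)
# Assumed population shares (illustrative only).
targets = {
    "gender": {"female": 0.5, "male": 0.5},
    "area": {"urban": 0.4, "rural": 0.6},
}

weights = rake(rows, targets)
print(f"female share: {weighted_share(rows, weights, 'gender', 'female'):.3f}")
print(f"rural share:  {weighted_share(rows, weights, 'area', 'rural'):.3f}")
```

Raking can only re-balance groups that appear in the sample. If a group cannot be reached remotely at all, no weight can bring it back, so this kind of exclusion should be reported rather than adjusted for.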

See this World Bank Blog on Mobile Phone Surveys (Part 1): Sampling and Mode for additional details and examples about choosing a sampling frame.

Editor's Note

Authors: James B. Tidwell
Reviewers: Lauren D’Mello-Guyett, Poonam Trivedi, Tracy Morse, Erica Wetzler, Michael Joseph, Holta Trandafili
Last update: 15.06.2020