Today we discuss cluster sampling, beginning with cluster sampling with equal probabilities. A cluster is a grouping of the members of the population. Previously, we assumed that we could sample the elements of the population directly, so the sampling units were the elements themselves. However, a sampling frame that lists every element may not be easy to obtain because of cost constraints or limited access to information.
An example: if we want to study the residents of Singapore (the population), the element is the person. However, we may not be able to contact individuals directly and may only have information about households (the clusters).

  • In cluster sampling, the clusters are also called primary sampling units while the elements are called secondary sampling units.
  • In cluster sampling, the sampling units are the clusters, and the elements observed are the secondary sampling units within the clusters. Previously, the sampling units were the elements themselves, so there was no distinction between primary and secondary sampling units.

Why is there a need to do cluster sampling?
Firstly, constructing a sampling frame that lists the observation units may be difficult, expensive or impossible.
Secondly, the population may be widely distributed geographically or may occur in natural clusters, so it may be more cost-efficient to take a sample of clusters than an SRS of individuals.

As before, there is a fair amount of notation to set up first. Before that, let me illustrate the setting. Suppose that our population consists of N clusters of elements.
Cluster 1: y_{11}, \ldots, y_{1M_1}; subtotal t_1 = \sum_{j=1}^{M_1} y_{1j}
Cluster 2: y_{21}, \ldots, y_{2M_2}; subtotal t_2 = \sum_{j=1}^{M_2} y_{2j}
\vdots
Cluster N: y_{N1}, \ldots, y_{NM_N}; subtotal t_N = \sum_{j=1}^{M_N} y_{Nj}
y_{ij} is the measurement for the j^{th} element in the i^{th} cluster.
N = number of clusters in the population
M_i = number of elements in the i^{th} cluster
M_0 = \sum_{i=1}^N M_i is the number of elements in the population
\bar{M} = \frac{M_0}{N} is the average cluster size for the population
t_i = \sum_{j=1}^{M_i} y_{ij} is the subtotal for the i^{th} cluster
t = \sum_{i=1}^N t_i = \sum_{i=1}^N \sum_{j=1}^{M_i} y_{ij} is the population total
S_t^2 = \frac{1}{N-1} \sum_{i=1}^N (t_i - \frac{t}{N})^2 is the population variance of the cluster totals.
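S^2 = \frac{1}{M_0 - 1} \sum_{i=1}^N \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_U)^2 is the population variance per element, where \bar{y}_U = \frac{t}{M_0} is the population mean
S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_{iU})^2 is the population variance within cluster i, where \bar{y}_{iU} = \frac{t_i}{M_i} is the mean of cluster i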
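
To make the notation concrete, here is a minimal Python sketch that computes each quantity for a tiny population. The three clusters and their values are made up purely for illustration, and the variable names (M_bar, S_t2 and so on) are my own labels for the symbols above.

```python
from statistics import variance

# Hypothetical toy population: each inner list holds the y_ij of one cluster.
clusters = [
    [2, 4, 6],     # cluster 1, M_1 = 3
    [1, 3],        # cluster 2, M_2 = 2
    [5, 5, 5, 5],  # cluster 3, M_3 = 4
]

N = len(clusters)                  # number of clusters in the population
M = [len(c) for c in clusters]     # cluster sizes M_i
M0 = sum(M)                        # number of elements in the population
M_bar = M0 / N                     # average cluster size
t_i = [sum(c) for c in clusters]   # cluster subtotals t_i
t = sum(t_i)                       # population total
y_bar_U = t / M0                   # population mean per element

# statistics.variance divides by (count - 1), matching the definitions above.
S_t2 = variance(t_i)                              # variance of cluster totals
S2 = variance([y for c in clusters for y in c])   # variance per element
S_i2 = [variance(c) for c in clusters]            # variance within each cluster

print(N, M0, M_bar, t_i, t, y_bar_U, S_t2, S2, S_i2)
```

Note that S_i^2 is only defined for clusters with at least two elements, since the divisor is M_i - 1.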

Finally, we are done with the notation and can ask: how do we draw a cluster sample? We will study two approaches, one-stage cluster sampling and two-stage cluster sampling. We cover the former now and the latter in the next post.

One-stage cluster sampling means that every element within a sampled cluster is included in the sample. Here, the population of M_0 elements is divided into N clusters of sizes M_1, \ldots, M_N.
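
As a rough illustration of the idea, the sketch below draws a one-stage cluster sample in Python. It assumes the clusters are chosen by simple random sampling without replacement (equal probabilities). The function name one_stage_cluster_sample, the number of sampled clusters n, and the toy data are hypothetical, introduced only for this example.

```python
import random

def one_stage_cluster_sample(clusters, n, seed=None):
    """Select n clusters by SRS without replacement and keep all their elements.

    Returns a dict mapping the index of each sampled cluster to its elements.
    """
    rng = random.Random(seed)
    # Stage one (the only stage): an SRS of n cluster indices out of N.
    sampled_ids = sorted(rng.sample(range(len(clusters)), n))
    # Every element of a sampled cluster is observed; nothing is subsampled.
    return {i: list(clusters[i]) for i in sampled_ids}

# Hypothetical population of N = 4 clusters with unequal sizes.
clusters = [[2, 4, 6], [1, 3], [5, 5, 5, 5], [7, 8]]
sample = one_stage_cluster_sample(clusters, n=2, seed=0)
print(sample)
```

The key point is that randomness enters only through the choice of clusters: once a cluster is selected, all M_i of its elements enter the sample.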

TBC…

 

Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two-Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse
