So today we discuss cluster sampling, beginning with cluster sampling with equal probabilities. First, a cluster is a grouping of the members of the population. Previously, we assumed that we could sample the elements of the population directly, so the sampling units were the elements themselves. However, a sampling frame listing every element might not be easy to obtain because of cost constraints or limited access to information.
For example, if we want to study the residents of Singapore (the population), each element is a person. However, we may not be able to contact people directly and may only be able to find information about households (the clusters).

• In cluster sampling, the clusters are also called primary sampling units while the elements are called secondary sampling units.
• In cluster sampling, the sampling units are the clusters, and the elements observed are the secondary sampling units within the sampled clusters. In the designs we studied previously, the primary and secondary sampling units coincided: each sampling unit was a single element.

Why do we need cluster sampling?
First, constructing a sampling frame that lists every observation unit may be difficult, expensive or impossible.
Second, the population may be widely distributed geographically or may occur in natural clusters, so it may be more cost-efficient to sample clusters than to take an SRS of individuals.

As before, there is quite a bit of notation to set up first. Before that, let me illustrate the scenario. Suppose that our population consists of $N$ clusters of elements:
Cluster 1: $y_{11}, \ldots, y_{1M_1}$, subtotal $t_1 = \sum_{j=1}^{M_1} y_{1j}$
Cluster 2: $y_{21}, \ldots, y_{2M_2}$, subtotal $t_2 = \sum_{j=1}^{M_2} y_{2j}$
$\vdots$
Cluster N: $y_{N1}, \ldots, y_{NM_N}$, subtotal $t_N = \sum_{j=1}^{M_N} y_{Nj}$
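To make this layout concrete, here is a minimal Python sketch that stores a population as a list of clusters, one inner list of $y_{ij}$ values per cluster, and computes each cluster size $M_i$, each subtotal $t_i$ and the overall total $t$. The toy numbers and the function names (`cluster_sizes`, `cluster_totals`, `population_total`) are hypothetical, chosen only for illustration.

```python
# Hypothetical toy population: each inner list is one cluster (primary sampling
# unit); its entries are the measurements y_ij on its elements (secondary units).
population = [
    [3, 5, 4],        # cluster 1, M_1 = 3
    [2, 6],           # cluster 2, M_2 = 2
    [7, 1, 4, 8],     # cluster 3, M_3 = 4
]

def cluster_sizes(clusters):
    """Return [M_1, ..., M_N], the number of elements in each cluster."""
    return [len(c) for c in clusters]

def cluster_totals(clusters):
    """Return [t_1, ..., t_N], where t_i is the sum of the y_ij in cluster i."""
    return [sum(c) for c in clusters]

def population_total(clusters):
    """Return t, the sum of the cluster subtotals t_i."""
    return sum(cluster_totals(clusters))

sizes = cluster_sizes(population)     # [3, 2, 4]
totals = cluster_totals(population)   # [12, 8, 20]
t = population_total(population)      # 40
print(sizes, totals, t)
```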
$y_{ij}$ is the measurement for the $j^{th}$ element in the $i^{th}$ cluster.
$N$ = number of clusters in the population
$M_i$ = number of elements in the $i^{th}$ cluster
$M_0 = \sum_{i=1}^N M_i$ is the number of elements in the population
$\bar{M} = \frac{M_0}{N}$ is the average cluster size for the population
$t_i = \sum_{j=1}^{M_i} y_{ij}$ is the subtotal for the $i^{th}$ cluster
$t = \sum_{i=1}^N t_i = \sum_{i=1}^N \sum_{j=1}^{M_i} y_{ij}$ is the population total
$S_t^2 = \frac{1}{N-1} \sum_{i=1}^N (t_i - \frac{t}{N})^2$ is the population variance of the cluster totals.
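$S^2 = \frac{1}{M_0 - 1} \sum_{i=1}^N \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_U)^2$ is the population variance per element, where $\bar{y}_U = t/M_0$ is the population mean per element
$S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_{iU})^2$ is the population variance within cluster $i$, where $\bar{y}_{iU} = t_i/M_i$ is the population mean of cluster $i$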
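These definitions translate directly into code. The sketch below uses a hypothetical helper, `population_summary`, on made-up data, and assumes $N \ge 2$ and every $M_i \ge 2$, since the variances divide by $N-1$ and $M_i-1$. It relies on Python's `statistics.variance`, which uses the $n-1$ divisor about the data's own mean; that matches $S_t^2$ (the mean of the $t_i$ is $t/N$), $S^2$ (the mean of all elements is $\bar{y}_U$) and $S_i^2$ (the mean of cluster $i$ is $\bar{y}_{iU}$).

```python
from statistics import variance

def population_summary(clusters):
    """Compute the population quantities defined above for a list of clusters,
    where clusters[i] holds the measurements y_ij of cluster i."""
    N = len(clusters)                     # number of clusters
    sizes = [len(c) for c in clusters]    # M_i
    M0 = sum(sizes)                       # number of elements in the population
    M_bar = M0 / N                        # average cluster size
    totals = [sum(c) for c in clusters]   # cluster subtotals t_i
    t = sum(totals)                       # population total
    # S_t^2: the t_i deviate from their mean t/N; divisor N - 1.
    S_t2 = variance(totals)
    # S^2: pool all M0 elements; their mean is y-bar_U = t/M0; divisor M0 - 1.
    S2 = variance([y for c in clusters for y in c])
    # S_i^2: within-cluster variance about y-bar_iU; divisor M_i - 1.
    S_i2 = [variance(c) for c in clusters]
    return {"N": N, "M0": M0, "M_bar": M_bar, "t": t,
            "S_t2": S_t2, "S2": S2, "S_i2": S_i2}

# Hypothetical toy population with N = 3 clusters of sizes 3, 2 and 4.
population = [[3, 5, 4], [2, 6], [7, 1, 4, 8]]
print(population_summary(population))
```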

With the notation out of the way, we can finally ask: how do we draw a cluster sample? We will study two approaches, one-stage cluster sampling and two-stage cluster sampling. We cover the former now and the latter in the next post.

In one-stage cluster sampling, every element within a sampled cluster is included in the sample. Here, the $M_0$ elements of the population are divided into $N$ clusters of size…
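As a sketch of the design, the code below selects $n$ of the $N$ clusters by simple random sampling without replacement (our reading of "equal probabilities") and then observes every element of each chosen cluster. The estimator $\hat{t} = \frac{N}{n}\sum_{i \in \mathcal{S}} t_i$ used here is the standard unbiased estimator of the population total for this design; it is included as an assumption because this post stops before deriving it. The function names and data are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n, seed=None):
    """Select n cluster indices by SRS without replacement and keep every
    element of each selected cluster (one-stage cluster sampling)."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n)  # equal-probability draw of clusters
    # Every element of a chosen cluster is observed, so we copy it whole.
    return {i: list(clusters[i]) for i in chosen}

def estimate_total(sample, N):
    """Estimate t by (N / n) * (sum of the sampled cluster totals t_i)."""
    n = len(sample)
    sampled_totals = [sum(ys) for ys in sample.values()]  # t_i for i in S
    return N / n * sum(sampled_totals)

# Hypothetical toy population: one inner list of y_ij values per cluster.
population = [
    [3, 5, 4],
    [2, 6],
    [7, 1, 4, 8],
]

sample = one_stage_cluster_sample(population, n=2, seed=1)
t_hat = estimate_total(sample, N=len(population))
print(sorted(sample), t_hat)
```

Because every element in a chosen cluster is observed, each sampled $t_i$ is known exactly; the only randomness comes from which clusters are drawn, so the estimator behaves like an SRS estimator applied to the cluster totals.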

TBC…

Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two-Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse