Recall that we are interested in the statistical aspects of taking and analysing a sample. A good sample is representative, in the sense that characteristics of interest in the population can be estimated from the sample with a known degree of accuracy.

Here, we will use probability sampling to conduct surveys. Probability sampling means that each unit in the population has a known, non-zero probability of being included in the sample. We will also make the following assumptions:

1. Sampled population = target population
2. Sampling frame is complete, no non-response or missing data
3. No measurement error

Clearly, with these assumptions, we have removed non-sampling error and only observe sampling error.

Simple Random Sample

• Simplest form of probability sample
• Each unit has an equal probability of being in the sample
• Each possible sample of size n has the same chance of being the sample selected
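
To make the two "equal chance" properties concrete, here is a minimal sketch that enumerates every possible sample from a toy population. The population {1, 2, 3, 4} and the sample size n = 2 are assumptions chosen only to keep the enumeration small; they are not part of the example used later in this post.

```python
from fractions import Fraction
from itertools import combinations

# Toy population and sample size (illustrative assumptions only).
population = [1, 2, 3, 4]
n = 2

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Under simple random sampling each possible sample is equally likely.
prob_each = Fraction(1, len(samples))

# Inclusion probability of a unit: total probability of the samples containing it.
inclusion = {
    u: sum(prob_each for s in samples if u in s)
    for u in population
}

print(len(samples))  # number of possible samples
print(inclusion)     # inclusion probability of each unit
```

In this toy case there are 6 possible samples, and every unit appears in 3 of them, so each unit has inclusion probability 1/2, which equals n/N.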

Systematic Sample

• Units are equally spaced in the list

Stratified Sample

• The population is divided into subgroups (strata); elements in the same stratum often tend to be more similar.
• A simple random sample is selected from each stratum, and the simple random samples in the different strata are selected independently

Cluster Sample

• Elements are aggregated into larger sampling units (clusters)
• The clusters are sampled in one of two ways:
  • One-stage (every element of each selected cluster is included)
  • Two-stage (a probability sample is taken within each selected cluster)

Here is an example of sampling 20 integers from the population {1, 2, …, 100} using each of the above methods:

1. Simple random sample: Use a computer to randomly generate 20 distinct integers from 1 to 100.
2. Systematic sample: Use a computer to randomly generate an integer from 1 to 5 as the starting point, then take every $5^{th}$ element from there. If the starting point is 2, the sample contains units 2, 7, 12, 17, …, 97.
3. Stratified sample: Divide the population into 10 strata, {1, 2, …, 10}, {11, 12, …, 20}, …, {91, 92, …, 100}, then draw a simple random sample of 2 numbers from each of the 10 strata.
4. Cluster sample: Divide the population into 20 clusters {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, …, {96, 97, 98, 99, 100}. A simple random sample of 4 of these clusters is selected, and all 5 integers in each selected cluster are included.
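
The four procedures above can be sketched in a few lines of Python using only the standard library. This is one possible illustration, not a prescribed implementation; the seed value and the variable names are assumptions made for reproducibility and readability.

```python
import random

N, n = 100, 20
population = list(range(1, N + 1))
rng = random.Random(2024)  # fixed seed (arbitrary choice) so the run is repeatable

# 1. Simple random sample: n distinct units, every subset equally likely.
srs = sorted(rng.sample(population, n))

# 2. Systematic sample: random start in 1..k, then every k-th unit.
k = N // n                        # sampling interval, here 5
start = rng.randint(1, k)         # randint includes both endpoints
systematic = list(range(start, N + 1, k))

# 3. Stratified sample: 10 strata of 10 units, independent SRS of 2 from each.
strata = [population[i:i + 10] for i in range(0, N, 10)]
stratified = [u for stratum in strata for u in sorted(rng.sample(stratum, 2))]

# 4. One-stage cluster sample: 20 clusters of 5 units, SRS of 4 clusters,
#    keeping every unit in each selected cluster.
clusters = [population[i:i + 5] for i in range(0, N, 5)]
chosen_clusters = rng.sample(clusters, 4)
cluster_sample = sorted(u for cluster in chosen_clusters for u in cluster)

for name, sample in [("SRS", srs), ("Systematic", systematic),
                     ("Stratified", stratified), ("Cluster", cluster_sample)]:
    print(f"{name:<11}{sample}")
```

All four designs return 20 units, but they differ in which sets of 20 units are possible: the systematic design, for instance, can only ever produce 5 distinct samples, one per starting point.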

Now we move on to developing some concepts and tools to analyse our sample.

For most samples, we establish a characteristic of interest, y. Let $y_i$ be the value of the characteristic of interest for unit i, and let N be the number of units in the population.

1. Population mean, $\bar{y_u}$
$\bar{y_u} = E(y) = \frac{1}{N} \sum_{i=1}^N y_i$
2. Population proportion, p
This is a special case of the population mean.
Let $y_i$ be a binary variable, taking the value 1 if unit i has the characteristic and 0 if it does not.
$p = \frac{1}{N} \sum_{i=1}^N y_i$
3. Population Total t
$t = \sum_{i=1}^N y_i$
$\Rightarrow \bar{y_u} = \frac{1}{N} \sum_{i=1}^N y_i = \frac{t}{N}$, so $t = N \bar{y_u}$
When $y_i$ is binary, $p = \bar{y_u} = \frac{t}{N}$ and $t = Np$
4. Population variance $S^2$
$S^2 = Var(y) = \frac{1}{N-1} \sum_{i=1}^N (y_i - \bar{y_u})^2$
S is the standard deviation of y.
5. Coefficient of variation CV(y)
The coefficient of variation is a measure of relative variability: it is the ratio of the standard deviation of y to the population mean $\bar{y_u}$ (defined when $\bar{y_u} \neq 0$).
$CV(y) = \frac{S}{\bar{y_u}}$
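
As a quick numerical check of the definitions above, the following sketch computes each quantity for the population {1, 2, …, 100}, taking $y_i = i$. The binary characteristic used for the proportion ("the unit is even") is a hypothetical choice made only for illustration.

```python
from statistics import stdev, variance

# Population values: y_i = i for i = 1, ..., 100.
y = list(range(1, 101))
N = len(y)

t = sum(y)          # population total
y_bar = t / N       # population mean, t / N
S2 = variance(y)    # population variance S^2; statistics.variance divides by N - 1
S = stdev(y)        # standard deviation S, the square root of S^2
cv = S / y_bar      # coefficient of variation

# Hypothetical binary characteristic: 1 if the unit is even, 0 otherwise.
is_even = [1 if v % 2 == 0 else 0 for v in y]
p = sum(is_even) / N   # population proportion = mean of the 0/1 variable

print(f"t = {t}, mean = {y_bar}, p = {p}")
print(f"S^2 = {S2:.2f}, S = {S:.2f}, CV = {cv:.3f}")
```

Note that `statistics.variance` uses the divisor N − 1, matching the definition of $S^2$ given here, whereas `statistics.pvariance` divides by N and would give a slightly smaller value.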

Next, we will look at each of the sampling methods above in more depth.

Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two – Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse
