Today we shall look at the first sampling method we introduced: Simple Random Sampling. Recall that this is the most basic form of probability sampling, since it provides the theoretical basis for the more complicated designs. Here we will look at how to select a simple random sample (SRS) and how to estimate population parameters from it.

Before that, we will look at the notation we will be using:

  1. N: the total number of elements in the population (or Universe, depending on what you’re looking at)
  2. U = {1, 2, …, N}: the index set for the elements in the population. Given the assumptions we made previously, U is also the sampling frame.
  3. n: sample size. n = N if and only if the sample is a census.
  4. D: sample index set

Putting the above together, a simple random sample of size n is a sample index set D of n units drawn from U.

Next, I will introduce two types of SRS:

  1. SRS with replacement: return each unit to the population after it is drawn, and repeat the draw n times.
  2. SRS without replacement: Draw n distinct units for a sample size of n.

In practice, SRS without replacement is the design we usually want, so I will only discuss this case.
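To make the two schemes concrete, here is a minimal Python sketch using the standard library's `random` module: `random.Random.sample` draws distinct units (without replacement), while `random.Random.choices` draws independently with replacement. The sampling frame of 10 units, the sample size n = 4 and the seed are arbitrary choices for illustration, and `srs_without_replacement` / `srs_with_replacement` are hypothetical helper names.

```python
import random

def srs_without_replacement(population, n, rng):
    """Draw n distinct units; every subset of size n is equally likely."""
    return rng.sample(population, n)

def srs_with_replacement(population, n, rng):
    """Draw n times, returning each unit before the next draw (repeats possible)."""
    return rng.choices(population, k=n)

rng = random.Random(2024)      # fixed seed so the draws are reproducible
frame = list(range(1, 11))     # hypothetical sampling frame U = {1, ..., 10}

D = srs_without_replacement(frame, 4, rng)
D_wr = srs_with_replacement(frame, 4, rng)
print("without replacement:", D)
print("with replacement:   ", D_wr)
```

Only the without-replacement draw is guaranteed to contain n distinct units; the with-replacement draw may list the same unit more than once.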

Since every unit is equally likely to be selected, we have the following definitions:

  • Unit Inclusion Probability, {\pi}_i
    {\pi}_i = \mathrm{P}(\mathrm{unit~}i~\mathrm{in~sample}) = \frac{n}{N}
  • The probability of selecting a particular sample index set D of n units is \frac{1}{^N \!C_n} = \frac{n!(N-n)!}{N!}, since there are ^N \!C_n possible samples and all are equally likely.
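We can check both facts by brute force on a small population. Unit i appears in ^{N-1} \!C_{n-1} of the ^N \!C_n equally likely samples, so \pi_i = ^{N-1} \!C_{n-1} / ^N \!C_n = n/N. The Python sketch below lists every sample for a hypothetical population with N = 6 and n = 3, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 6, 3                          # small hypothetical population so every sample can be listed
U = range(1, N + 1)
samples = list(combinations(U, n))   # all possible sample index sets D

# Under SRS without replacement each sample has probability 1 / C(N, n).
p_sample = Fraction(1, comb(N, n))

# pi_i = P(unit i in sample) = total probability of the samples that contain i
pi = {i: sum(p_sample for D in samples if i in D) for i in U}

print(len(samples), comb(N, n))   # 20 20
print(pi[1], Fraction(n, N))      # 1/2 1/2
```

Changing N and n (keeping n ≤ N) should still give \pi_i = n/N for every unit.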

Now our next task is to learn how to use the sample to estimate the population parameters, i.e., the population mean, proportion, total, variance, standard deviation and coefficient of variation.

  1. Estimating the population mean, \bar{y}_U = \frac{1}{N} \sum_{i \in U} y_i
    Estimator, \bar{y} = \frac{1}{n} \sum_{i \in D} y_i
    You should notice that this is just the sample mean: the same formula as \bar{y}_U, but summed over the sample D instead of the whole population U.
    Loosely speaking, we estimate the population mean by the sample mean. 🙂
  2. Estimating the population proportion, p
    Estimator, \hat{p} = \frac{1}{n} \sum_{i \in D} y_i, where y_i = 1 if unit i has the attribute of interest and y_i = 0 otherwise.
  3. Estimating the population total, t = \sum_{i \in U} y_i
    Estimator, \hat{t} = N \bar{y} = \frac{N}{n} \sum_{i \in D} y_i = \sum_{i \in D} w_i y_i
    Here, w_i = \frac{N}{n} = \frac{1}{\pi_i} is the sampling weight: since each unit is included with probability \pi_i, each sampled unit represents 1/\pi_i = N/n units of the population.
  4. Estimating Population variance, S^2
    Estimator, s^2 = \frac{1}{n-1} \sum_{i \in D} (y_i - \bar{y})^2
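As a quick illustration of the four estimators, here is a minimal Python sketch using the standard `statistics` module (`statistics.variance` uses the n - 1 divisor, matching s^2). The data values, N = 100 and the 0/1 attribute coding are made up for illustration, and `srs_estimates` is a hypothetical helper name.

```python
from statistics import mean, variance

def srs_estimates(sample_y, N):
    """Estimates from an SRS without replacement of size n drawn from N units."""
    n = len(sample_y)
    y_bar = mean(sample_y)                  # estimates the population mean
    w = N / n                               # sampling weight w_i = 1 / pi_i = N / n
    t_hat = sum(w * y for y in sample_y)    # estimates the total; equals N * y_bar
    s2 = variance(sample_y)                 # divisor n - 1; estimates S^2
    return y_bar, t_hat, s2

# Hypothetical data: n = 5 units drawn from a population of N = 100
y = [12, 15, 9, 14, 10]
y_bar, t_hat, s2 = srs_estimates(y, N=100)
print(y_bar, t_hat, s2)

# For a proportion, code y_i as 1 (has the attribute) or 0 (does not);
# the sample mean of these indicators is then p_hat.
has_attribute = [1, 0, 0, 1, 0]
p_hat = mean(has_attribute)
print(p_hat)
```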

After finding these estimators, we need to assess the quality of our estimates. We want our estimators to have the following properties:

  • Low or no bias, and high precision
    • Low bias or no bias (unbiased): the expectation (mean) of the estimates across all possible samples is close to, or equal to, the population parameter
    • High precision (low variance): the variation of the estimates across samples is small

You might have noticed the phrase “estimates across samples”. Recall that we assume a finite population, so we can list every possible sample of size n together with its selection probability. The estimator then takes each of its possible values with a known probability, i.e., it has a discrete probability distribution, called the sampling distribution of the estimator. We use the sampling distribution to assess the qualities mentioned above. We now look at how to measure them quantitatively; for convenience, \theta denotes the population parameter being estimated.
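Because the population is finite, we can write down the whole sampling distribution of \bar{y} for a tiny example. The sketch below assumes a hypothetical population of N = 4 units with values 2, 4, 6, 8 and samples of size n = 2; it enumerates all ^4 \!C_2 = 6 samples and records the probability of each value of \bar{y}.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

y = {1: 2, 2: 4, 3: 6, 4: 8}   # hypothetical population: unit index -> y_i
N, n = len(y), 2

# Each of the C(N, n) samples is equally likely under SRS without replacement.
p_sample = Fraction(1, comb(N, n))

dist = Counter()
for D in combinations(y, n):                    # iterate over sample index sets
    y_bar = Fraction(sum(y[i] for i in D), n)   # the estimate from this sample
    dist[y_bar] += p_sample                     # accumulate P(y_bar = value)

for value, prob in sorted(dist.items()):
    print(value, prob)
```

Here the distribution puts probability 1/3 on \bar{y} = 5 and 1/6 on each of 3, 4, 6 and 7; its mean is 5, the population mean.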

  1. Bias: The expectation of the estimator \hat{\theta}, \mathbb{E}(\hat{\theta}) is the mean of the sampling distribution of \hat{\theta}.
    Estimation bias: Bias(\hat{\theta}) = \mathbb{E}(\hat{\theta}) - \theta
    If Bias(\hat{\theta}) = 0, the estimator is unbiased.
  2. Variance
    Var(\hat{\theta}) = \mathbb{E}[(\hat{\theta}-\mathbb{E}(\hat{\theta}))^2]
    Clearly, we hope that this value is small.
  3. Mean Squared Error (MSE)
    MSE(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + (Bias(\hat{\theta}))^2

Notice that the MSE takes into account both the variance and the bias of the estimator.
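The definitions above can be evaluated exactly by enumerating every sample. The Python sketch below (hypothetical helper `sampling_properties`, hypothetical population 2, 4, 6, 8 with n = 2) computes the expectation, bias, variance and MSE of two estimators of S^2: s^2 with divisor n - 1, and a version with divisor n. It also checks the identity MSE = Var + Bias^2 for each.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def sampling_properties(estimator, y, n, theta):
    """Exact E, Bias, Var and MSE of an estimator over all SRSWOR samples of size n."""
    samples = list(combinations(y, n))
    p = Fraction(1, comb(len(y), n))                # each sample is equally likely
    est = [estimator([y[i] for i in D]) for D in samples]
    e = sum(p * t for t in est)                     # E(theta_hat)
    var = sum(p * (t - e) ** 2 for t in est)        # E[(theta_hat - E(theta_hat))^2]
    mse = sum(p * (t - theta) ** 2 for t in est)    # E[(theta_hat - theta)^2]
    return e, e - theta, var, mse

def s2(ys):
    """Sample variance with divisor n - 1."""
    m = Fraction(sum(ys), len(ys))
    return sum((v - m) ** 2 for v in ys) / (len(ys) - 1)

def s2_n(ys):
    """Sample variance with divisor n."""
    m = Fraction(sum(ys), len(ys))
    return sum((v - m) ** 2 for v in ys) / len(ys)

y = {1: 2, 2: 4, 3: 6, 4: 8}   # hypothetical population: unit index -> y_i
N, n = len(y), 2
mean_U = Fraction(sum(y.values()), N)
S2 = sum((v - mean_U) ** 2 for v in y.values()) / (N - 1)   # population variance S^2

for name, f in [("s2 (divisor n-1)", s2), ("s2 (divisor n)", s2_n)]:
    e, bias, var, mse = sampling_properties(f, y, n, S2)
    assert mse == var + bias ** 2   # MSE = Var + Bias^2 holds exactly
    print(name, "bias =", bias, "var =", var, "mse =", mse)
```

For this population, S^2 = 20/3; s^2 comes out unbiased (bias 0), while the divisor-n version has bias -10/3, i.e., it underestimates S^2 on average.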

[Figure: Qualities of Estimator]

Qualitatively, think of each estimate as an arrow shot at a target whose bull’s eye is \theta. Unbiased (no bias) means that the average position of all the arrows is at the bull’s eye, although the arrows may be widely spread (large variance). Precise (small variance) means that all the arrows are close together, although they may be far from the bull’s eye. Accurate (small MSE) means that all the arrows are close together (small variance) and near the centre of the target (small bias). We thus have the following conclusions:

An estimator \hat{\theta} of \theta is unbiased if \mathbb{E}(\hat{\theta}) = \theta, precise if Var(\hat{\theta}) = \mathbb{E}[(\hat{\theta}-\mathbb{E}(\hat{\theta}))^2] is small, and accurate if MSE(\hat{\theta}) = \mathbb{E}[(\hat{\theta}-\theta)^2] is small.

Note that a badly biased estimator may be precise, but it will not be accurate: accuracy (MSE) measures how close the estimates are to the true value, while precision (variance) measures how close the estimates from different samples are to each other.
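To see a precise but inaccurate estimator, we can compare the sample mean with a deliberately silly estimator that ignores the data and always reports 10. This is a hypothetical toy example (population values 2, 4, 6, 8, so \theta = 5, with n = 2); since every sample is equally likely, plain averages over the samples give the expectations.

```python
from fractions import Fraction
from itertools import combinations

y = [2, 4, 6, 8]                        # hypothetical population values
theta = Fraction(sum(y), len(y))        # true population mean = 5
samples = list(combinations(y, 2))      # all SRSWOR samples of size n = 2

def summarize(estimates):
    """Return (bias, variance, MSE) when every estimate is equally likely."""
    k = len(estimates)
    e = sum(estimates) / k
    var = sum((t - e) ** 2 for t in estimates) / k
    mse = sum((t - theta) ** 2 for t in estimates) / k
    return e - theta, var, mse

# Unbiased estimator: the sample mean of each sample.
mean_est = [Fraction(sum(D), len(D)) for D in samples]
# Badly biased but perfectly precise: ignores the data and always answers 10.
const_est = [Fraction(10)] * len(samples)

print("sample mean:", summarize(mean_est))
print("constant 10:", summarize(const_est))
```

The constant estimator has variance 0 (perfectly precise) but bias 5 and MSE 25, whereas the sample mean has bias 0 and MSE 5/3.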

We will look at how to investigate these qualities next.


 

