Last time we looked at stratified sampling (STR). Here is a quick recap:
- Set the stratification scheme
- Set the stratum design
- Implement the sampling methods for each stratum independently
- Pool the stratum estimates to estimate the population parameters
- Estimate their respective variances
- Construct CI, if necessary.
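As a reminder of how the pooling and variance steps fit together, here is a minimal Python sketch for the case where each stratum is sampled by SRS. It assumes the usual stratified formulas $\hat{t}_{str} = \sum_h N_h \bar{y}_h$ and $\hat{V}(\hat{t}_{str}) = \sum_h (1 - n_h/N_h) N_h^2 s_h^2/n_h$. The stratum sizes and sample values are hypothetical.

```python
import math

# Hypothetical strata: (N_h = stratum population size, y-values from an SRS within stratum h)
strata = [
    (400, [12.0, 15.0, 11.0, 14.0]),
    (600, [20.0, 22.0, 19.0, 23.0, 21.0]),
]

def stratified_total(strata):
    """Pool independent per-stratum SRS estimates into a total and its estimated variance."""
    t_hat = 0.0
    v_hat = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        ybar_h = sum(sample) / n_h
        s2_h = sum((y - ybar_h) ** 2 for y in sample) / (n_h - 1)
        t_hat += N_h * ybar_h                              # stratum total estimate
        v_hat += (1 - n_h / N_h) * N_h**2 * s2_h / n_h     # stratum variance, with FPC
    return t_hat, v_hat

t_hat, v_hat = stratified_total(strata)
se = math.sqrt(v_hat)
ci = (t_hat - 1.96 * se, t_hat + 1.96 * se)   # approximate 95% CI
```

Because the strata are sampled independently, the stratum variances simply add.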
Today, we look at ratio estimation. To start, we use SRS only and, as before, we assume that there is no nonsampling error, only sampling error.
As usual, we start with the definitions.
We introduce two variables: $x_i$, an auxiliary (or subsidiary) variable, and $y_i$, the response variable (the characteristic of interest). The idea is to use the auxiliary variable, which is correlated with the response variable, to improve precision.
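Next, we have a new population parameter, the ratio
$$B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U},$$
where $t_x = \sum_{i=1}^{N} x_i$ and $t_y = \sum_{i=1}^{N} y_i$.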
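Here is the procedure:
- We assume that $t_x$ (or $\bar{x}_U$) is known
- Take an SRS and measure $x_i$ and $y_i$ for each unit in the sample
- Calculate $\bar{x}$ and $\bar{y}$ for the sample, and hence $\hat{B} = \bar{y}/\bar{x}$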
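Once $\bar{x}$ and $\bar{y}$ are computed, the sample ratio $\hat{B} = \bar{y}/\bar{x}$ estimates $B$. Here is a minimal sketch with a hypothetical sample of farms (x = acres planted, y = bushels harvested), so $\hat{B}$ estimates average yield in bushels per acre. The function name `ratio_estimate` is illustrative only.

```python
# Hypothetical SRS of farms: x = acres planted (auxiliary), y = bushels harvested (response)
x = [120.0, 95.0, 150.0, 80.0, 110.0]
y = [4300.0, 3300.0, 5500.0, 2700.0, 3900.0]

def ratio_estimate(x, y):
    """Return (xbar, ybar, B_hat) for paired sample measurements."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return xbar, ybar, ybar / xbar   # B_hat = ybar / xbar, equivalently sum(y) / sum(x)

xbar, ybar, B_hat = ratio_estimate(x, y)
```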
We use ratio estimation because sometimes the quantity of interest is itself a ratio, such as average yield in bushels per acre, the number of fish caught per hour spent fishing, or per capita income. Ratio estimation is also useful when we need to estimate a population total but the population size $N$ is unknown. Since we cannot use the estimator $\hat{t}_y = N\bar{y}$ in that case, we consider another measure of size, namely the known total $t_x$ of the auxiliary variable. We can estimate $N$ by $t_x/\bar{x}$, and consequently
$$\hat{t}_{yr} = \frac{t_x}{\bar{x}}\,\bar{y} = \hat{B}\,t_x,$$
where $t_x/\bar{x}$ estimates the population size based on the auxiliary variable.
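To make the total estimator concrete, here is a minimal sketch in the spirit of the fish example: the total weight $t_x$ of a catch is assumed known, the number of fish $N$ is not, and y is the fillet weight of each sampled fish. All numbers are hypothetical. It estimates $N$ by $t_x/\bar{x}$ and the total by $(t_x/\bar{x})\,\bar{y} = \hat{B}\,t_x$.

```python
# Hypothetical catch: total weight t_x is known, but the number of fish N is not
t_x = 2400.0                               # known total weight of the catch, kg (assumed)
weight = [1.8, 2.4, 1.5, 2.1, 2.2]         # x_i: weights of the sampled fish, kg
fillet = [0.72, 0.98, 0.60, 0.85, 0.89]    # y_i: fillet weight of each sampled fish, kg

xbar = sum(weight) / len(weight)
ybar = sum(fillet) / len(fillet)

N_hat = t_x / xbar          # estimated number of fish in the catch
t_yr_hat = N_hat * ybar     # estimated total fillet weight, identical to (ybar / xbar) * t_x
```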
The benefits of using ratio estimation are clear:
- Smaller MSE if x and y are highly correlated, giving us an increase in precision
- We are able to adjust estimates so that they agree with known population information (such as $t_x$), giving a more representative result
- We can adjust for nonresponse.
Suppose that, in our particular SRS, the sample mean of the x's slightly underestimates the true population mean, that is, $\bar{x}$ is smaller than $\bar{x}_U$. If x and y are positively correlated, $\bar{y}$ is then also likely to underestimate $\bar{y}_U$.
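The ratio estimator of the population mean is
$$\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U = \frac{\bar{x}_U}{\bar{x}}\,\bar{y}.$$
Here we correct the underestimation by scaling $\bar{y}$ up by the factor $\bar{x}_U/\bar{x}$, which exceeds 1 whenever $\bar{x} < \bar{x}_U$.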
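A minimal sketch of the correction, with hypothetical numbers: $\bar{x}_U$ is assumed known and the sample happens to have $\bar{x} < \bar{x}_U$, so the factor $\bar{x}_U/\bar{x}$ exceeds 1 and pulls $\hat{\bar{y}}_r = (\bar{x}_U/\bar{x})\,\bar{y}$ above $\bar{y}$.

```python
# Hypothetical: population mean of x is known, and this SRS has xbar below it
xbar_U = 50.0
x = [42.0, 47.0, 45.0, 46.0]
y = [85.0, 96.0, 90.0, 93.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
ybar_r = (xbar_U / xbar) * ybar   # ratio estimator of the population mean of y
```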
Having defined the estimators, we now examine their properties, as usual.
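First, the ratio estimators are biased. The bias arises because the unbiased $\bar{y}$ is multiplied by the random factor $\bar{x}_U/\bar{x}$, and in general $E[\bar{y}/\bar{x}] \neq E[\bar{y}]/E[\bar{x}]$. The good news is that when x and y are strongly positively correlated, the variance is reduced enough to compensate for the bias. This means that although $E[\hat{\bar{y}}_r] \neq \bar{y}_U$, the value of $\hat{\bar{y}}_r$ for any individual sample is likely to be closer to $\bar{y}_U$ than the sample mean $\bar{y}$ is. Of course, the deviation $\bar{y} - \bar{y}_U$, averaged over all possible samples that could be obtained, is zero.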
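The bias-versus-variance trade-off is easy to see by simulation. The sketch below draws repeated SRSs from a hypothetical population in which y is roughly proportional to x (the population model, seed, n and number of replicates are all assumptions) and compares the empirical bias and MSE of $\bar{y}$ with those of the ratio estimator $(\bar{x}_U/\bar{x})\,\bar{y}$.

```python
import random
import statistics

random.seed(1)

# Hypothetical population in which y is roughly proportional to x
N = 500
x_pop = [random.uniform(10, 50) for _ in range(N)]
y_pop = [3 * xi + random.gauss(0, 5) for xi in x_pop]
xbar_U = statistics.fmean(x_pop)
ybar_U = statistics.fmean(y_pop)

n, reps = 20, 5000
sample_means, ratio_means = [], []
for _ in range(reps):
    idx = random.sample(range(N), n)                 # SRS without replacement
    xbar = statistics.fmean(x_pop[i] for i in idx)
    ybar = statistics.fmean(y_pop[i] for i in idx)
    sample_means.append(ybar)
    ratio_means.append(xbar_U / xbar * ybar)         # ratio estimator of the mean

def bias_and_mse(estimates, target):
    """Empirical bias and MSE of a list of estimates around the true value."""
    bias = statistics.fmean(estimates) - target
    mse = statistics.fmean((e - target) ** 2 for e in estimates)
    return bias, mse

bias_mean, mse_mean = bias_and_mse(sample_means, ybar_U)
bias_ratio, mse_ratio = bias_and_mse(ratio_means, ybar_U)
```

With this population the ratio estimator's MSE should come out far smaller than that of $\bar{y}$. The empirical bias is small and noisy, so its sign for a given seed should not be read into.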
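First, we introduce the population correlation coefficient of x and y:
$$R = \frac{\sum_{i=1}^{N}(x_i - \bar{x}_U)(y_i - \bar{y}_U)}{(N-1)\,S_x S_y},$$
where $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x}_U)^2$ and $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y}_U)^2$. The bias of the ratio estimator of the mean is then
$$\text{Bias}(\hat{\bar{y}}_r) = E[\hat{\bar{y}}_r - \bar{y}_U] = -E[\hat{B}(\bar{x} - \bar{x}_U)] \approx \frac{1}{\bar{x}_U}\left[B\,V(\bar{x}) - \text{Cov}(\bar{x}, \bar{y})\right] = \left(1 - \frac{n}{N}\right)\frac{1}{n\,\bar{x}_U}\left(B S_x^2 - R S_x S_y\right).$$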
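Notice that the bias shrinks as the sample size $n$ increases. Ignoring the FPC, the bias is of order $1/n$, so the squared bias is of order $1/n^2$, whereas $V(\hat{\bar{y}}_r)$ is of order $1/n$.
The MSE is therefore dominated by the variance, so in large samples $\text{MSE}(\hat{\bar{y}}_r) \approx V(\hat{\bar{y}}_r)$.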
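For a tiny population we can enumerate every possible SRS and compute the bias exactly. This sketch (hypothetical population with N = 6 and n = 3) checks the exact identity $\text{Bias}(\hat{\bar{y}}_r) = -E[\hat{B}(\bar{x} - \bar{x}_U)]$ and computes the first-order approximation $(1 - n/N)\frac{1}{n\bar{x}_U}(BS_x^2 - RS_xS_y)$ alongside it, with $R$ the population correlation coefficient.

```python
from itertools import combinations
from statistics import fmean, stdev

# Hypothetical small population, so every SRS of size n can be enumerated
x_pop = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
y_pop = [5.0, 7.0, 9.0, 14.0, 15.0, 20.0]
N, n = len(x_pop), 3

xbar_U, ybar_U = fmean(x_pop), fmean(y_pop)
B = ybar_U / xbar_U
S_x, S_y = stdev(x_pop), stdev(y_pop)   # divisor N - 1, matching S_x and S_y in the text
R = sum((a - xbar_U) * (b - ybar_U) for a, b in zip(x_pop, y_pop)) / ((N - 1) * S_x * S_y)

ratio_est, cross = [], []
for idx in combinations(range(N), n):   # all C(N, n) equally likely SRSs
    xbar = fmean(x_pop[i] for i in idx)
    ybar = fmean(y_pop[i] for i in idx)
    B_hat = ybar / xbar
    ratio_est.append(B_hat * xbar_U)
    cross.append(B_hat * (xbar - xbar_U))

exact_bias = fmean(ratio_est) - ybar_U
approx_bias = (1 - n / N) / (n * xbar_U) * (B * S_x**2 - R * S_x * S_y)
```

With n this small the approximation can be rough; the text relies on its large-sample behaviour.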
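Let $d_i = y_i - Bx_i$. Then $\bar{d} = \bar{y} - B\bar{x}$ is an unbiased estimator for $\bar{d}_U = \bar{y}_U - B\bar{x}_U = 0$.
Since $\hat{\bar{y}}_r - \bar{y}_U = \frac{\bar{x}_U}{\bar{x}}(\bar{y} - B\bar{x}) = \frac{\bar{x}_U}{\bar{x}}\,\bar{d}$ and $\bar{x}/\bar{x}_U \approx 1$ when $n$ is large (more than 30, say), we get
$$\text{MSE}(\hat{\bar{y}}_r) \approx E[\bar{d}^{\,2}] = V(\bar{d}) = \left(1 - \frac{n}{N}\right)\frac{S_d^2}{n}, \qquad S_d^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - Bx_i)^2.$$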
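The same enumeration idea checks the large-sample MSE approximation. The sketch below, using a hypothetical N = 6 population, confirms that $E[\bar{d}^{\,2}]$ over all samples equals $(1 - n/N)S_d^2/n$ with $d_i = y_i - Bx_i$, and computes the exact MSE of $(\bar{x}_U/\bar{x})\,\bar{y}$ for comparison. The two need not agree closely at n = 3.

```python
from itertools import combinations
from statistics import fmean, variance

# Hypothetical small population
x_pop = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
y_pop = [5.0, 7.0, 9.0, 14.0, 15.0, 20.0]
N, n = len(x_pop), 3

xbar_U, ybar_U = fmean(x_pop), fmean(y_pop)
B = ybar_U / xbar_U
d_pop = [b - B * a for a, b in zip(x_pop, y_pop)]   # d_i = y_i - B x_i (population mean 0)
S2_d = variance(d_pop)                               # divisor N - 1

approx_mse = (1 - n / N) * S2_d / n                  # V(dbar) under SRS

sq_err, sq_dbar = [], []
for idx in combinations(range(N), n):
    xbar = fmean(x_pop[i] for i in idx)
    ybar = fmean(y_pop[i] for i in idx)
    dbar = ybar - B * xbar
    sq_err.append((xbar_U / xbar * ybar - ybar_U) ** 2)   # squared error of the ratio estimator
    sq_dbar.append(dbar**2)

exact_mse = fmean(sq_err)
```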
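It is worth asking when this approximate MSE is small. Expanding $S_d^2$ in terms of $S_x$, $S_y$ and $R$, we have
$$\text{MSE}(\hat{\bar{y}}_r) \approx \left(1 - \frac{n}{N}\right)\frac{1}{n}\left(S_y^2 - 2BR\,S_x S_y + B^2 S_x^2\right).$$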
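So the approximate MSE is small when:
- Sample size n is large
- Sampling fraction n/N is large
- Deviations $y_i - Bx_i$ are small
- Correlation between x and y is close to +1
- $\bar{x}_U$ is large (this matters when estimating $B$ itself, since $\hat{B} = \hat{\bar{y}}_r/\bar{x}_U$ and hence $\text{MSE}(\hat{B}) = \text{MSE}(\hat{\bar{y}}_r)/\bar{x}_U^2$)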
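The expanded form $(1 - n/N)\frac{1}{n}(S_y^2 - 2BRS_xS_y + B^2S_x^2)$ can be checked numerically against $(1 - n/N)S_d^2/n$, and evaluating it for increasing n shows the first two conditions at work: the factor $(1 - n/N)/n$ falls as n grows and reaches 0 at n = N. The population is hypothetical and `approx_mse` is an illustrative helper name.

```python
from statistics import fmean, stdev, variance

# Hypothetical small population
x_pop = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
y_pop = [5.0, 7.0, 9.0, 14.0, 15.0, 20.0]
N = len(x_pop)
xbar_U, ybar_U = fmean(x_pop), fmean(y_pop)
B = ybar_U / xbar_U
S_x, S_y = stdev(x_pop), stdev(y_pop)
R = sum((a - xbar_U) * (b - ybar_U) for a, b in zip(x_pop, y_pop)) / ((N - 1) * S_x * S_y)

def approx_mse(n, N, S_x, S_y, R, B):
    """First-order MSE of the ratio estimator of the mean, in expanded form."""
    return (1 - n / N) / n * (S_y**2 - 2 * B * R * S_x * S_y + B**2 * S_x**2)

# S_d^2 with d_i = y_i - B x_i, for comparison with the expanded form
S2_d = variance([b - B * a for a, b in zip(x_pop, y_pop)])
mse_by_n = [approx_mse(n, N, S_x, S_y, R, B) for n in range(1, N + 1)]
```

For fixed $S_x$, $S_y$ and $B > 0$, the bracket also shrinks as R moves towards +1, which is the correlation condition.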
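The estimated variance of $\hat{B}$ is
$$\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}_U^2},$$
where $s_e^2 = \frac{1}{n-1}\sum_{i \in \mathcal{S}} e_i^2$ (summing over the sampled units $\mathcal{S}$) and $e_i = y_i - \hat{B}x_i$. Since $\hat{\bar{y}}_r = \bar{x}_U\hat{B}$ and $\hat{t}_{yr} = t_x\hat{B}$, it follows that $\hat{V}(\hat{\bar{y}}_r) = \bar{x}_U^2\,\hat{V}(\hat{B})$ and $\hat{V}(\hat{t}_{yr}) = t_x^2\,\hat{V}(\hat{B})$.
When $\bar{x}_U$ is unknown, we can substitute $\bar{x}$ for it, giving $\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}^2}$.
Similarly, if the sample sizes are sufficiently large, approximate 95% CIs can be constructed from the standard errors as
$$\hat{B} \pm 1.96\,\text{SE}(\hat{B}), \qquad \hat{\bar{y}}_r \pm 1.96\,\text{SE}(\hat{\bar{y}}_r), \qquad \hat{t}_{yr} \pm 1.96\,\text{SE}(\hat{t}_{yr}).$$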
For large samples, the effect of bias in the CIs can be ignored.
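Putting the variance and CI formulas together, here is a minimal sketch with a hypothetical SRS of n = 6 from N = 200 units with known $t_x$. It computes $\hat{V}(\hat{B}) = (1 - n/N)s_e^2/(n\bar{x}_U^2)$ with $e_i = y_i - \hat{B}x_i$, scales the standard error by $\bar{x}_U$ and $t_x$ for the mean and total, and forms normal-theory 95% intervals. The numbers only illustrate the arithmetic; a sample this small would not justify the normal approximation.

```python
import math

# Hypothetical SRS of n = 6 from N = 200 units, with the population total of x known
N = 200
t_x = 9000.0                  # known population total of x (assumed)
xbar_U = t_x / N
x = [40.0, 52.0, 47.0, 38.0, 45.0, 48.0]
y = [81.0, 108.0, 95.0, 74.0, 92.0, 96.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
B_hat = ybar / xbar
e = [yi - B_hat * xi for xi, yi in zip(x, y)]     # residuals e_i = y_i - B_hat * x_i
s2_e = sum(ei**2 for ei in e) / (n - 1)

V_B = (1 - n / N) * s2_e / (n * xbar_U**2)       # replace xbar_U by xbar if it is unknown
se_B = math.sqrt(V_B)
se_mean = xbar_U * se_B                          # SE of the ratio estimate of the mean
se_total = t_x * se_B                            # SE of the ratio estimate of the total

z = 1.96                                         # normal quantile for an approximate 95% CI
ci_B = (B_hat - z * se_B, B_hat + z * se_B)
ci_mean = (B_hat * xbar_U - z * se_mean, B_hat * xbar_U + z * se_mean)
ci_total = (B_hat * t_x - z * se_total, B_hat * t_x + z * se_total)
```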
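A useful result is that, to first order and assuming $B > 0$, $\text{MSE}(\hat{\bar{y}}_r) \le V(\bar{y})$ if and only if
$$R \ge \frac{\text{CV}(x)}{2\,\text{CV}(y)}, \qquad \text{CV}(x) = \frac{S_x}{\bar{x}_U}, \quad \text{CV}(y) = \frac{S_y}{\bar{y}_U}.$$
This implies that if the coefficients of variation are approximately equal, it pays to use ratio estimation when the correlation between x and y is larger than $1/2$.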
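The criterion can be checked directly on a population: with $B > 0$, the first-order condition $S_y^2 - 2BRS_xS_y + B^2S_x^2 \le S_y^2$ is equivalent to $R \ge \text{CV}(x)/(2\,\text{CV}(y))$. This sketch computes both sides so you can see that they agree. The function name `ratio_beats_srs` and the example populations are hypothetical.

```python
from statistics import fmean, stdev

def ratio_beats_srs(x_pop, y_pop):
    """Return (R, CV threshold, whether first-order MSE of ratio estimator <= V(ybar)); assumes B > 0."""
    N = len(x_pop)
    xbar_U, ybar_U = fmean(x_pop), fmean(y_pop)
    B = ybar_U / xbar_U
    S_x, S_y = stdev(x_pop), stdev(y_pop)
    R = sum((a - xbar_U) * (b - ybar_U) for a, b in zip(x_pop, y_pop)) / ((N - 1) * S_x * S_y)
    threshold = (S_x / xbar_U) / (2 * (S_y / ybar_U))   # CV(x) / (2 CV(y))
    # The common factor (1 - n/N)/n cancels, so compare the brackets directly
    mse_bracket = S_y**2 - 2 * B * R * S_x * S_y + B**2 * S_x**2
    return R, threshold, mse_bracket <= S_y**2

# Usage with two hypothetical populations sharing the same x values
x_pop = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
print(ratio_beats_srs(x_pop, [5.0, 7.0, 9.0, 14.0, 15.0, 20.0]))   # y rises with x
print(ratio_beats_srs(x_pop, [20.0, 15.0, 14.0, 9.0, 7.0, 5.0]))   # y falls with x
```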
Next time, we will look at Regression Estimation.
Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two-Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse