### Sampling & Survey #10 – Cluster Sampling

Today we discuss cluster sampling, beginning with cluster sampling with equal probabilities. First, a cluster is a grouping of the members of the population. Previously, we assumed that we could sample the elements of the population directly, so the sampling units were the elements themselves. However, a sampling frame that lists every element might not be easy to obtain because of cost constraints or limited access to information.
An example: If we want to study residents of Singapore (the population), the element is the person. However, we may not be able to contact each person directly, and may only be able to find information on households (the clusters).

• In cluster sampling, the clusters are also called primary sampling units while the elements are called secondary sampling units.
• In cluster sampling, the sampling units are the clusters, and the elements observed are the secondary sampling units within the sampled clusters. Previously, the elements themselves were the sampling units, so there was no distinction between primary and secondary sampling units.

Why is there a need to do cluster sampling?
Firstly, constructing a sampling frame that lists the observation units may be difficult, expensive or impossible.
Secondly, the population may be widely distributed geographically or may occur in natural clusters, and it may be more cost efficient to take a sample of clusters rather than an SRS of individuals.

As before, there is a lot of notation to go through first. Before that, let me illustrate the scenario. Suppose that our population consists of clusters of elements.
Cluster 1: $y_{11}, \ldots , y_{1 M_1}$ Subtotal $t_1 = \sum_{j=1}^{M_1} y_{1j}$
Cluster 2: $y_{21}, \ldots , y_{2 M_2}$ Subtotal $t_2 = \sum_{j=1}^{M_2} y_{2j}$
$\vdots$
Cluster N: $y_{N1}, \ldots , y_{N M_N}$ Subtotal $t_N = \sum_{j=1}^{M_N} y_{Nj}$
$y_{ij}$ is the measurement for the $j^{th}$ element in the $i^{th}$ cluster.
N = number of clusters in the population
$M_i$ = number of elements in the $i^{th}$ cluster
$M_0 = \sum_{i=1}^N M_i$ is the number of elements in the population
$\bar{M} = \frac{M_0}{N}$ is the average cluster size for the population
$t_i = \sum_{j=1}^{M_i} y_{ij}$ is the subtotal for the $i^{th}$ cluster
$t = \sum_{i=1}^N t_i = \sum_{i=1}^N \sum_{j=1}^{M_i} y_{ij}$ is the population total
$S_t^2 = \frac{1}{N-1} \sum_{i=1}^N (t_i - \frac{t}{N})^2$ is the population variance of the cluster totals.
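$S^2 = \frac{1}{M_0 - 1} \sum_{i=1}^N \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_u)^2$ is the population variance per element, where $\bar{y}_u = \frac{t}{M_0}$ is the population mean
$S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_{iu})^2$ is the population variance within cluster $i$, where $\bar{y}_{iu} = \frac{t_i}{M_i}$ is the mean of cluster $i$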
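
To make the notation concrete, here is a minimal Python sketch that computes each quantity for a small population. The three clusters and their values are made up purely for illustration; they are not from any survey.

```python
from statistics import variance

# Hypothetical toy population: 3 clusters, each a list of element values y_ij.
clusters = [
    [2, 4, 6],      # cluster 1, M_1 = 3
    [1, 3],         # cluster 2, M_2 = 2
    [5, 5, 5, 5],   # cluster 3, M_3 = 4
]

N = len(clusters)                    # number of clusters in the population
M = [len(c) for c in clusters]       # cluster sizes M_i
M0 = sum(M)                          # number of elements in the population
M_bar = M0 / N                       # average cluster size
t_i = [sum(c) for c in clusters]     # cluster subtotals
t = sum(t_i)                         # population total

# statistics.variance divides by (count - 1), matching the definitions above.
S_t2 = variance(t_i)                               # variance of the cluster totals
S2 = variance([y for c in clusters for y in c])    # variance per element
S_i2 = [variance(c) for c in clusters]             # variance within each cluster

print(t_i, t, M0, M_bar)   # [12, 4, 20] 36 9 3.0
```

Note that every cluster here has at least two elements, since $S_i^2$ is undefined when $M_i = 1$.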

With the notation in place, we can ask: how do we draw a cluster sample? We will study two approaches, one-stage cluster sampling and two-stage cluster sampling. We cover the former now and the latter in the next post.

One-stage cluster sampling means that every element within a sampled cluster is included in the sample. Here, the population of $M_0$ elements is divided into $N$ clusters, with cluster $i$ of size $M_i$.

TBC…

Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two-Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse

### 2018 A-level H2 Mathematics (9758) Paper 1 Suggested Solutions

All solutions here are SUGGESTED. Mr. Teng will hold no liability for any errors. Comments are entirely personal opinions.

Numerical Answers (click the questions for workings/explanation)

Question 1: $\frac{1}{x^2} (1 - \text{ln}x)$; $1 - \frac{2}{e}$
Question 2: $x= \frac{1}{2}$ or $x=3$; $\frac{125}{6}\pi \text{units}^3$
Question 3: $y = 3 -x^2$
Question 4: $x = - 1 \pm \sqrt{3}, 0, -1$; $- 1 - \sqrt{3} \textless x \textless -1$ or $0 \textless x \textless -1 + \sqrt{3}$
*Question 5: $b = -1$, $f^{-1}(x) = 1 + \frac{a+1}{x-1}, x \in \mathbb{R}, x \neq 1, a \neq -1$
Question 6: $\lambda = \pm 2 \sqrt{31}$
Question 7: $N(-\frac{1}{17}, 0)$
Question 8: $A = 5, u_3 = 40$; $a =7.5, b =-5, c = -5$; $15(2^n - 1) - 7.5n - 2.5n^2$
Question 9: $k= 2$; 8 units
Question 10: $I = \frac{3A}{2e}$
Question 11: \$102.43; \$1215.71; first day of June 2018; $b = \frac{4}{3}$; $b = 1.23$

*Do note that question 5 is problematic as the domain of $ff$ needs to be equal to the domain of $g$. I do not know the consequences of a faulty question.

MF26

### 2018 A-level H2 Mathematics (9758) Paper 2 Suggested Solutions

All solutions here are SUGGESTED. Mr. Teng will hold no liability for any errors. Comments are entirely personal opinions.

Numerical Answers (click the questions for workings/explanation)

Question 1: $f(x) = 3 (\frac{2}{9}x + 4)^{3/2} + 45, (54, 237)$
Question 2: $s = 69, t = 13$, other roots $= 2 + 3i, 0.5$, $w = - \frac{3}{2} \pm \frac{3\sqrt{3}}{2}i, w_1 w_2 w_3= 27, w_1 + w_2 + w_3= 0$
Question 3: $D(-5, -4, 3), 4x + 45y + 20z = 200, 58.6^{\circ}, 6.88 \text{units}$
Question 4: $\text{ln}(\text{cos}2x) = -2x^2 - \frac{4}{3}x^4 - \frac{64}{45}x^6, x \neq \frac{\pi}{4}, -1.0644, -1.0670$
Question 5: $k \ge 9430000$
Question 6: $\frac{5}{9} \textless p \textless \frac{2}{3}, 0.43046721$
Question 7: $\text{P}(A' \cap B') = 1 - a - b + ab, \text{P}(A' \cap C') = 1 - a - c, \frac{2}{15} \le \text{P}(A \cap B) \le\frac{1}{3}$
Question 8: $\text{P}(S=10)=0; g(n) = 22n^2 + 78n +36$
Question 9: $r = 0.969, r = 0.993; P = 2.85 \times 10^{-8} R^2 - 0.283, R = 6450 \text{~revolutions~per~min}, P = 0.0273 \text{~watts}, P =1.02 \times 10^{-4} R^2 - 0.283$
Question 10: $0.605, 0.773, 0.126, 136, 0.185$

MF26

### Modal value & Expected value

Let us look at the difference between modal value and expected value. We shall start by saying they are different, albeit close.

Modal value refers to the mode, that is, the value that has the highest probability (chance) of occurring.

Expected value refers to the value we expect to have on average.

Before we start, I’ll do a quick recap of the binomial distribution, $X \sim \text{B}(n, p)$, by listing the formulae that we can find in MF26.

$\text{P}(X = x) = ^n C_x (p)^x (1-p)^{n-x}$

$\mathbb{E}(X) = np$

$\text{Var}(X) = np(1-p)$

The expected value is simply given by $\mathbb{E}(X)$.

Now to find the modal value, we have to go through a slightly nasty and long working.

We have that $\frac{\text{P}(X = r + 1)}{\text{P}(X = r)} = \frac{(n-r)}{(r+1)} \frac{p}{1-p}$. This is what we call the recurrence formula. We consider this to give us the ratio between successive probabilities. And to illustrate how this works, nothing beats an example question.

Consider candies that are packed in packets of 20. On average, the proportion of candies that are blue-colored is $p$. It is known that the most common number of blue-colored candies in a packet is 6. Use this information to find exactly the range of values that $p$ can take.

First, the "most common number" is the mode, that is, the value with the highest frequency.

This means that $\text{P}(X=6)$ is the largest probability. Let us turn our attention to the recurrence formula now. If $\text{P}(X=6)$ is the largest, then $\text{P}(X=6) \textgreater \text{P}(X=7)$ and also $\text{P}(X=6) \textgreater \text{P}(X=5)$.

Let's start by looking at the first one: $\text{P}(X=6) \textgreater \text{P}(X=7)$

$\text{P}(X=6) \textgreater \text{P}(X=7)$

$1 > \frac{\text{P}(X=7)}{\text{P}(X=6)}$

$\frac{\text{P}(X=7)}{\text{P}(X=6)} \textless 1$

But hold on! This looks like the recurrence formula. (In exams, you can either quote the recurrence formula or derive it on the spot. Both work!)

Now I’d advise you to try the second one on your own, that is, $\text{P}(X=6) > \text{P}(X=5)$.

Now, suppose the question simply says that the expected number of blue-colored candies in a packet of 20 is 6. Then

$\mathbb{E}(X) = 6$

$(20)p = 6$

$p = \frac{3}{10}$

We observe that this value actually falls in the range of $p$ we found.
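
As a quick check, the two inequalities can be solved with the recurrence formula and verified numerically. The sketch below uses exact fractions; the helper names `pmf` and `most_likely` are just illustrative.

```python
from fractions import Fraction
from math import comb

n, mode = 20, 6  # packet size and most common count, from the question

# P(X=7)/P(X=6) = (14/7) * p/(1-p) < 1   =>  p < (mode + 1)/(n + 1)
upper = Fraction(mode + 1, n + 1)
# P(X=6)/P(X=5) = (15/6) * p/(1-p) > 1   =>  p > mode/(n + 1)
lower = Fraction(mode, n + 1)

def pmf(p, x):
    """Exact binomial probability P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def most_likely(p):
    """Value of x with the largest probability."""
    return max(range(n + 1), key=lambda x: pmf(p, x))

p_expected = Fraction(3, 10)  # from E(X) = 20p = 6

print(lower, upper)                  # 2/7 1/3
print(lower < p_expected < upper)    # True
print(most_likely(p_expected))       # 6
```

So the mode is 6 exactly when $\frac{2}{7} \textless p \textless \frac{1}{3}$, and $p = \frac{3}{10}$ sits inside that interval.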

### Differential Equations (Applications)

When Mr. Teng retired on 1 January 2018, he put a sum of \$10,000 into a senior citizen fund that has a constant rate of return of 5% at the end of every month. Starting in February 2018, he withdraws \$500 at the start of each month for groceries. Denote the amount of money that Mr. Teng has at time $n$ years by $x$.

(i) The differential equation relating $x$ and $n$ can be written in the form of $\frac{dx}{dn}= kx+c$. State the values of $k$ and $c$.

(ii) Solve the differential equation and find the amount of money that Mr. Teng has after 15 months.

(iii) In which month will Mr. Teng no longer be able to withdraw the full \$400?
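
For any equation of the form $\frac{dx}{dn} = kx + c$ with $k \neq 0$ and $x(0) = x_0$, the solution is $x = (x_0 + \frac{c}{k})e^{kn} - \frac{c}{k}$. The sketch below codes this general solution and checks it against the differential equation numerically. The values of `k` and `c` used here are illustrative assumptions only, not the answers to part (i).

```python
from math import exp

def solve_linear_ode(x0, k, c, n):
    """Closed-form solution of dx/dn = k*x + c with x(0) = x0, assuming k != 0."""
    return (x0 + c / k) * exp(k * n) - c / k

# Illustrative (assumed) parameters, NOT the answers to part (i).
x0, k, c = 10000.0, 0.05, -600.0

# Central-difference check that the formula satisfies dx/dn = k*x + c at n = 3.
h = 1e-5
n = 3.0
numeric_slope = (solve_linear_ode(x0, k, c, n + h)
                 - solve_linear_ode(x0, k, c, n - h)) / (2 * h)
model_slope = k * solve_linear_ode(x0, k, c, n) + c
print(abs(numeric_slope - model_slope) < 1e-4)   # True
```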

### DRV questions with a twist

I want to share a question that is really old (older than me, actually). It is from the June 1972 paper. As most students know, Maclaurin’s series was tested in last year’s A’levels Paper 1 as a sum to infinity, and this DRV question did the same thing. Here is the question.

A bag contains 6 black balls and 2 white balls. Balls are drawn at random one at a time from the bag without replacement, and a white ball is drawn for the first time at the $R$th draw.

(i) Tabulate the probability distribution for R.

(ii) Show that E( R ) = 3, and find Var( R ).

If each ball is replaced before another is drawn, show that in this case E( R ) = 4.
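
The stated results can be checked with a short Python sketch using exact fractions. The function name `pmf_without_replacement` is illustrative, and the with-replacement case is approximated by a long partial sum rather than an exact sum to infinity.

```python
from fractions import Fraction

BLACK, WHITE = 6, 2
TOTAL = BLACK + WHITE

def pmf_without_replacement(r):
    """P(R = r): the first r-1 draws are black and the r-th draw is white."""
    prob = Fraction(1)
    for k in range(r - 1):
        prob *= Fraction(BLACK - k, TOTAL - k)       # draw k+1 is black
    return prob * Fraction(WHITE, TOTAL - (r - 1))   # draw r is white

# R can be 1, 2, ..., 7 (at most 6 black balls come out first).
dist = {r: pmf_without_replacement(r) for r in range(1, BLACK + 2)}
mean_R = sum(r * p for r, p in dist.items())
var_R = sum(r * r * p for r, p in dist.items()) - mean_R**2

print(mean_R, var_R)   # 3 3

# With replacement, P(R = r) = (3/4)^(r-1) * (1/4); a long partial sum approximates E(R).
mean_with_replacement = sum(r * 0.75 ** (r - 1) * 0.25 for r in range(1, 501))
print(round(mean_with_replacement, 6))   # 4.0
```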

### Solutions to Review 1

Question 1
(i)
$y = f(x) = \frac{x^2 + 14x + 50}{3(x+7)}$

$3y(x+7) = x^2 + 14x + 50$

$x^2 + (14-3y)x + 50 - 21 y = 0$

$\text{discriminant} \ge 0$

$(14-3y)^2 - 4(1)(50-21y) \ge 0$

$196 - 84y + 9y^2 - 200 + 84y \ge 0$

$9y^2 - 4 \ge 0$

$(3y - 2)(3y + 2) \ge 0$

$y \le - \frac{2}{3} \text{~or~} y \ge \frac{2}{3}$

(ii)
Using long division, we find that

$y = \frac{x^2 + 14x + 50}{3(x+7)} = \frac{x}{3} + \frac{7}{3} + \frac{1}{3(x+7)}$

So the asymptotes are $y = \frac{x}{3} + \frac{7}{3}$ and $x = -7$

Question 2
(i)
$x^2 - 9y^2 + 18y = 18$

$x^2 - 9(y^2 - 2y) = 18$

$x^2 - 9[(y-1)^2 - 1^2] = 18$

$x^2 - 9(y-1)^2 + 9 = 18$

$x^2 - 9(y-1)^2 = 9$

$\frac{x^2}{9} - (y-1)^2 = 1$

This is a hyperbola with centre $(0, 1)$, asymptotes are $y = \pm \frac{x}{3} + 1$, and vertices $(3, 1)$ and $(-3, 1)$.

$y = \frac{1}{x^2} + 1$ is a graph with asymptotes $x = 0$ and $y=1$.

Use GC to plot.

(ii)
$\frac{x^2}{9} - (y-1)^2 = 1$—(1)

$y = \frac{1}{x^2} + 1$ —(2)

Subst (2) to (1),

$\frac{x^2}{9} - (\frac{1}{x^2} + 1 - 1)^2 = 1$

$\frac{x^2}{9} - (\frac{1}{x^2})^2 = 1$

$x^2 - \frac{9}{x^4} = 9$

$x^6 - 9 = 9x^4$

$x^6 - 9x^4 - 9 = 0$

(iii)
From graph, we observe two intersections. Thus, two roots.
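
If a GC is not at hand, a rough numerical check is to count sign changes of $x^6 - 9x^4 - 9$ on a fine grid. This is only a sketch: it assumes all real roots lie in $[-5, 5]$ (true here, since $x^4(x^2 - 9) - 9 \textgreater 0$ when $|x| \ge 5$) and that there are no repeated roots.

```python
def g(x):
    """Left-hand side of x^6 - 9x^4 - 9 = 0."""
    return x**6 - 9 * x**4 - 9

# Grid from -5 to 5 in steps of 0.01.
xs = [i / 100 for i in range(-500, 501)]
values = [g(x) for x in xs]

# Each sign change between neighbouring grid points brackets one real root.
sign_changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
print(sign_changes)   # 2
```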

Question 3
(ai)
$\sum_{r=1}^n (r+1)(3r-1)$

$= \sum_{r=1}^n (3r^2 + 2r -1)$

$= \sum_{r=1}^n 3r^2 + \sum_{r=1}^n 2r - \sum_{r=1}^n 1$

$= 3 \sum_{r=1}^n r^2 + 2 \sum_{r=1}^n r - \sum_{r=1}^n 1$

$= 3 \frac{n}{6}(n+1)(2n+1) + 2 \frac{n}{2}(1 + n) - n$

$= \frac{n}{2}(n+1)(2n+1) + n(1+n) - n$

$= \frac{n}{2}(n+1)(2n+1) + n^2$
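
A closed form like this is easy to sanity-check by brute force. The sketch below compares it with a direct summation for small $n$; the function names are illustrative.

```python
def closed_form(n):
    """n/2 (n+1)(2n+1) + n^2, using integer arithmetic (n(n+1)(2n+1) is always even)."""
    return n * (n + 1) * (2 * n + 1) // 2 + n * n

def brute_force(n):
    """Direct evaluation of the sum of (r+1)(3r-1) for r = 1..n."""
    return sum((r + 1) * (3 * r - 1) for r in range(1, n + 1))

print(all(closed_form(n) == brute_force(n) for n in range(1, 51)))   # True
```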

(aii)
$2 \times 4 + 3 \times 10 + 4 \times 16 + ... + 21 \times 118$

$= 2 [2 \times 2 + 3 \times 5 + 4 \times 8 + ... + 21 \times 59]$

$= 2 [(1+1) \times (3 \cdot 1 - 1) + (2+1) \times (3 \cdot 2 -1) + (3+1) \times (3 \cdot 3 -1) + ... + (20+1) \times (3 \cdot 20 -1) ]$

$= 2 \sum_{r=1}^{20} (r+1)(3r-1)$

$= 2 [\frac{20}{2}(20+1)(2 \cdot 20+1) + 20^2 ]$

$= 2 (8610 + 400)$

$= 18020$

(bi)
$\frac{2}{(r-1)(r+1)} = \frac{A}{r-1} - \frac{B}{r+1}$

$2 = A(r+1) - B(r-1)$

Let $r = -1$

$2 = - B(-2) \Rightarrow B = 1$

Let $r = 1$

$2 = A(2) \Rightarrow A = 1$

$\therefore \frac{2}{(r-1)(r+1)} = \frac{1}{r-1} - \frac{1}{r+1}$

(bii)
$\sum_{r=2}^n \frac{1}{(r-1)(r+1)}$

$= \frac{1}{2} \sum_{r=2}^n \frac{2}{(r-1)(r+1)}$

$= \frac{1}{2} \sum_{r=2}^n (\frac{1}{r-1} - \frac{1}{r+1})$

$= \frac{1}{2} [ 1 - \frac{1}{3}$

$+ \frac{1}{2} - \frac{1}{4}$

$+ \frac{1}{3} - \frac{1}{5}$

$...$

$+ \frac{1}{n-3} - \frac{1}{n-1}$

$+ \frac{1}{n-2} - \frac{1}{n}$

$+ \frac{1}{n-1} - \frac{1}{n+1}]$

$= \frac{1}{2} [1 + \frac{1}{2} - \frac{1}{n} - \frac{1}{n+1}]$

$= \frac{1}{2} (\frac{3}{2} - \frac{n+1+n}{n(n+1)})$

$= \frac{3}{4} - \frac{2n+1}{2n(n+1)}$

(biii)
As $n \to \infty$, $\frac{1}{n} \to 0$ and $\frac{1}{n+1} \to 0$, so the sum of the series tends to $\frac{3}{4}$, a constant. Thus, the series is convergent.

(biv)

$\sum_{r=5}^{n+3} \frac{1}{(r-3)(r-1)}$

Replace $r$ by $r + 2$. Then we have

$\sum_{r=3}^{n+1} \frac{1}{(r-1)(r+1)}$

$= \sum_{r=2}^{n+1} \frac{1}{(r-1)(r+1)} - \frac{1}{(2-1)(2+1)}$

$= \frac{3}{4} - \frac{2(n+1)+1}{2(n+1)[(n+1)+1]} - \frac{1}{3}$

$= \frac{5}{12} - \frac{2n+3}{2(n+1)(n+2)}$
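
Both closed forms in part (b) can be checked against direct summation with exact fractions. This is a minimal sketch with illustrative function names, valid for $n \ge 2$.

```python
from fractions import Fraction

def bii_closed(n):
    """3/4 - (2n+1) / (2n(n+1)), the result of (bii)."""
    return Fraction(3, 4) - Fraction(2 * n + 1, 2 * n * (n + 1))

def bii_direct(n):
    """Sum of 1/((r-1)(r+1)) for r = 2..n."""
    return sum(Fraction(1, (r - 1) * (r + 1)) for r in range(2, n + 1))

def biv_closed(n):
    """5/12 - (2n+3) / (2(n+1)(n+2)), the result of (biv)."""
    return Fraction(5, 12) - Fraction(2 * n + 3, 2 * (n + 1) * (n + 2))

def biv_direct(n):
    """Sum of 1/((r-3)(r-1)) for r = 5..n+3."""
    return sum(Fraction(1, (r - 3) * (r - 1)) for r in range(5, n + 4))

print(all(bii_closed(n) == bii_direct(n) for n in range(2, 40)))   # True
print(all(biv_closed(n) == biv_direct(n) for n in range(2, 40)))   # True
print(float(bii_closed(10**6)))   # close to 0.75, in line with (biii)
```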

### Scatter Diagrams

I was teaching scatter diagrams to some of my students the other day. A few of them were a bit confused about correlation and causation. I gave them the typical ice cream and murder rates example, which I shared here when I discussed the r-value.

Think of correlation as a trend: it can be upwards, downwards or absent. Since we only discuss LINEAR correlation here, strong and weak simply describe how linear the relationship is, that is, how close the scatter points lie to a straight line.
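
To see that the r-value measures only linear association, here is a small Python sketch that computes it from the usual product moment formula. The data sets are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: an exact line, a noisy upward trend, and a symmetric curve.
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_noisy = pearson_r([1, 2, 3, 4, 5], [2, 5, 5, 9, 9])
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_line, round(r_noisy, 3), r_curve)   # 1.0 0.949 0.0
```

The last data set lies exactly on $y = x^2$, yet its r-value is 0: the relationship is strong but not linear.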

Since the A’levels do ask students to draw scatter diagrams during exams to illustrate correlation, here is a handy guide.

### All the best for 2017 Release of A’levels Results!

I would like to take this opportunity to wish all students receiving their results tomorrow all the best! Do not worry too much, and feel free to ask for advice!
Students will find the following useful 🙂

The following are the grade profiles of local universities, NUS and NTU.

NTU IGP

NUS IGP

Take note, the courses treat GP & PW as a “C”.

### Release of Results for 2017 A’levels

Students should note that the 2017 A’levels results will be released on 23rd Feb 2018. More details can be found here.