Sampling & Survey #10 – Cluster Sampling

University Mathematics

Today we discuss cluster sampling, beginning with cluster sampling with equal probabilities. A cluster is a grouping of members of the population. Previously, we assumed that we could sample the elements of the population directly, so that the sampling units were the elements themselves. However, a sampling frame that lists every element might not be easy to obtain, whether because of cost or because of limited access to information.
An example: if we want to study residents of Singapore (the population), the element is the person. However, we may not be able to contact individuals directly and may only have information on households (the clusters).

  • In cluster sampling, the clusters are also called primary sampling units while the elements are called secondary sampling units.
  • In cluster sampling, the sampling units are the clusters, and the elements observed are the secondary sampling units within the sampled clusters. Previously, the sampling units were the elements themselves, so there was no distinction between primary and secondary sampling units.

Why is there a need to do cluster sampling?
Firstly, constructing a sampling frame list of observation units may be difficult, expensive or impossible.
Secondly, the population may be widely distributed geographically or may occur in natural clusters, and it may be more cost efficient to take a sample of clusters rather than an SRS of individuals.

As before, there is a lot of notation to set up first. Before that, here is the scenario: suppose that our population consists of clusters of elements.
Cluster 1: y_{11}, \ldots , y_{1M_1}, with subtotal t_1 = \sum_{j=1}^{M_1} y_{1j}
Cluster 2: y_{21}, \ldots , y_{2M_2}, with subtotal t_2 = \sum_{j=1}^{M_2} y_{2j}
\vdots
Cluster N: y_{N1}, \ldots , y_{NM_N}, with subtotal t_N = \sum_{j=1}^{M_N} y_{Nj}
y_{ij} is the measurement for the j^{th} element in the i^{th} cluster.
N = number of clusters in the population
M_i = number of elements in the i^{th} cluster
M_0 = \sum_{i=1}^N M_i is the number of elements in the population
\bar{M} = \frac{M_0}{N} is the average cluster size for the population
t_i = \sum_{j=1}^{M_i} y_{ij} is the subtotal for the i^{th} cluster
t = \sum_{i=1}^N t_i = \sum_{i=1}^N \sum_{j=1}^{M_i} y_{ij} is the population total
S_t^2 = \frac{1}{N-1} \sum_{i=1}^N (t_i - \frac{t}{N})^2 is the population variance of the cluster totals.
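S^2 = \frac{1}{M_0 - 1} \sum_{i=1}^N \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_U)^2 is the population variance per element, where \bar{y}_U = \frac{t}{M_0} is the population mean
S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_{iU})^2 is the population variance within cluster i, where \bar{y}_{iU} = \frac{t_i}{M_i} is the mean of cluster i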
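
To make the notation concrete, here is a small sketch that computes each quantity for a toy population. The three clusters and their values are made up purely for illustration, and the variable names simply mirror the symbols above.

```python
from statistics import variance

# Hypothetical population: N = 3 clusters of unequal size (made-up values).
clusters = [
    [2, 4, 6],       # cluster 1, M_1 = 3
    [1, 3],          # cluster 2, M_2 = 2
    [5, 5, 5, 9],    # cluster 3, M_3 = 4
]

N = len(clusters)                   # number of clusters in the population
M = [len(c) for c in clusters]      # M_i, the cluster sizes
M0 = sum(M)                         # number of elements in the population
M_bar = M0 / N                      # average cluster size
t_i = [sum(c) for c in clusters]    # cluster subtotals
t = sum(t_i)                        # population total

# statistics.variance divides by (n - 1), matching the definitions above.
S_t2 = variance(t_i)                               # variance of the cluster totals
S2 = variance([y for c in clusters for y in c])    # variance per element
S_i2 = [variance(c) for c in clusters]             # variance within each cluster

print(N, M, M0, M_bar, t_i, t)
print(S_t2, S2, S_i2)
```

Note that S_t^2 measures how much the cluster totals differ from each other, while each S_i^2 only looks inside one cluster.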

With the notation in place, we can ask: how do we draw a cluster sample? We will study two approaches, one-stage cluster sampling and two-stage cluster sampling. We cover the former now and the latter in the next post.

One-stage cluster sampling means that every element within a sampled cluster is included in the sample. Here, the M_0 population elements are divided into N clusters of size.
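
As a rough sketch of what drawing a one-stage cluster sample looks like, the code below takes an SRS of n clusters and keeps every element in each of them. The population values and the function names are hypothetical. The estimator of the population total used here, (N / n) times the sum of the sampled subtotals, is the usual textbook choice and is an assumption on my part, since it has not been introduced in this post.

```python
import random

# Hypothetical population of N = 4 clusters (e.g. households); values are made up.
clusters = [[2, 4, 6], [1, 3], [5, 5, 5, 9], [7]]

def one_stage_cluster_sample(clusters, n, seed=None):
    """Draw an SRS of n clusters and keep every element of each sampled cluster."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(clusters)), n))  # without replacement
    return {i: list(clusters[i]) for i in chosen}

def estimate_total(sample, N):
    """Assumed estimator (not from this post): (N / n) * sum of sampled subtotals."""
    n = len(sample)
    return N / n * sum(sum(ys) for ys in sample.values())

sample = one_stage_cluster_sample(clusters, n=2, seed=0)
t_hat = estimate_total(sample, N=len(clusters))
print(sample, t_hat)
```

If all N clusters are sampled, the estimate is just the population total, which is a quick sanity check on the scaling factor.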



Sampling & Survey #1 – Introduction
Sampling & Survey #2 – Simple Probability Samples
Sampling & Survey #3 – Simple Random Sampling
Sampling & Survey #4 – Qualities of estimator in SRS
Sampling & Survey #5 – Sampling weight, Confidence Interval and sample size in SRS
Sampling & Survey #6 – Systematic Sampling
Sampling & Survey #7 – Stratified Sampling
Sampling & Survey #8 – Ratio Estimation
Sampling & Survey #9 – Regression Estimation
Sampling & Survey #10 – Cluster Sampling
Sampling & Survey #11 – Two-Stage Cluster Sampling
Sampling & Survey #12 – Sampling with unequal probabilities (Part 1)
Sampling & Survey #13 – Sampling with unequal probabilities (Part 2)
Sampling & Survey #14 – Nonresponse

Quick Summary (Probability)


JC Mathematics, Mathematics, University Mathematics

University is starting for some students who took their A-levels in 2016, and one of my ex-students asked me to share a summary of the things to know for probability at university level. Hopefully this helps. H2 Further Mathematics students will find some of it helpful too.

Random Variables

Suppose X is a random variable which can take values x \in \chi.

X is a discrete r.v. if \chi is countable.
\Rightarrow p(x) is the probability of the value x and is called the probability mass function.

X is a continuous r.v. if \chi is uncountable.
\Rightarrow f(x) is the probability density function and can be thought of, loosely, as how likely X is to fall near x (it is not itself a probability).

Probability Mass Function

For a discrete r.v. the probability mass function (PMF) is

p(a) = P(X=a), where a \in \mathbb{R}.

Probability Density Function

For a continuous r.v. with probability density function (PDF) f, if B = (a, b), then

P(X \in B) = P(a \le X \le b) = \int_a^b f(x) ~dx.

And strictly speaking,

P(X = a) = \int_a^a f(x) ~dx = 0.


So f(a) is not P(X = a). Instead, for small \epsilon > 0, P(a \le X \le a + \epsilon) \approx f(a) \epsilon.

Properties of Distributions

For discrete r.v.
p(x) \ge 0 ~ \forall x \in \chi.
\sum_{x \in \chi} p(x) = 1.

For continuous r.v.
f(x) \ge 0 ~ \forall x \in \chi.
\int_{x \in \chi} f(x) ~dx = 1.

Cumulative Distribution Function

For discrete r.v., the Cumulative Distribution Function (CDF) is
F(a) = P(X \le a) = \sum_{x \le a} p(x).

For continuous r.v., the CDF is
F(a) = P(X \le a ) = \int_{- \infty}^a f(x) ~dx.
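
To see the discrete definitions in action, here is a minimal sketch using a fair six-sided die. The die is my own choice of example, and exact fractions are used so that the properties hold without rounding.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so p(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The two properties of a PMF: non-negative and sums to 1.
non_negative = all(p >= 0 for p in pmf.values())
total_mass = sum(pmf.values())

def cdf(a):
    """F(a) = P(X <= a): add up p(x) over all x <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))

print(non_negative, total_mass)
print(cdf(3), cdf(2.5), cdf(6))
```

Notice that the CDF is defined for every real a, not just the values the die can take, so cdf(2.5) is the same as cdf(2).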

Expected Value

For a discrete r.v. X, the expected value is
\mathbb{E} (X) = \sum_{x \in \chi} x p(x).

For a continuous r.v. X, the expected value is
\mathbb{E} (X) = \int_{x \in \chi} x f(x) ~dx.

If Y = g(X), then

For a discrete r.v. X,
\mathbb{E} (Y) = \mathbb{E} [g(X)] = \sum_{x \in \chi} g(x) p(x).

For a continuous r.v. X,
\mathbb{E} (Y) = \mathbb{E} [g(X)] = \int_{x \in \chi} g(x) f(x) ~dx.
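
Here is a short sketch of the discrete formulas, again assuming a fair six-sided die as the example. The helper name expectation is hypothetical. It computes \mathbb{E}[g(X)] directly from the PMF, with g(x) = x giving the ordinary expected value.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

E_X = expectation(pmf)                       # E(X)
E_X2 = expectation(pmf, lambda x: x ** 2)    # E(X^2), i.e. g(x) = x^2

print(E_X, E_X2)
```

The point is that there is no need to work out the distribution of Y = g(X) first. Weighting g(x) by p(x) is enough.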

Properties of Expectation

For random variables X and Y and constants a, b \in \mathbb{R}, the expected value has the following properties (applicable to both discrete and continuous r.v.s):

\mathbb{E}(aX + b) = a \mathbb{E}(X) + b

\mathbb{E}(X + Y) = \mathbb{E}(X) + \mathbb{E}(Y)

Realisations of X, denoted by x, may be larger or smaller than \mathbb{E}(X).

If you observe many realisations of X, \mathbb{E}(X) is roughly the average of the values you would observe.

To see the first property for a continuous r.v. X:

\mathbb{E} (aX + b)
= \int_{- \infty}^{\infty} (ax+b)f(x) ~dx
= \int_{- \infty}^{\infty} axf(x) ~dx + \int_{- \infty}^{\infty} bf(x) ~dx
= a \int_{- \infty}^{\infty} xf(x) ~dx + b \int_{- \infty}^{\infty} f(x) ~dx
= a \mathbb{E} (X) + b


Variance

Generally speaking, variance is defined as

Var(X) = \mathbb{E}[(X- \mathbb{E}(X))^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2

If X is discrete:

Var(X) = \sum_{x \in \chi} ( x - \mathbb{E}[X])^2 p(x)

If X is continuous:

Var(X) = \int_{x \in \chi} ( x - \mathbb{E}[X])^2 f(x) ~dx

Using the properties of expectation, we can show that Var(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2:

Var(X)
= \mathbb{E} [(X - \mathbb{E}[X])^2]
= \mathbb{E} [X^2 - 2X \mathbb{E}[X] + \mathbb{E}[X]^2]
= \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + \mathbb{E}[X]^2
= \mathbb{E}[X^2] - \mathbb{E}[X]^2
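
As a quick numerical check, the sketch below computes the variance of a fair six-sided die (an assumed example) both from the definition and from the shortcut formula, using exact fractions so the two can be compared directly.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())

# Definition: Var(X) = sum of (x - E[X])^2 * p(x).
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: Var(X) = E[X^2] - E[X]^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(var_definition, var_shortcut)
```

Both routes give the same value, as they should.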

Standard Deviation

The standard deviation is defined as

std(X) = \sqrt{Var(X)}


Covariance

For two random variables X and Y, the covariance is generally defined as

Cov(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]

Note that Cov(X, X) = Var(X)

Cov(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y]
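
The second form follows from the definition by expanding the product and using linearity of expectation. A short derivation, treating \mathbb{E}[X] and \mathbb{E}[Y] as constants:

```latex
\begin{aligned}
Cov(X, Y) &= \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] \\
&= \mathbb{E}[XY - X\mathbb{E}[Y] - Y\mathbb{E}[X] + \mathbb{E}[X]\mathbb{E}[Y]] \\
&= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y] \\
&= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]
\end{aligned}
```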

Properties of Variance

Given random variables X and Y, and constants a, b, c \in \mathbb{R},

Var(aX \pm bY \pm c) = a^2 Var(X) + b^2 Var(Y) \pm 2ab Cov(X, Y), where the sign of the covariance term matches the sign in front of bY.

The proof of the above can be done using the definitions of expectation and variance.
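
For reference, here is one way the proof can go for Z = aX + bY + c with constants a, b, c. The symbol Z is introduced only for this sketch. The case with a minus sign in front of bY follows by replacing b with -b.

```latex
\begin{aligned}
\mathbb{E}[Z] &= a\mathbb{E}[X] + b\mathbb{E}[Y] + c \\
Z - \mathbb{E}[Z] &= a(X - \mathbb{E}[X]) + b(Y - \mathbb{E}[Y]) \\
Var(Z) &= \mathbb{E}[(Z - \mathbb{E}[Z])^2] \\
&= a^2 \mathbb{E}[(X - \mathbb{E}[X])^2] + b^2 \mathbb{E}[(Y - \mathbb{E}[Y])^2]
   + 2ab \, \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] \\
&= a^2 Var(X) + b^2 Var(Y) + 2ab \, Cov(X, Y)
\end{aligned}
```

The constant c drops out in the second line, which is why adding a constant never changes the variance.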

Properties of Covariance

Given random variables W, X, Y and Z and constants a, b \in \mathbb{R},

Cov(X, a) = 0

Cov(aX, bY) = ab Cov(X, Y)

Cov(W+X, Y+Z) = Cov(W, Y) + Cov(W, Z) + Cov(X, Y) + Cov(X, Z)


Correlation

Correlation is defined as

Corr(X, Y) = \dfrac{Cov(X, Y)}{std(X) std(Y)}

It can be shown that -1 \le Corr(X, Y) \le 1.
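
One short argument for the bound, assuming both standard deviations are positive and writing \sigma_X and \sigma_Y for them (notation introduced only for this sketch), uses the fact that a variance is never negative:

```latex
\begin{aligned}
0 \le Var\left( \frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y} \right)
&= \frac{Var(X)}{\sigma_X^2} + \frac{Var(Y)}{\sigma_Y^2}
   \pm \frac{2 \, Cov(X, Y)}{\sigma_X \sigma_Y} \\
&= 2 \pm 2 \, Corr(X, Y)
\end{aligned}
```

Taking the plus sign gives Corr(X, Y) \ge -1, and taking the minus sign gives Corr(X, Y) \le 1.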

The properties of correlations of sums of random variables follow from those of covariance and standard deviations above.

Post-Results 2016


Chemistry, JC Chemistry, JC General Paper, JC Mathematics, JC Physics, Mathematics, Studying Tips, University Mathematics

Let’s face it. Some of us will not get the dream results we want. Don’t give up and let fear conquer you.

Students unsure of the available courses can check out the following post, which contains the grade profiles for local universities.

Our team will be here if you need help or advice. Feel free to text us.

P.S. Today, I saw an image shared by Mr Wee, which said that “You’re the architect of your own life”. So let’s not let the grades define us.

A little history of e


JC Mathematics, University Mathematics

Some students have asked why I recognise e on sight, that is, e=2.718281828.... Well, e is a rather unique constant. Firstly, JC students see it daily in algebra and complex numbers. Students exposed to university statistics will see e appearing in the formula for the normal distribution, that is, f(x | \mu , \sigma^2) = \frac{1}{\sqrt{2 \sigma^2 \pi}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}.

Secondly, the story of how it came about is pretty cool, as you will observe in the video below.

The Story of e

Hopefully it gives you another perspective on this constant! And now you should be more cautious when signing up for savings plans that give interest per annum or per month.
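
Assuming the remark about savings plans refers to compound interest, the sketch below shows how the same nominal rate grows differently depending on how often it is compounded. The 100% rate is chosen only to make the link to e visible, and the function name is hypothetical.

```python
import math

def growth_factor(rate, periods):
    """Value of $1 after one year at nominal `rate`, compounded `periods` times."""
    return (1 + rate / periods) ** periods

# Hypothetical 100% nominal annual rate, compounded more and more often.
for periods in (1, 12, 365, 1_000_000):
    print(periods, growth_factor(1.0, periods))

print("e =", math.e)
```

The more often the interest is compounded, the closer the yearly growth factor gets to e, but it never exceeds it.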

P.S. I once confused a banker when I asked her about this. 🙂

The man who knew infinity

JC Mathematics, University Mathematics

Previously, I shared a post on the golden nugget, which touches on the movie The Man Who Knew Infinity. I finally found time to watch it while overseas. In Singapore, I’m usually too overwhelmed with work and tuition, so yes, I actually watch movies and gym/swim when I’m overseas. The show is really great. It’s not a MATH movie, so one doesn’t have to understand any math to watch it. It traces the life story of S. Ramanujan, who, if you watched the video on the golden nugget, is a seriously good mathematician. The whole movie is really exciting and tells a good story of the sacrifices that mathematicians make. I hold great respect for anyone in the field of Pure Mathematics. The things they do extend to solving many real-world problems today. And if you’re interested in prime numbers, feel free to read here.

Another great recent math-related movie is The Imitation Game, which is about Alan Turing. I watched it with some of my closer students, and most of them found it non-mathematical.

Yes, this is another “motivating Mathematics” post.

Golden Nugget!!!

JC Mathematics, University Mathematics

1 + 2 + 3 + 4 + 5 + ... = -\frac{1}{12}

My brother shared this interesting video with me a few days back when he was at the screening for The Man Who Knew Infinity. I’m looking forward to watching this movie too!

Back to the video! It focuses on the sum written above. Interestingly, this sum, which should not be defined (going by what JC students learnt in Arithmetic Progression), is actually assigned a NEGATIVE value. It is explained excellently by Professor Edward Frenkel, who brings in interesting concepts from complex numbers too. Hopefully, this piques some interest!

Professor Edward Frenkel wrote a book “Love & Math”, which is really intriguing. You don’t have to love math to read it but you will after reading 🙂

You can read more here too.

Introduction to Stochastic Calculus

University Mathematics

Back in my undergraduate days, when I took my first module on financial mathematics, my professor began by telling us that the most important thing is the following:

(\Omega, \mathcal{F}, \mathcal{P})

This is a probability triple where
1. \mathcal{P} is the ‘true’ or physical probability measure.
2. \Omega is the universe of possible outcomes.
3. \mathcal{F} is the set of possible events where an event is a subset of \Omega.

There is also a filtration \{\mathcal{F}_t\}_{t \ge 0} that models the evolution of information through time. For example, if by time t we know that event \mathcal{E} has occurred, then \mathcal{E} \in \mathcal{F}_t. In the case of a finite horizon [0,T], \mathcal{F} = \mathcal{F}_T.

A stochastic process X_t is \mathcal{F}_t-adapted if the value of X_t is known at time t when the information represented by \mathcal{F}_t is known. Most of the time, we have sufficient information at present.
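
A finite toy model may help here. The sketch below assumes two coin tosses, so \Omega has four outcomes, and takes \mathcal{F}_t to be the information in the first t tosses. In this finite setting, a process is adapted exactly when X_t takes a single value on each group of outcomes that look identical at time t. All names are hypothetical.

```python
from itertools import product

# Assumed toy model: two coin tosses, so Omega = {HH, HT, TH, TT}.
omega = ["".join(w) for w in product("HT", repeat=2)]

def atoms(t):
    """Groups of outcomes that cannot be told apart after t tosses (they generate F_t)."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:t], []).append(w)
    return list(groups.values())

def is_adapted(process):
    """process[t] maps each outcome to X_t; adapted if X_t is constant on every atom of F_t."""
    return all(
        len({x_t[w] for w in atom}) == 1
        for t, x_t in enumerate(process)
        for atom in atoms(t)
    )

# X_t = number of heads among the first t tosses: known at time t.
heads_so_far = [{w: w[:t].count("H") for w in omega} for t in range(3)]
# Y_t = number of heads in both tosses: needs future information when t < 2.
total_heads = [{w: w.count("H") for w in omega} for t in range(3)]

print(is_adapted(heads_so_far), is_adapted(total_heads))
```

The second process fails because at time 0 we cannot yet know how many heads will appear in total.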

In the continuous-time model, \{\mathcal{F}_t\}_{t \ge 0} will be the filtration generated by the stochastic processes (usually a Brownian motion, W_t), based on the model’s specification.

Next, we review some martingales and Brownian motion, along with quadratic variation, here.

Question of the Day #17

JC Mathematics, Secondary Math, University Mathematics

This is a pretty cool and interesting question from AMC.


There are four lifts in a building. Each makes three stops, which do not have to be on consecutive floors or include the ground floor. For any two floors, there is at least one lift which stops on both of them. What is the maximum number of floors that this building can have?

(A) 4
(B) 5
(C) 6
(D) 7
(E) 12
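
If you want to check your answer by computer after trying it by hand, here is a brute-force sketch. It is my own approach rather than an official solution: it first applies a simple counting bound (each lift links only so many pairs of floors) and then searches all choices of lifts. The function name is hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def can_serve(floors, lifts=4, stops=3):
    """True if `lifts` lifts with `stops` stops each can cover every pair of floors."""
    # Counting bound: each lift links at most C(stops, 2) pairs of floors.
    if lifts * comb(stops, 2) < comb(floors, 2):
        return False
    all_pairs = set(combinations(range(floors), 2))
    # Each option is the set of floor pairs served by one possible lift.
    options = [set(combinations(s, 2)) for s in combinations(range(floors), stops)]
    return any(
        set().union(*choice) >= all_pairs
        for choice in combinations_with_replacement(options, lifts)
    )

# Check each of the answer options (A) to (E).
feasible = [n for n in (4, 5, 6, 7, 12) if can_serve(n)]
print(feasible)
```

The largest value printed is the answer. The counting bound is also a good hint for the pen-and-paper argument.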