## What is the probability of my child entering Nanyang Primary or Ai Tong?

Like most parents in Singapore, we leave nothing to chance when it comes to getting our children into the best schools. My wife wants to move closer to Nanyang Primary to increase our chances of getting a place. I did not want to do that. Our chances are high enough that they don't warrant such extremes. I realised I needed to prove my point.

## How does primary school registration work?

For the uninitiated, primary school registration is broken up into phases, similar to a priority queue, where the earlier phases get the places first. Phase 1 is for siblings of current students. Phase 2A is for alumni. Phase 2B is for parent volunteers, community leaders and others with ties to the school. Finally, Phase 2C is for the public. When there are more applicants than places, citizens are prioritised over foreigners, followed by those living near the school. Details are on the official website.
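
The priority-queue idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the official procedure: the function name `allocate_by_phase` and the applicant numbers are hypothetical, and it ignores balloting order within a phase and any places reserved for later phases.

```python
def allocate_by_phase(total_places, applicants_by_phase):
    """Hand out places phase by phase; earlier phases are served first."""
    remaining = total_places
    granted = {}
    for phase, applicants in applicants_by_phase:
        # If a phase is oversubscribed, only the places still left are granted.
        granted[phase] = min(applicants, remaining)
        remaining -= granted[phase]
    return granted


# Hypothetical numbers, for illustration only.
phases = [("1", 200), ("2A", 120), ("2B", 50), ("2C", 90)]
result = allocate_by_phase(390, phases)
for phase, places in result.items():
    print(f"Phase {phase}: {places} places granted")
```

In this made-up example the first three phases are fully served and Phase 2C is the only one that is oversubscribed, which is where balloting would come in.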

I’m lucky. I have two options. As an alumnus of Nanyang Primary, I qualify for Phase 2A. As I live near Ai Tong School, I have a chance in Phase 2C. Both schools are wife-approved. What is the probability of my child getting into either school?

## Probability of getting into Nanyang Primary

The probability depends on two things: the number of applicants and the number of places. The number of places left for Phase 2A depends on the number of applicants from Phase 1. How many will apply for Phase 1 in 2018?

### Finding the number of places for Phase 2A

If you didn’t know already, the number of people who drowned by falling into a pool correlates with the number of films Nicolas Cage appeared in. We ought to treat correlation results with care. However, it makes sense that the birth rate should correlate with the number of applicants. I took the birth data from the published government statistics. The enrolment statistics and balloting statistics for Nanyang Primary are published as well.

| Year | Resident births | Phase 1 applicants |
| --- | --- | --- |
| 2016 | 35,129 | 196 |
| 2015 | 36,925 | 205 |
| 2014 | 37,277 | 191 |
| 2013 | 37,074 | 195 |
| 2012 | 36,272 | 187 |
| 2011 | 35,528 | 199 |
| 2010 | 35,135 | 179 |
| 2009 | 35,474 | 176 |
| 2008 | 38,555 | 180 |
| 2007 | 39,281 | 180 |

The above is a table of the number of resident births and Phase 1 applicants to Nanyang Primary. The [correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) between the applicants and births is below.

| Correlation coefficient | Value |
| --- | --- |
| Phase 1 vs Births | -0.053323383 |

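
For readers who want to compute a Pearson coefficient themselves, here is a minimal hand-rolled sketch in Python. The function name `pearson` is hypothetical, and the two lists are simply the columns of the table above taken as they are printed. How births were paired with registration years in the original calculation is not stated, so the result from this sketch will not necessarily reproduce the value quoted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns copied from the table, 2016 down to 2007 (pairing is an assumption).
births = [35129, 36925, 37277, 37074, 36272, 35528, 35135, 35474, 38555, 39281]
phase1 = [196, 205, 191, 195, 187, 199, 179, 176, 180, 180]

r = pearson(births, phase1)
print(f"Phase 1 vs births: r = {r:.3f}")
```

A value near zero means the straight-line relationship is weak, which is why the scatter plot is worth a look before trusting the number.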
The number of applicants is negatively correlated with the number of births, which means that when there are more births, there are fewer applicants. It doesn't make sense. Let's look at the scatter plot.

![Nanyang Primary Phase 1 applicants vs Births](https://s3.amazonaws.com/static.liangzan.net/blog/nyps_phase1_vs_births.png)

Applicants in 2018 will come from births in 2012, which is **38,641**. To be conservative, I selected the three points on the top left and drew a line through them to interpolate the number of applicants for 2018. It is **215**. Since the number of places at Nanyang is **390**, that leaves

```
390 - 215 = 175
```

**175** places. There is a [rule introduced in 2014](http://www.todayonline.com/singapore/moe-40-spaces-reserved-every-primary-school-phase-2b-2c-applicants) that at least 40 places must be kept for Phase 2B and 2C. In the worst-case scenario, that leaves

```
175 - 40 = 135
```

**135** places for Phase 2A. We now know the number of places on offer. Let us find the number of applicants next.

### Finding the number of applicants for Phase 2A

Likewise, I believe that the number of applicants for Phase 2A _should_ correlate with the number of births.

| Year | Resident births | Phase 2A1 applicants | Phase 2A2 applicants | Aggregated Phase 2A applicants |
| --- | --- | --- | --- | --- |
| 2016 | 35,129 | 105 | 19 | 124 |
| 2015 | 36,925 | 70 | 42 | 112 |
| 2014 | 37,277 | 67 | 61 | 128 |
| 2013 | 37,074 | 67 | 63 | 130 |
| 2012 | 36,272 | 54 | 75 | 129 |
| 2011 | 35,528 | 43 | 75 | 118 |
| 2010 | 35,135 | 41 | 84 | 125 |
| 2009 | 35,474 | 38 | 67 | 105 |
| 2008 | 38,555 | 39 | 57 | 96 |
| 2007 | 39,281 | 21 | 60 | 81 |

Likewise, we tabulate the number of resident births against the Phase 2A applicants. The correlation coefficient between the applicants and births is below.

| Correlation coefficient | Value |
| --- | --- |
| Phase 2A1 vs Births | -0.405010861 |
| Phase 2A2 vs Births | -0.056426767 |
| Aggregated Phase 2A vs Births | -0.650995037 |

Again we see negative correlation, which does not make sense. It is likely to be skewed by the data from 2007-2008. Let's use the scatter plot instead.

![Nanyang Primary Phase 2A applicants vs Births](https://s3.amazonaws.com/static.liangzan.net/blog/nyps_phase2a_vs_births.png)

Ignoring the two data points from 2007-2008 (at bottom right), I drew a line through the rest to interpolate the number of applicants for Phase 2A. I arrived at **145** applicants. With this figure, we can calculate the probability of my child getting a place in Nanyang Primary in Phase 2A.

```
Probability = (Number of places) / (Number of applicants in Phase 2A)
            = 135 / 145
            = 0.931034483
```

My daughter has a **93.1%** chance of getting into Nanyang Primary. That's reassuring.

## Backup plan: Ai Tong

In the **6.9%** chance that my daughter won't get into Nanyang Primary, what is the probability of her entering Ai Tong? That depends on the ratio of places to applicants. Since the ruling in 2014 ensures that there are at least 20 places for Phase 2C, let us assume the worst and use **20** for the number of places available.

### Finding the number of applicants for Phase 2C

Consistent with what we did previously, the number of Phase 2C applicants _should_ correlate with the number of births.

| Year | Resident births | Phase 2C applicants | Places | TUR | APP | Probability |
| --- | --- | --- | --- | --- | --- | --- |
| 2016 | 35,129 | 49.83 | 33 | 0.9 | 1.51 | 0.662251656 |
| 2015 | 36,925 | 46.2 | 23.1 | 0.93 | 2 | 0.5 |
| 2014 | 37,277 | 44.55 | 29.7 | 0.91 | 1.5 | 0.666666667 |
| 2013 | 37,074 | 35.838 | 6.6 | 0.98 | 5.43 | 0.184162063 |
| 2012 | 36,272 | 58.212 | 23.1 | 0.93 | 2.52 | 0.396825397 |
| 2011 | 35,528 | 67.32 | 19.8 | 0.94 | 3.4 | 0.294117647 |
| 2010 | 35,135 | 67.32 | 39.6 | 0.88 | 1.7 | 0.588235294 |
| 2009 | 35,474 | 65.835 | 49.5 | 0.85 | 1.33 | 0.751879699 |
| 2008 | 38,555 | 64.548 | 39.6 | 0.88 | 1.63 | 0.613496933 |
| 2007 | 39,281 | 66.33 | 49.5 | 0.85 | 1.34 | 0.746268657 |
| 2006 | 36,272 | 79.596 | 29.7 | 0.91 | 2.68 | 0.373134328 |

The data is published on [Kiasu Parents](https://www.kiasuparents.com/kiasu/article/bishan/). The figures are in percentages, so I had to infer the places from them. **TUR** stands for cumulative take-up rate, which is the proportion of the places already taken by the time Phase 2C begins. **APP** is the ratio of applicants to the places available. As we are assuming the worst, we are not using that to estimate the number of places.

| Correlation coefficient | Value |
| --- | --- |
| Phase 2C vs Births | -0.090778097 |
| Phase 2C vs Places | 0.554622244 |

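
The derived columns in the Ai Tong table can be cross-checked from the definition of APP alone: since APP is applicants divided by places, the applicants are places times APP, and the chance of a place is 1 / APP. Below is a minimal Python sketch using the 2016 and 2015 rows. The helper name `derive` is hypothetical, and the check assumes every applicant has an equal chance in the ballot.

```python
def derive(places, app):
    """Return (applicants, probability) from the places and the APP ratio."""
    applicants = places * app   # APP = applicants / places
    probability = 1 / app       # places / applicants, assuming an equal-chance ballot
    return applicants, probability


# (year, places, APP) copied from the 2016 and 2015 rows of the table.
rows = [(2016, 33, 1.51), (2015, 23.1, 2)]
for year, places, app in rows:
    applicants, probability = derive(places, app)
    print(year, round(applicants, 3), round(probability, 9))
```

Note that 1 / APP only behaves as a probability when APP is at least 1, which holds for every row in the table.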
If we correlate births to the number of applicants, we get a negative value, which does not make sense. If we correlate the number of applicants to the places available, we get a positive correlation of around **0.55**. Parents could be making their decision to apply based on the number of places available.

![Ai Tong Phase 2C applicants vs Places](https://s3.amazonaws.com/static.liangzan.net/blog/aitong_places_vs_phas2c.png)

Using the scatter plot, I drew a line to interpolate the number of applicants given **20** places. It is **42**.

```
Probability = (Number of places) / (Number of applicants in Phase 2C)
            = 20 / 42
            = 0.476190476
```

We conclude that my daughter has a **47.6%** chance of getting into Ai Tong at Phase 2C.

## Bringing everything together

I have both the probabilities for getting a place in Nanyang Primary and Ai Tong. What I want to know is the probability of getting into either. Let us first define our terms.

```
A is the event that we get a place in Nanyang
B is the event that we get a place in Ai Tong
C is the event that we get a place in Ai Tong, given we failed to get a place in Nanyang
D is the event that we failed to get a place in Nanyang
```

To help the reader, I've copied out some probability equations.

```
P(A|B) = P(A n B) / P(B)          # conditional probability
P(A n B) = P(A) * P(B|A)          # when both A and B occur
P(A n B) = P(A) * P(B)            # for independent events
P(A n B) = 0                      # for mutually exclusive events
P(A u B) = P(A) + P(B) - P(A n B)
```

We want the probability of A or C. Since A and C are mutually exclusive, `P(A n C)` is 0. The final result is the sum of both probabilities.

```
P(A u C) = P(A) + P(C) - P(A n C)
         = P(A) + P(C) - 0
         = P(A) + P(C)
```

We already know `P(A) = 0.931`. So we need to find `P(C)`. Let us define `P(C)` as a conditional probability, since we want the probability of getting into Ai Tong, given that we failed to get a place in Nanyang.

```
P(C) = P(B|D)
     = P(B n D) / P(D)
```

Does getting a place in Nanyang affect the probability of getting a place in Ai Tong? The number of places doesn't change, as it is fixed by the rule. The number of applicants will still be the same, as the people applying do not care about my result. So I treated the two events as mutually exclusive. Therefore,

```
P(B n D) = 0
```

Which means

```
P(C) = P(B|D)
     = P(B n D) / P(D)
     = 0
```

That is obviously wrong. Did we do something wrong? Yes: B and D are independent, not mutually exclusive. Neither outcome affects the other, but both can happen together, so `P(B n D)` is `P(B) * P(D)` rather than 0. Let's redefine C as the event that we get a place in Ai Tong and we failed to get a place in Nanyang.

```
P(C) = P(B n D)
     = P(B) * P(D)
```

We plug the result back into our first equation.

```
P(A u C) = P(A) + P(C)
         = P(A) + (P(B) * P(D))
         = 0.931 + (0.476 * 0.069)
         = 0.963844
```

It means that my daughter has an increased chance of **96.4%** of getting into Nanyang or Ai Tong. That's reassuring. With that figure, I can tell my wife not to worry.

With this simple episode, you can see the usefulness of math in helping to make decisions. Intuition can be [misleading](https://en.wikipedia.org/wiki/Birthday_problem). We must demand rigour in thinking when making decisions that have a big impact.
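
For anyone who wants to rerun the numbers, the final calculation can be sketched in Python. The variable names are my own, and the sketch assumes, as in the post, that the Ai Tong outcome is independent of the Nanyang outcome. It uses exact fractions, so the last digits differ slightly from the rounded figure above.

```python
from fractions import Fraction

# Estimates taken from the post.
p_nanyang = Fraction(135, 145)  # P(A): place in Nanyang at Phase 2A
p_ai_tong = Fraction(20, 42)    # P(B): place in Ai Tong at Phase 2C

p_fail_nanyang = 1 - p_nanyang  # P(D)

# Assumption: B and D are independent, so P(B n D) = P(B) * P(D).
p_fallback = p_ai_tong * p_fail_nanyang  # P(C)

# A and C are mutually exclusive, so their probabilities simply add.
p_either = p_nanyang + p_fallback        # P(A u C)

print(f"P(Nanyang)         = {float(p_nanyang):.4f}")
print(f"P(Ai Tong fallback) = {float(p_fallback):.4f}")
print(f"P(either school)   = {float(p_either):.4f}")
```

The same answer falls out of the complement: the only way to miss both schools is to fail at Nanyang and then fail at Ai Tong, so the result equals `1 - P(D) * (1 - P(B))`.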