Project Profile
At the beginning of 2020, Malaysia faced one of its biggest challenges ever: the Covid-19 outbreak. Covid-19 is caused by a coronavirus (SARS-CoV-2) and can be fatal. Even when a patient recovers, the after-effects of the disease can also be dangerous.
Johor, the state where our campus is located, was listed as one of the high-risk zones during the outbreak. At that time, Johor was considered the third-highest-risk zone, after Selangor and Kuala Lumpur. This inspired me to conduct a statistical analysis of the number of cases in these three states. The relationship between each state's population and its number of cases was also analysed.
The dataset for the project was retrieved from the Covid-19 statistics by state collected by the Department of Statistics Malaysia. The link is here.
Since the Covid-19 data was still considered “young”, no properly organized dataset could be found. Hence, I collected the statistical data from the source mentioned above and reorganized it according to my needs.
For the statistical analysis, four types of hypothesis testing were conducted:
- A two-sample test comparing the population mean number of new Covid-19 cases in Johor and Kuala Lumpur from 21 March 2020 to 30 March 2020.
- ANOVA (analysis of variance) to test for a significant difference between the population mean number of new Covid-19 cases in Johor, Kuala Lumpur, and Selangor from 21 March 2020 to 30 March 2020.
- Correlation analysis to measure the strength of association between the cumulative number of Covid-19 cases and the population of each state, using data updated on 18 June 2020.
- Regression analysis to model the relationship between the cumulative number of Covid-19 cases and the population of each state, using data updated on 18 June 2020.
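As a rough illustration of the four analyses listed above, the Python sketch below computes each test statistic using only the standard library. It is not the code used in the project: the function names, the choice of Welch's form for the two-sample test, and the toy numbers are all assumptions made for illustration, and no real case counts are included.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic (Welch's form; assumed, the project may differ)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# TOY PLACEHOLDER numbers only (hypothetical, not real Covid-19 figures).
# In the project, each list would hold daily new cases for one state.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [3, 4, 5]

# Hypothetical stand-ins for state populations and cumulative cases.
population = [1, 2, 3]
cumulative_cases = [2, 4, 6]

print("t statistic:", welch_t(group_a, group_b))
print("F statistic:", anova_f(group_a, group_b, group_c))
print("Pearson r:", pearson_r(population, cumulative_cases))
print("slope, intercept:", linear_fit(population, cumulative_cases))
```

To turn these statistics into decisions, each one would still have to be compared against a critical value (or converted to a p-value) at the chosen significance level, which this sketch leaves out.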
The results of the tests are presented in the presentation video here.
Reflection
At the beginning of Project 2, I was unsure what the main focus of the project should be. Initially, it was very difficult for me to produce a proposal related to the dataset I had selected, because at that time we had not yet learned about correlation, regression and ANOVA. As a result, the main focus of my study was still unclear. However, after understanding what those topics were about, I had a better idea of how to improve the hypothesis tests to be conducted in Project 2.
Throughout Project 2, I learned the importance of keeping statistical records. I encountered several issues when collecting the data because some of the data that I originally planned to use had not been recorded. I also learned the importance of analysing data: it gives better insight into the statistics and thus helps one make better decisions based on the results. As a Computer Science student, I truly feel that learning how to apply data analysis is very important, as it could help me build a better career in the future.