PSDA Project 2

The title of my Project 2 is A Study on Death in Malaysia 2018. The secondary data come from the Department of Statistics Malaysia publication “Statistics on Causes of Death” for the year 2018. The sample consists of the number of deaths in 8 states in Malaysia, while the population is the citizens of Malaysia. The topic is interesting because it lets me test several claims: whether the mean number of deaths differs between males and females; whether the classification of death (by cause) is independent of the state; whether all types of death occur with equal probability or at least one differs; whether age and the number of deaths have a positive linear relationship; and whether the relationship between age and the number of deaths can be described by a linear regression.

In this project, I have learned many methods for testing a claim. They are very useful when I am dealing with a large population, because I can perform the tests on the sample data to estimate the population parameters. The estimates become more precise as the sample size grows, because the standard error decreases; a sample size greater than 30 is the usual rule of thumb for a large sample.
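
The effect of sample size on the standard error can be sketched as follows. This is a minimal illustration in Python (not the R code used in the project), and the sample standard deviation of 10.0 is a made-up value chosen only to show the trend, not a figure from the data set.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation, used only to show how the
# standard error shrinks as the sample size grows.
hypothetical_sd = 10.0

for n in (8, 30, 100, 1000):
    print(f"n = {n:>4}: standard error = {standard_error(hypothetical_sd, n):.3f}")
```

Because the sample size sits under a square root, the sample must be four times larger to halve the standard error.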

From the results, some claims are rejected, and for others there is insufficient evidence to support them. Whether a claim is supported is decided by comparing the test statistic with the critical value. When the calculated test statistic lies in the critical region, the null hypothesis is rejected; when it does not lie in the critical region, we fail to reject the null hypothesis. Some of the tests need the degrees of freedom in order to find the chi-square value or the t-value. The degrees of freedom play a vital role: a wrong value gives a wrong critical value and makes the result inaccurate. Thus, when calculating the degrees of freedom in my project, I had to be very sure that I used the correct values in the formula.
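
The degrees-of-freedom formulas and the decision rule described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the project. The two-sample layout (8 states for each gender, pooled variance) is an assumption, and the test statistic of 1.20 is a hypothetical number, not a result from the report. The critical value 2.145 is the standard t-table entry for 14 degrees of freedom at a 0.05 significance level, two-tailed.

```python
def df_independence(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-square test of independence."""
    return (rows - 1) * (cols - 1)


def df_goodness_of_fit(categories: int) -> int:
    """Degrees of freedom for a chi-square goodness-of-fit test."""
    return categories - 1


def df_pooled_t(n1: int, n2: int) -> int:
    """Degrees of freedom for a pooled two-sample t-test."""
    return n1 + n2 - 2


def reject_null_two_tailed(test_statistic: float, critical_value: float) -> bool:
    """Reject H0 when the statistic falls in either tail of the critical region."""
    return abs(test_statistic) > critical_value


# Assumption: two independent samples of 8 states each (male and female).
df = df_pooled_t(8, 8)
t_critical = 2.145   # t-table value for df = 14, alpha = 0.05, two-tailed
t_statistic = 1.20   # hypothetical test statistic, for illustration only

if reject_null_two_tailed(t_statistic, t_critical):
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(f"df = {df}, decision: {decision}")
```

Using the wrong degrees of freedom would change the critical value that is looked up, which is why the formula has to match the test being carried out.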

Furthermore, as you can see, the sample size differs between tests because each test looks at the data in a different way. For instance, in the hypothesis testing, the sample size is 8 because the data are collected from the 8 states mentioned in the report. However, for the rest of the tests, the sample size is 76704, which is the sample size for Malaysia as a whole.

For the correlation and regression analysis, I found that when the conclusion is a positive relationship between the two variables, I can validate it by calculating the correlation coefficient as well as the coefficient of determination. Both of them support the conclusion, and hence the results can be trusted. Below are the graphs for correlation and regression respectively.

Figures: c.png (correlation), r.png (regression)
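
The two validation measures mentioned above, the correlation coefficient r and the coefficient of determination r², can be sketched together with the fitted regression line. This is a minimal illustration in Python rather than the R code used in the project, and the age and death values below are made-up numbers, not the data from the report.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data: age group midpoints and death counts.
ages = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
deaths = [12.0, 15.0, 30.0, 42.0, 60.0, 85.0, 120.0]

r = pearson_r(ages, deaths)
r_squared = r ** 2          # share of variation in deaths explained by age
intercept, slope = least_squares_line(ages, deaths)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"fitted line: deaths = {intercept:.2f} + {slope:.2f} * age")
```

A value of r close to +1 indicates a strong positive linear relationship, and r² gives the proportion of the variation in the number of deaths that the regression line explains.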

When working with RStudio, I referred to the tutorial slides given by our lecturer, Dr. Chan Weng Howe. The tutorial slides are very useful, as they helped me and saved me time when doing the coding. However, for the regression part, the slides do not give enough information about the names of the parameters shown in the R console. The parameters shown in the R console were complicated for me, and the slides do not mention what they represent. So, I worked them out by referring to tutorials on YouTube. Lastly, for my presentation, I used PowerPoint to record the slides and added some animation to make my video more interesting and to guide the viewer to what I am presenting. After finishing my presentation, I exported it to MP4 so that our lecturer can view it easily.

In conclusion, from the results of the tests on the claims, we can conclude that the mean number of deaths for males and females does not differ significantly, that the states in Malaysia and the classification of death are dependent, that the types of death do not all occur in the same proportion, and that age and the number of deaths have a strong positive relationship.