Probability & Statistical Data Analysis



Reflection Project 2

This project is based on the Cleveland database, which machine learning researchers have used to identify the presence of heart disease in patients. The dataset was collected by three medical doctors from University Hospital, Zurich, Switzerland, in cooperation with the Cleveland Clinic Foundation, for experiments on heart disease patients. The data originally contain 76 attributes, but all published experiments refer to a subset of 14 of them. The main focus of my project is therefore to identify whether heart disease is present in a patient. I use four tests: a hypothesis test on the patients' resting blood pressure, a correlation between maximum heart rate and resting blood pressure, a regression on the chest pain scale, and a chi-square test between sex and fasting blood sugar.

I use hypothesis testing on resting blood pressure to check whether most patients have an optimal blood pressure, which helps indicate whether heart disease may be present. To measure whether age and resting blood pressure are statistically related, I use Pearson's product-moment correlation on this bivariate data. Next, I use regression to examine the relationship between the patients' age (independent variable) and their maximum heart rate achieved (dependent variable). Finally, I use the chi-square test to determine whether there is a significant association between sex and fasting blood sugar, since both variables are nominal, categorical data.
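The four tests described above can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not the analysis actually carried out for this project. All function names are hypothetical, and the numbers are made-up toy values, not rows from the Cleveland database. The sketch returns test statistics only; a p-value would still have to come from a t or chi-square table, or from a statistics package.

```python
import math


def one_sample_t(values, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with n - 1 in the denominator.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def _sums(x, y):
    """Means and centered sums shared by correlation and regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares(x, y):
    """Slope and intercept of the simple linear regression of y on x."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy values for illustration only (NOT taken from the Cleveland data).
age = [45, 52, 58, 63, 67]
resting_bp = [120, 130, 128, 140, 145]
max_hr = [175, 168, 160, 150, 142]
# Rows: sex; columns: fasting blood sugar above / not above the threshold.
sex_by_fbs = [[20, 80], [10, 40]]

# 120 is an assumed reference value chosen for this illustration.
print("t statistic:", one_sample_t(resting_bp, 120))
print("Pearson r (age, resting bp):", pearson_r(age, resting_bp))
print("slope, intercept (age -> max hr):", least_squares(age, max_hr))
print("chi-square, dof:", chi_square(sex_by_fbs))
```

Each statistic is then compared with the critical value for the chosen significance level to decide whether to reject the null hypothesis.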

Reflection Project 1

This case study focused on the preferences of 50 students from the School of Computing regarding the transportation they use on a daily basis. To collect the data, we produced a form in which students could share their thoughts and preferences about their usual transport. To present the data, we used several data presentation methods so that the audience could get a rough view of the results before any analysis. One concern about the case study is that the majority of respondents were freshmen; most upper-year students did not get a chance to fill in the form because it was circulated among freshmen only. As a result, the results are somewhat weighted toward walking as the preferred mode of transport. This also affects the other results, namely the cost per week and the waiting time for transport: walking costs nothing and involves no waiting, unlike the other modes. This makes sense, because the distance between the faculty (the most preferred spot in UTM) and the dormitory is not far; it is just a stone's throw away. Thus, walking is the most preferred mode of transportation among School of Computing students.

RStudio Tutorial
