SECI2143_Probability & Statistical Data Analysis



This course is designed to introduce statistical techniques as tools for analysing data. At the beginning, students are exposed to various forms of data. The data, represented by different types of variables, are derived from different sources, such as daily and industrial activities. The analysis begins with the visual representation of the data. The course also explores methods of parameter estimation for different distributions. Further data analysis is conducted by introducing hypothesis testing, and several models are then fitted to groups of data. By the end of the course, students should be able to apply statistical models to analyse data using available software.

Project 1

Project 2

Assignment 1

Assignment 2

Assignment 3

Assignment 4

Reflection_Project 2

           For the Probability and Statistical Data Analysis course, we were required to carry out Project 2 in groups. In Project 2, we had to conduct an inferential statistical analysis of a selected dataset. First, we had to find a suitable dataset from online sources. However, it was difficult to find one that fulfilled the requirements for inferential statistical analysis, because some datasets had problems such as missing data or questionable accuracy. Fortunately, we eventually found a suitable dataset for Project 2 on Kaggle.

           The dataset we selected is the “Cardiovascular Disease Dataset”, which mainly studies the features that increase the chance of having cardiovascular disease. Since the dataset is very large, we decided to use the observations of the first 500 patients with cardiovascular disease. In Project 2, we learned how to conduct inferential statistical analysis using RStudio. For example, we carried out a one-sample hypothesis test, a correlation test, a regression analysis, and a chi-square test of independence.

           There is no doubt that RStudio greatly simplified our calculations compared with working them out by hand, and it also reduced the risk of arithmetic mistakes in the analysis. In Project 2, we learned to use various R functions for different tests, such as pnorm, qnorm and cor.test. These skills are useful for our academic studies and for our future careers. We also learned how to select appropriate variables for inferential statistical analysis.
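To illustrate what pnorm and qnorm do in a one-sample test, here is a minimal sketch in Python rather than R, so it is not the code we used in the project. It uses the standard-library statistics.NormalDist, whose cdf plays the role of R's pnorm and whose inv_cdf plays the role of qnorm. The function name one_sample_z_test, the assumption of a known population standard deviation, and the sample values are all hypothetical and are not taken from the Cardiovascular Disease Dataset.

```python
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test (hypothetical helper).

    Assumes the population standard deviation `sigma` is known.
    Returns (z statistic, p-value, critical value, reject H0?).
    """
    n = len(sample)
    # Standardise the sample mean under H0: mean == mu0
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Two-sided p-value, like 2 * (1 - pnorm(abs(z))) in R
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value, like qnorm(1 - alpha / 2) in R
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, z_crit, abs(z) > z_crit


# Hypothetical readings, made up for illustration only
z, p, crit, reject = one_sample_z_test([140, 140, 140, 140], mu0=125, sigma=10)
print(z, reject)  # → 3.0 True
```

Here the sample mean lies three standard errors above the hypothesised mean, so H0 is rejected at the 5% level. A correlation test like cor.test would need a t-distribution, which the Python standard library does not provide, so it is left out of this sketch.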

           In this project, I also tried to improve myself as a team player. In addition to actively giving my opinions and cooperating, I always tried to be a good listener to my team members. I would like to thank all my team members and our leader for their constant cooperation, which allowed us to communicate effectively and finally complete Project 2.

            Last but not least, I would like to express my deepest gratitude to my lecturer, Dr Nor Azizah Ali, for her guidance this semester. She always showed professionalism and patience towards us. Whenever we faced problems or had questions, she helped us as much as she could and explained everything in detail. Thanks to her guidance, we were able to apply our knowledge successfully in Project 2.
