PSDA Project 2


Introduction

In this project, we had to carry out an inferential statistical analysis covering several techniques. The data set chosen by our group is the World Happiness Report data from the Kaggle website. My team members and I were required to perform hypothesis testing, correlation and regression, and we chose the chi-square test and ANOVA as our optional tests. The purpose of this analysis is to determine which countries have higher happiness scores and which factors affect a country's happiness score the most. RStudio was used to test and analyze the data.

Understanding Probability And Statistics: Statistical Inference For Data Scientists | by Farhad Malik | Towards Data Science

Video Presentation

Report PDF

Details

Reflection

From Group Project 2, I have learned several data analysis techniques, such as using R programming in RStudio to obtain results and scatter plots. I also learned that many hypotheses we make in daily life can be tested using statistical methods. I have learned to solve statistical problems using the one-sample hypothesis test, correlation, regression, the chi-square test and ANOVA.

In this project, we chose a data set from the World Happiness Report. From the analysis, I found that the Economy variable has the greatest impact on the happiness score, followed by the Family variable. This suggests that people who live in wealthy countries such as the United States tend to be happier than people in poverty-stricken countries, because a stronger economy leads to higher living standards and better public services in education and health care. The regression also shows that family affects the happiness score: people with healthy family relationships tend to have higher happiness scores.

From the chi-square test, I found that the happiness score is associated with the continent: Europe has higher happiness scores, which we attribute to its better development. From the ANOVA, I learned that different countries do not all have the same mean happiness score. Hence, countries with better development, economic growth and family relationships tend to have higher happiness scores, while countries in the Africa region have lower happiness scores due to poorer development, economic growth and family relationships.

To conclude, most of the variables in the data set are closely related to each other: when one variable changes, the others tend to change as well. These relationships can be identified and tested using the statistical methods that I have learned.
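The two optional tests mentioned in the reflection can be sketched by computing their statistics directly. This is a minimal sketch in plain Python, not the R code used in the project (in R, chisq.test() and aov() would normally be used). The contingency counts and the yearly scores for "Country A/B/C" are invented for illustration and are not results from the World Happiness Report analysis.

```python
# Hypothetical illustration only: the counts and scores below are invented and
# are NOT results from the project's World Happiness Report analysis.


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical counts of countries by continent (rows) and happiness level
# (columns: high, low)
counts = [
    [30, 10],  # Europe
    [8, 35],   # Africa
    [15, 20],  # Asia
]
chi2, dof = chi_square_independence(counts)

# Hypothetical yearly happiness scores for three countries
countries = [
    [7.2, 7.5, 6.9, 7.0],  # Country A
    [4.2, 4.6, 3.9, 4.4],  # Country B
    [5.3, 5.8, 5.1, 5.6],  # Country C
]
f, df1, df2 = one_way_anova_f(countries)

print(f"chi-square = {chi2:.2f} on {dof} df")
print(f"F = {f:.2f} on ({df1}, {df2}) df")
```

Comparing each statistic with the critical value of the chi-square or F distribution at the chosen significance level completes the test; the R functions report the corresponding p-values directly.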

