PROBABILITY & STATISTICAL DATA ANALYSIS (PSDA)

PSDA PROJECT 1 REPORT

PROJECT 1 R SCRIPT

Project 2 report

Project 2 R script

Project 2 dataset

Project 2 presentation slide

PSDA Project 1 reflection

    I learned many things whilst conducting Project 1 for the subject Probability & Statistical Data Analysis (PSDA). The project was carried out in a group of four. The project specification required us to choose a suitable topic, conduct a survey and collect data from the community. My team and I decided on the topic "Ideal Characteristics Of A Partner".

   There were many challenges that I faced throughout the project as a member of this team. The first was learning a completely new programming language, R. Since I had no prior knowledge of it, it took me some time to become comfortable with it, but thanks to our PSDA lecturer, Dr Chan Weng Howe, I managed to pick up the language through hard work and diligence. Another major obstacle was communication. At the time, Malaysia was under the Movement Control Order (MCO) due to the global COVID-19 pandemic, which caused most university students to return to their hometowns. This made it very difficult to communicate with the other team members, to hold meetings to discuss the project and to complete the work. As a team, we overcame this problem with the help of online tools such as social media, Zoom and Discord.

   We agreed to collect data through a questionnaire built with Google Forms, which was the most efficient method both for us and for the people answering our survey. Once this step was done, we compiled the responses from 70 people, transferred the data to an Excel spreadsheet and cleaned it. We then plotted various graphs, such as dot plots, histograms and bar graphs, in RStudio. After completing all the graphs, we wrote the report explaining the information we obtained from the data. I then went through the entire report, fixing the formatting and correcting grammatical errors. We then moved on to the third component of the project, the presentation. Due to the MCO, we were unable to present in front of the class and opted to present through a video instead. Each group member filmed their part at their own location, and the clips were compiled and edited into the final video. Each member was responsible for presenting a certain part of the report in detail, such as the methodology, findings and graphs.

Project Profile

Project 2 is an individual project that uses secondary data I retrieved from Kaggle, a free online data source.

The focus of my case study is Students Performance in Exams.

I conducted four inferential statistical tests:

  • Hypothesis testing
  • Correlation
  • Regression
  • Chi-Square Test

 Hypothesis Testing 

In the hypothesis test, I wanted to determine whether the average math score of students is equal to 73.66 (H0: μ = 73.66 versus H1: μ ≠ 73.66).

The test provides sufficient evidence to conclude that the average math score is not equal to 73.66.
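A one-sample t-test of this kind compares the sample mean with the hypothesised value using t = (sample mean - μ0) / (s / √n), with n - 1 degrees of freedom. The sketch below is a minimal Python illustration under stated assumptions: the scores are hypothetical toy values, not the Kaggle sample, and the two-sided 5% critical value 2.571 for 5 degrees of freedom is taken from a standard t table rather than computed. In R, the equivalent call would be `t.test(x, mu = 73.66)`.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical math scores for illustration only (NOT the project's sample).
scores = [70, 75, 80, 65, 72, 78]
t, df = one_sample_t(scores, 73.66)

# Assumed two-sided critical value at the 5% level for df = 5, from a t table.
T_CRIT_DF5 = 2.571
print(f"t = {t:.4f}, df = {df}, reject H0: {abs(t) > T_CRIT_DF5}")
```

With these toy values t is about -0.145, well inside ±2.571, so H0 would not be rejected; with real data the same comparison decides the outcome.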

[Figure: hypothesis test results]

 

Correlation

This test is carried out to identify the relationship between reading score and writing score.

[Figure: correlation test results]

The result indicates a strong positive linear relationship between reading and writing scores (r = 0.9339426).
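The Pearson correlation coefficient is r = S_xy / √(S_xx · S_yy), where S_xy is the sum of cross-products of deviations from the means and S_xx, S_yy are the sums of squared deviations; values close to +1 indicate a strong positive linear relationship. The Python sketch below computes it from scratch on hypothetical paired scores (not the project's data). In R, `cor(x, y)` gives the same coefficient, and `cor.test(x, y)` also reports a p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired scores for illustration only (NOT the project's data).
writing = [60, 65, 70, 75, 80]
reading = [62, 66, 73, 74, 82]
r = pearson_r(writing, reading)
print(f"r = {r:.4f}")
```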

Regression 

  • Dependent variable: reading score
  • Independent variable: writing score

[Figure: regression results]

The results show a linear relationship between reading and writing scores.

The estimated regression model is

 ŷ = 5.5995 + 0.9037x

where x is the writing score and ŷ is the predicted reading score.
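To use the fitted equation for prediction, substitute a writing score for x to obtain a predicted reading score. The small Python function below is a sketch that hard-codes the two coefficients reported here; the name `predict_reading` is hypothetical, and predictions are only meaningful for writing scores within the range observed in the sample.

```python
def predict_reading(writing_score):
    """Predicted reading score from the reported model y_hat = 5.5995 + 0.9037 * x."""
    intercept = 5.5995  # reported intercept
    slope = 0.9037      # reported slope: change in predicted reading score per writing mark
    return intercept + slope * writing_score


print(f"{predict_reading(70):.4f}")  # a student with a writing score of 70
```

The slope means each additional writing mark raises the predicted reading score by about 0.9 marks.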

R² = 0.8722, which shows that 87.22% of the variation in reading score is explained by the writing score.
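The intercept, slope and R² of a simple linear regression come from ordinary least squares: b1 = S_xy / S_xx, b0 = ȳ - b1·x̄ and R² = 1 - SS_res / SS_tot. The following Python sketch computes them from scratch on hypothetical toy data (not the project's sample). With a single predictor, R² equals the square of the correlation coefficient, which is consistent with the reported values, since 0.9339426² ≈ 0.8722. In R, assuming a data frame with columns named `reading` and `writing`, `summary(lm(reading ~ writing))` reports the same quantities.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx                # slope
    b0 = mean_y - b1 * mean_x       # intercept
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares
    ss_tot = sum((b - mean_y) ** 2 for b in y)                     # total sum of squares
    return b0, b1, 1 - ss_res / ss_tot


# Hypothetical paired scores for illustration only (NOT the project's data).
writing = [60, 65, 70, 75, 80]   # independent variable x
reading = [62, 66, 73, 74, 82]   # dependent variable y
b0, b1, r2 = fit_simple_regression(writing, reading)
print(f"y_hat = {b0:.4f} + {b1:.4f}x, R^2 = {r2:.4f}")
```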

 

Chi-Square Test of Independence

[Figure: chi-square test results]

This test was carried out to determine whether there is a relationship between gender and test preparedness.

The test statistic does not fall in the critical region, so I fail to reject the null hypothesis of independence. At the 5% significance level, there is insufficient evidence to conclude that gender and test preparedness are related.
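For a contingency table, the chi-square statistic sums (O - E)² / E over all cells, where each expected count is E = (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The Python sketch below uses a hypothetical 2 × 2 table of counts (not the project's data) and the tabulated 5% critical value 3.841 for one degree of freedom, hard-coded as an assumption. Note that R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default (`correct = TRUE`), so its statistic will be somewhat smaller than this uncorrected version.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts for illustration only (NOT the project's data).
# Rows: female, male. Columns: completed test preparation, none.
counts = [[8, 12],
          [6, 14]]
stat, df = chi_square_independence(counts)

CRITICAL_5PCT_DF1 = 3.841  # assumed value from a chi-square table
print(f"chi2 = {stat:.4f}, df = {df}, in critical region: {stat > CRITICAL_5PCT_DF1}")
```

Here the statistic is about 0.44, well below 3.841, so it does not fall in the critical region and the null hypothesis of independence is not rejected.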

Conclusion

  • There is insufficient evidence to support the claim that the average math score of students is not equal to 76.66 marks.

  • There is a strong positive relationship between the average writing score and average reading score.
  • The estimated regression model is ŷ = 5.5995 + 0.9037x, where x is the writing score and ŷ is the predicted reading score.
  • There is insufficient evidence of a relationship between gender and test preparedness.

PSDA Project 2 reflection

I learned many important things whilst conducting the individual project (Project 2) as part of my coursework for the subject Probability & Statistical Data Analysis (PSDA). This individual project was assigned to us in place of our final exam due to the COVID-19 pandemic. Because I worked on it alone, it took considerable time and effort to understand the data and interpret it into meaningful information. The project specification was to acquire secondary data from an online source and use it to perform inferential statistical analysis. The topic I chose for this project is "Students Performance in Exams".

 

There were many new concepts that I learned and applied throughout this project, including regression, hypothesis testing, the chi-square test and correlation. At first I felt overwhelmed by the task at hand, but I tackled each part in turn and in the end managed to finish the project. This would not have been possible without the help and guidance of my PSDA lecturer, Dr Chan Weng Howe.

I faced many challenges whilst conducting this project. One of the major problems was choosing a suitable dataset from the vast number available on the internet. I also had a hard time choosing suitable variables for the tests I would conduct on that dataset. Once that step was done, the next challenge was plotting the graphs in RStudio. It took quite some time to complete all the graphs and calculations, and YouTube and Stack Overflow made the process much easier. I chose a random sample of 40 records from the dataset, which I retrieved from Kaggle, an online open data source. After completing all the graphs, I wrote the report explaining the information I obtained from the data (inferential statistics). I then went through the entire report, fixing the formatting and correcting grammatical errors. Finally, I moved on to the third component of the project, the presentation. Due to the MCO, I was unable to present in front of the class and opted to present through a video instead.

From this project, I can draw several conclusions about my topic.

Based on the hypothesis test, we fail to reject the null hypothesis: there is insufficient evidence to support the claim that the average math score of students is not equal to 76.66 marks. Next, the analysis shows a strong positive relationship between the average writing score and the average reading score, with a correlation coefficient (r) of 0.9339426. The estimated regression model is ŷ = 5.5995 + 0.9037x, which is useful for predicting the average reading score from the average writing score. Finally, since the p-value of the chi-square test is greater than 0.05, we fail to reject the null hypothesis; there is insufficient evidence of a relationship between gender and test preparedness.

Introduction to PSDA

PROJECT 1 Presentation

Project 2 Presentation