Probability and Statistical Data Analysis

Project 2


The primary objective of this study is to analyse the fuel consumption of a car and the factors that influence it, in order to identify which factor has the greatest influence on fuel consumption. The factors considered in the statistical analysis include distance travelled (km), speed of the vehicle (km/h) and fuel type (SP98 or E10 gas). The dataset was obtained from Kaggle and was collected by an American car driver named Andreas Wagener, who wrote down the data from his car’s display after each ride on a regular basis. Several statistical analyses were carried out, including 2-sample hypothesis testing, correlation, regression and the Chi-Square Test of Independence.
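As a rough illustration of two of the techniques named above, the following minimal sketch computes a Welch two-sample t statistic (for example, consumption under E10 versus SP98) and a Pearson correlation coefficient (for example, speed versus consumption). It is written in Python rather than R, the function names `welch_t` and `pearson_r` are hypothetical helpers, and the numbers are made-up placeholders that are not taken from the Kaggle dataset or from the project report.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's two-sample t statistic and its approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for illustration only (NOT real observations).
consume_e10 = [5.0, 4.2, 5.5, 3.9, 4.5]   # hypothetical L/100 km with E10
consume_sp98 = [4.9, 4.7, 5.3, 4.1, 4.6]  # hypothetical L/100 km with SP98
speed = [26, 30, 38, 36, 46]              # hypothetical average speed in km/h

t_stat, dof = welch_t(consume_e10, consume_sp98)
r = pearson_r(speed, consume_e10)
print(f"Welch t = {t_stat:.3f}, df = {dof:.1f}")
print(f"Pearson r = {r:.3f}")
```

In R, the corresponding built-in functions would be `t.test` and `cor`; the sketch only spells out the arithmetic behind them.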


Presentation Video

My Project Timeline

My Project Timeline Plan

Tags: psda

Learn the different statistical analysis methods in R.

Carry out the statistical analysis using different methods in RStudio. The dataset should be cleaned to remove any possible outliers.

Complete the project report, which will contain all the findings from the statistical analysis carried out on the dataset.

Produce a video presentation for Project 2 and record reflections on Project 2 in the E-Portfolio.



In conclusion, it is important to understand the factors that can affect the fuel consumption of a car. The statistical analysis helped me to understand the different factors that play a major role in car fuel consumption. Using several different statistical analysis techniques made the results easier to interpret and made it clearer which hypothesis each test output addresses.

Project Profile/Infographic

Project 2 Reflections

Reflect My Project 2

After completing all the statistical analysis and reporting the findings in the project report, I started preparing for the video presentation. I prepared the content of the presentation beforehand in order to deliver it effectively. After that, I recorded my presentation, which was about 8 minutes long. Writing the reflections in my report and presenting them in the video made me aware of the different things I had learnt by carrying out this project. I was able to reflect on the things that I now understand better and relate them to real-life situations. Furthermore, I was also able to interpret raw data to understand the insights it offers.


Project in Progress

After submitting the project proposal, I began to work on the project in more detail. I started to analyse the collected data step by step and to remove outliers and unwanted or invalid data. The statistical analysis in RStudio was quite challenging for me since I had to learn it on my own and apply it right away. However, the lecture notes provided on R tutorials were very helpful in understanding R. I then analysed the data using different statistical analysis methods and also managed to analyse the relationship between the different factors that affect car fuel consumption. After completing my statistical analysis in RStudio, I began reporting my findings in the Project 2 report. I tried to relate my findings to the course content and the concepts I had learnt in the PSDA course throughout the semester.

Project Kick-off

The PSDA lecturer informed us about Project 2, which would be assessed as the replacement for the final examination of the PSDA course this semester. The project timeline was one month, and it involved all the concepts learnt throughout the semester in the PSDA course. We were then required to submit an initial project proposal for Project 2. This proposal document would consist of all our plans for executing Project 2 and the dataset to be used for the statistical analysis. Based on the reference pages suggested by the lecturer, I began to look out for a possible case study. After going through several possible topics for 2 days, I finally decided on a dataset that I obtained from Kaggle. This dataset is about car fuel consumption in relation to the factors that influence it, and it was collected by an American car driver. While working on the project proposal, I started to plan my case study in more detail, along with the things I had to consider in my project. As a start, I began to learn how to carry out statistical analysis in RStudio, such as 2-sample hypothesis testing, correlation, regression and the Chi-Square Test.
