Probability & Statistical Data Analysis


Reflection PSDA Hazeeq –

It all started when our lecturer gave us the project assessment at the beginning of the semester. The project had to be done in groups, and its main objective was to collect statistical data and present it in a clearer form than a “long black and white table-form” in Microsoft Excel. My group members and I first decided to work on “Evening Activities among UTM students”, but we later changed the topic to be more sports-focused, as the majority agreed with this.

Next, the first thing we started on was data collection. Nowadays, many people prefer to fill in questionnaires through online surveys. In my opinion, it is better to use Google Forms or another online survey platform, because it saves a lot of time: we do not have to hand out paper forms and wait for people to fill them in. Online surveys also save paper, which is made from trees. My team and I worked together to create the questions and group them according to the topic “Levels of Measurement”, and we used appropriate language so that people would feel comfortable filling in our online survey. After drafting the questionnaire, we asked the lecturer whether our questions suited the survey. What I learnt is that one question can be creatively transformed so that it is measured on a different “Level of Measurement” scale. I am interested in the way we can rework questions to produce unique data.
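As a small illustration of turning one question into a different level of measurement, here is a minimal sketch in R. The question, the answers and the category boundaries are all made up for this example and are not taken from our survey.

```r
# Hypothetical answers to "How many hours of sport do you play per week?"
# Measured this way, the answers are on a ratio scale.
hours <- c(0, 1.5, 3, 4, 7, 10)

# Regroup the same answers into ordered categories (ordinal scale).
# The cut points 2 and 5 are arbitrary choices for illustration.
level <- cut(hours,
             breaks = c(-Inf, 2, 5, Inf),
             labels = c("Low", "Medium", "High"),
             ordered_result = TRUE)

# Count how many answers fall into each category
table(level)
```

The same question could also be asked directly with the three categories as answer choices, which would give ordinal data from the start.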

After a few weeks, we had the data in the Google Forms online survey. I was one of the team members who made the online survey, and I could see that the system interprets the data automatically in the “Response” part. However, the displays are not customizable and the graph types are fewer than we needed; there is no box plot, stem-and-leaf plot, scatter plot and many others. So we downloaded the data from Google Forms as an .xlsx file, also known as an Excel file. We could then import the Excel file into RStudio, which our lecturer recommended to our group because it can produce many styles of graph and many ways of interpreting the data. For me, the better the style of graph, the better the conclusion we can make, and the easier it is for people to understand the data we collected, whatever the purpose of presenting the statistics.

Next, we moved on to interpreting the data using RStudio. RStudio is a free application for working with R, a programming language that is driven by typed commands. Honestly, we learned RStudio better from the tutorial slides given by our lecturer than from tutorials on YouTube, because there were some conflicts: steps that worked 100% in a YouTube tutorial did not work when our group tried them. This was quite frustrating, as most of us prefer to learn visually instead of reading slide by slide. Writing the code itself is simple; we only need to remember the key function names such as plot, hist, barplot and many others. We were also interested to find that we can set a color by typing “col = (desired color)” instead of typing a numeric color code, which is far more complicated, and so make the graphs colorful.
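To show what these commands look like, here is a minimal sketch using made-up numbers, not our survey data. The variable names and values are assumptions for illustration; the functions hist, barplot, boxplot and stem and the col argument are part of base R.

```r
# Made-up data for illustration only
hours  <- c(1, 2, 2, 3, 3, 3, 4, 5, 7, 10)
sports <- c(Football = 12, Badminton = 9, Futsal = 7, Jogging = 5)

# Histogram with a named color instead of a numeric color code
h <- hist(hours, col = "skyblue",
          main = "Hours of sport per week", xlab = "Hours")

# Bar plot with one named color per bar
barplot(sports, col = c("tomato", "gold", "seagreen", "orchid"),
        main = "Favourite sport")

# Box plot and stem-and-leaf display, which Google Forms does not offer
boxplot(hours, col = "orange", horizontal = TRUE,
        main = "Hours of sport per week")
stem(hours)

# The numeric equivalent of a named color, for comparison
col2rgb("skyblue")
```

Typing colors() in the console lists every color name that R accepts.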

The video-making part was very simple, as we divided the task among four people. I was tasked with the methodology part, as I was the one who could remember the flow of our project and understood the RStudio parts. The video was made while we were quarantined at home, so each of us talked in front of a camera about the part we had been assigned. After that, one of our group members volunteered to compile the video presentation and make it more attractive.

Lastly, I am very excited to have taken part in this statistical project assessment. I have learned so much from it, and it has made me more mature, as we need to be nice to our group members whether or not there is a problem in the group. Now I am able to make a Google Form by myself and write appropriate questions using a “measuring scale” that will produce the desired data at the end. I also learned to import data from Excel into RStudio and interpret it with colorful and attractive graphs using the useful tools in the application. Hopefully, I can use this application again if I am involved in other statistical work teams, whether at university or at my future workplace.




Firstly, this started when our lecturer gave us Project 2 in the middle of the semester. Because classes were online during the MCO (Movement Control Order), Project 2 replaced the final exam for this subject. The project had to be done by one person only (an individual project), and its main objective was to collect statistical data and interpret it in a better form than simply showing the data in Microsoft Excel. This project is more like a case study, because we need to analyze one dataset and write R scripts to draw more information out of it, for example a confidence interval, or whether to reject or fail to reject a null hypothesis, and many others. I decided to work on “Factors That Affect the Demand of Android Apps Market”. As I went on, I looked more deeply into how some variables relate to the high demand in the Android market.


Next, the first thing I started on was data collection. I searched several websites that provide secondary data for analysis. After that, I found one dataset that really impressed me. I used the Kaggle website to find the data. The dataset was collected and scraped from the Google Play Store by Lavanya Gupta two years ago. It was released under a CC license and cleaned by Lavanya Gupta one year ago (Version 6). The dataset is ideal for anyone, especially software developers, who wants to practice exploratory data analysis or get started with predictive models of apps. Actionable insights can be drawn from it for developers to work on and capture the Android market.


After a few weeks, I had done some research on which topic to choose and what I needed from the dataset. The purpose, of course, was to do some statistical analysis. The analyses I carried out were a one-sample test, correlation and regression, a goodness-of-fit test and a chi-square test of independence. I downloaded the dataset and saved it as an .xlsx file (worksheet) so that I could import the Excel file into RStudio, which our lecturer recommends to our class because it can produce many styles of graph and many ways of interpreting the data. In this project, however, I did not use many graphs, because besides interpreting data with graphs, RStudio can also sort data into classes and calculate the standard deviation and variance for a huge number of entries. My dataset lists 10841 apps, and so far I have been able to handle it carefully. For me, RStudio makes things a lot easier, and it is the most efficient choice when you need to handle hundreds of thousands of entries in a dataset. R can also present whatever data we are working on as statistics for any kind of purpose.
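The four kinds of analysis named above can each be run with a base R function. The sketch below uses simulated data as a stand-in, so the column names (rating, reviews, type, content) and the hypothesised mean of 4 are assumptions for illustration, not the real dataset or my actual hypotheses.

```r
set.seed(1)

# Simulated stand-in for an app dataset; all columns are hypothetical
apps <- data.frame(
  rating  = round(runif(200, 1, 5), 1),
  reviews = rpois(200, 500),
  type    = sample(c("Free", "Paid"), 200, replace = TRUE),
  content = sample(c("Everyone", "Teen", "Mature"), 200, replace = TRUE)
)

# One-sample t-test: is the mean rating different from 4?
t_res <- t.test(apps$rating, mu = 4)

# Correlation and simple linear regression between rating and reviews
r   <- cor(apps$rating, apps$reviews)
fit <- lm(reviews ~ rating, data = apps)

# Goodness-of-fit test: are the content categories equally common?
gof <- chisq.test(table(apps$content))

# Chi-square test of independence: is app type related to content category?
ind <- chisq.test(table(apps$type, apps$content))

# Reject the null hypothesis when the p-value is below the chosen
# significance level (for example 0.05); otherwise fail to reject it
c(t_test = t_res$p.value, gof = gof$p.value, independence = ind$p.value)
summary(fit)
```

Each test object also stores the test statistic and degrees of freedom, which are the values usually reported alongside the decision.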

Next, I moved on to interpreting the data using RStudio. RStudio is a free application for working with R, a programming language that is driven by typed commands. Honestly, it is better to learn RStudio from the tutorial slides given by our lecturer than from tutorials on YouTube, because there were some conflicts: steps that worked 100% in a YouTube tutorial did not work when we tried them. This was quite frustrating, as most of us prefer to learn visually instead of reading slide by slide. Writing the code itself is simple; we only need to remember the key function names such as plot, hist, barplot and many others. I was also interested to find that we can set a color by typing “col = (desired color)” instead of typing a numeric color code, which is far more complicated, and so make the graphs colorful. There were still some things for which I had to turn to a YouTube tutorial video, namely “How to install package for describe function in R”. To improve the analysis, I typed some of the standard formulas into the code so that the values are calculated automatically, because I did not find a shorter built-in function for them. So far, I think this is already enough to do the analysis.
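As an example of typing a standard formula into the code, the sketch below computes a 95% confidence interval for a mean by hand. The ratings are made up for illustration, and this is only one way such a formula could be written, not the exact script from my report.

```r
# Made-up sample of app ratings, for illustration only
x <- c(4.1, 3.9, 4.5, 4.7, 3.8, 4.3, 4.0, 4.4)

n      <- length(x)
xbar   <- mean(x)               # sample mean
s      <- sd(x)                 # sample standard deviation
t_crit <- qt(0.975, df = n - 1) # critical t value for 95% confidence

# Formula: xbar +/- t * s / sqrt(n)
ci <- c(lower = xbar - t_crit * s / sqrt(n),
        upper = xbar + t_crit * s / sqrt(n))
ci

# The interval reported by t.test can be used to check the hand calculation
t.test(x)$conf.int
```

Writing the formula out this way makes each step visible, which helps when explaining the result in a report.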

The video-making part was very simple. I talked in front of the camera about the work I had done in the project report. I also described the major findings in the report that support my project title, “Factors That Affect the Demand of Android Apps Market”. After that, I compiled the video presentation and made it more attractive.

Lastly, I am very excited to have taken part in this statistical project assessment. I have learned so much, in many aspects, from finishing Project 2. Now I am able to do statistical data analysis by myself and write my own summary of a dataset so that it can be useful to other people in improving their lives. I also learned to import data from Excel into RStudio and interpret it with colorful and attractive graphs, using the useful tools in the application as well as the functions and libraries provided in RStudio. Hopefully, I can use this application again if I am involved in other statistical work teams, whether at university or at my future workplace.


