Starbucks' Beverages Analysis

Introduction

Handcrafted drinks are popular these days, as shown by the growth of beverage chains such as Tealive, Chatime, and Starbucks, which keep opening outlets in various places across the nation. These chains attract customers with a variety of flavoursome drinks, even though their prices are slightly higher than those of local stores. Since these drinks have become part of people’s lives, it is important for consumers to know what they are consuming in terms of nutrition, because these types of drinks are commonly known to contain a high amount of sugar in every serving. Given that sugar is one of the sources of calories, heavy consumption of these drinks can affect health and lead to disease. Consumers should also take note of the other nutrients the drinks contain. The objective of this project is therefore to investigate the calorie content of these beverages and the ingredients that may contribute to it. The sample to be analysed is taken from the Starbucks drink menu.


Starbucks’ Beverage Nutritional Facts

To analyse the nutritional facts of handcrafted drinks, the Starbucks beverages dataset is used. This data was collected and made publicly available by Starbucks. The purpose of the dataset is to illustrate the nutritional information for Starbucks’ drink menu items. All nutritional information is for the 12 oz serving size, and the dataset covers drinks from 5 types of beverages, each with different preparations. The total sample size is 224.

Slide1.JPG

 

Inferential Analysis

The Population Mean for Calories

To test the claim that the population mean Calories content of Starbucks’ drinks is equal to 150, we use a one-sample hypothesis test at a significance level of 0.05. A z-test statistic is used because we are testing a population mean with a large sample.

Hypothesis Statement:

H0: µ = 150         H1: µ ≠ 150

After calculating the test statistic and the critical values in R, the results are:

Z = 6.34889           Critical values = (-1.9510, 1.9510)

Since z = 6.34889 lies outside the interval (-1.9510, 1.9510), it falls in the critical region and we reject the null hypothesis. There is sufficient evidence to support the claim that the population mean Calories content of Starbucks’ drinks is not equal to 150.
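The calculation itself was done in R and the code is not part of this write-up. As a rough sketch of the same two-sided one-sample z-test, the Python snippet below uses only the standard library. The function name `one_sample_z_test` and the calorie values are hypothetical and only illustrate the steps; they are not the Starbucks data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, mu0, alpha=0.05):
    """Two-sided one-sample z-test, using the sample standard deviation."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    # Positive critical value; the critical region is |z| > critical.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical

# Hypothetical calorie values, for illustration only (not the Starbucks data).
calories = [120, 180, 200, 150, 240, 90, 310, 170, 220, 260]
z, critical, p_value, reject = one_sample_z_test(calories, mu0=150)
print(f"z = {z:.4f}, critical = ±{critical:.4f}, p = {p_value:.4f}, reject H0: {reject}")
```

The null hypothesis is rejected when the absolute value of z exceeds the critical value, or equivalently when the p-value is below the significance level.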

Correlation between Calories and Caffeine

In this part, we want to see the correlation between the Calories and the Caffeine content of the beverages. Let x = Calories and y = Caffeine. We calculate the correlation coefficient and plot the data as a scatter diagram in R. The analysis is as below:

xCalories_yCaffeine.png

Result:
Correlation coefficient=0.0503

The correlation coefficient is close to 0, which indicates a very weak linear relationship between Calories and Caffeine.
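For readers who want to see how such a coefficient is obtained, here is a minimal sketch of the Pearson correlation coefficient in Python. The function name `pearson_r` and the sample values are assumptions made for illustration; the analysis in this project was run in R.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (Calories, Caffeine) pairs, for illustration only.
calories = [120, 180, 200, 150, 240, 90]
caffeine = [75, 150, 10, 95, 75, 130]
print(f"r = {pearson_r(calories, caffeine):.4f}")
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.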

Because the correlation coefficient is low, a significance test for correlation is carried out to check whether these variables have a linear relationship at the 0.05 significance level.

Hypothesis Statement:

H0: ρ = 0 (no linear correlation)

H1: ρ ≠ 0 (linear correlation exists)

Results:
T-statistic = 0.7807
Critical values = (-1.9699, 1.9699)

Since the T-statistic = 0.7807 lies between the critical values -1.9699 and 1.9699, it does not fall in the critical region and we fail to reject the null hypothesis. There is not sufficient evidence to support that Calories and Caffeine are linearly correlated.
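The test statistic for this kind of test is usually computed as t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below applies that formula in Python. The function names are hypothetical, and the critical value is passed in rather than computed. The value printed depends on the sample size and on the rounding of r, so it need not match the reported T-statistic digit for digit.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def reject_no_correlation(r, n, critical_value):
    """Two-sided decision: reject H0 when |t| exceeds the critical value."""
    return abs(correlation_t_statistic(r, n)) > critical_value

# r, n and the critical value are the rounded figures quoted in this write-up.
r, n, critical_value = 0.0503, 224, 1.9699
t = correlation_t_statistic(r, n)
print(f"t = {t:.4f}, reject H0: {reject_no_correlation(r, n, critical_value)}")
```

With these inputs the statistic stays well inside the critical values, which agrees with the decision not to reject the null hypothesis.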

Regression between Calories and Sugar

Since sugar is one of the main sources of calories, we want to test whether the number of calories is affected by the amount of sugar. Let x = Sugars and y = Calories. We then decide whether these variables have a significant linear relationship.

Hypothesis Statement:

H0: β1 = 0 (no linear relationship)

H1: β1 ≠ 0 (linear relationship exists)

Results:
y-intercept = 37.5429
Slope (m) = 4.7426
R² = 0.8275
p-value = 2.2 × 10⁻¹⁶

xSugars_yCalories.png

Since the p-value is smaller than the 0.05 significance level, we reject the null hypothesis. There is sufficient evidence to support that there is a linear relationship between Sugars and Calories. The fitted line is Calories = 37.5429 + 4.7426 × Sugars. With R² = 0.8275, about 83% of the variation in Calories is explained by variation in Sugars, which indicates a strong, though not perfect, linear relationship.
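To show where the intercept, slope and R² come from, here is a minimal least-squares sketch in Python. The names `fit_line` and `predict_calories` are hypothetical helpers. The prediction uses the coefficients reported above, while the fit itself is demonstrated on made-up numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def predict_calories(sugars_g, intercept=37.5429, slope=4.7426):
    """Predicted Calories from the fitted line reported in this section."""
    return intercept + slope * sugars_g

# Hypothetical (Sugars, Calories) pairs, for illustration only.
sugars = [10, 20, 30, 40, 50]
calories = [90, 130, 185, 220, 280]
intercept, slope, r_squared = fit_line(sugars, calories)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}, R^2 = {r_squared:.4f}")
print(f"predicted Calories for 20 g of sugar: {predict_calories(20):.4f}")
```

The slope can be read as the estimated increase in Calories for each additional gram of sugar.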

Dependency of Beverage and Saturated Fat

Now, we want to test whether saturated fat is independent of beverage at the 0.05 significance level.

Hypothesis Statement:

H0: Saturated Fat is independent of Beverage       H1: Saturated Fat is related to Beverage

Results:
χ² = 32.19
critical value = 7.8147

Since χ² = 32.19 is larger than the critical value of 7.8147, we reject the null hypothesis. There is sufficient evidence to support that saturated fat is related to beverage.
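The contingency table behind this result is not reproduced here, so the following Python sketch only illustrates how a chi-square statistic for independence is computed from observed counts. The function name and the table are hypothetical. The critical value 7.8147 corresponds to 3 degrees of freedom at the 0.05 level, which is why the made-up table has 2 rows and 4 columns.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows = low/high saturated fat, columns = 4 beverage groups.
observed = [
    [30, 25, 10, 15],
    [10, 20, 30, 20],
]
statistic, df = chi_square_statistic(observed)
critical_value = 7.8147  # chi-square critical value for df = 3, alpha = 0.05
print(f"chi-square = {statistic:.4f}, df = {df}, reject H0: {statistic > critical_value}")
```

The null hypothesis of independence is rejected when the statistic exceeds the critical value for the table's degrees of freedom.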

 

Reflection

After the analysis, some of my assumptions were supported while the rest were rejected. The population mean Calories content of Starbucks’ drinks was shown not to be equal to 150. The correlation coefficient between Calories and Caffeine is close to 0, and the T-statistic test found no sufficient evidence of a linear correlation between them. Although Calories does not appear to be related to Caffeine, it does have a significant relationship with Sugars: the regression analysis gives evidence of a linear relationship, with R² = 0.8275 indicating that most, but not all, of the variation in Calories is explained by Sugars. Finally, the chi-square test of independence gave evidence that saturated fat is related to beverage.

Putting the analysis aside, I learned a lot from doing this project, from searching the web for suitable datasets to using R to run the analysis. I also learned the significance of doing analysis, as it gives us more insight into things that we do not quite understand. It makes our understanding firmer because we are able to support it statistically.
