KNN Model
Predictions ABV & IBU
Case Study 01 - Beers and Breweries - Budweiser
Vo Nguyen
2022-10-19
1. How many breweries are present in each state?
##
## AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA
## 7 3 2 11 39 47 8 1 2 15 7 4 5 5 18 22 3 4 5 23
## MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI
## 7 9 32 12 9 2 9 19 1 5 3 3 4 2 16 15 6 29 25 5
## SC SD TN TX UT VA VT WA WI WV WY
## 4 1 3 28 4 16 10 23 20 1 4
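The code behind the counts above is not shown in the report. One way to produce such a tabulation is sketched below, assuming the breweries data frame has a `State` column; `Brew_toy` and its values are made up for illustration.

```r
# Toy stand-in for the breweries data (values are made up for illustration)
Brew_toy <- data.frame(
  Brew_ID = 1:6,
  State   = c("TX", "CO", "CO", "TX", "OR", "CO")
)

# Count breweries per state; table() orders the result by state name
breweries_per_state <- table(Brew_toy$State)
print(breweries_per_state)
```

With the real data the same call would be applied to the `State` column of the breweries data frame.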
2. The first 6 and the last 6 observations of the merged Beers and Breweries data
#Change Column Name
colnames(Beers)[5]= "Brew_ID"
#Merge Beer and Breweries together
brewbeer = merge(Beers,Brew, by="Brew_ID")
# First Six Observations
head(brewbeer)
## Brew_ID Name.x Beer_ID ABV IBU Style
## 1 1 Get Together 2692 0.045 50 American IPA
## 2 1 Maggie's Leap 2691 0.049 26 Milk / Sweet Stout
## 3 1 Wall's End 2690 0.048 19 English Brown Ale
## 4 1 Pumpion 2689 0.060 38 Pumpkin Ale
## 5 1 Stronghold 2688 0.060 25 American Porter
## 6 1 Parapet ESB 2687 0.056 47 Extra Special / Strong Bitter (ESB)
## Ounces Name.y City State
## 1 16 NorthGate Brewing Minneapolis MN
## 2 16 NorthGate Brewing Minneapolis MN
## 3 16 NorthGate Brewing Minneapolis MN
## 4 16 NorthGate Brewing Minneapolis MN
## 5 16 NorthGate Brewing Minneapolis MN
## 6 16 NorthGate Brewing Minneapolis MN
# Last Six Observations
tail(brewbeer)
## Brew_ID Name.x Beer_ID ABV IBU
## 2405 556 Pilsner Ukiah 98 0.055 NA
## 2406 557 Heinnieweisse Weissebier 52 0.049 NA
## 2407 557 Snapperhead IPA 51 0.068 NA
## 2408 557 Moo Thunder Stout 50 0.049 NA
## 2409 557 Porkslap Pale Ale 49 0.043 NA
## 2410 558 Urban Wilderness Pale Ale 30 0.049 NA
## Style Ounces Name.y City
## 2405 German Pilsener 12 Ukiah Brewing Company Ukiah
## 2406 Hefeweizen 12 Butternuts Beer and Ale Garrattsville
## 2407 American IPA 12 Butternuts Beer and Ale Garrattsville
## 2408 Milk / Sweet Stout 12 Butternuts Beer and Ale Garrattsville
## 2409 American Pale Ale (APA) 12 Butternuts Beer and Ale Garrattsville
## 2410 English Pale Ale 12 Sleeping Lady Brewing Company Anchorage
## State
## 2405 CA
## 2406 NY
## 2407 NY
## 2408 NY
## 2409 NY
## 2410 AK
3. Address the missing values in each column.
After working with my team and comparing our results, I found that about 1000 IBU values are missing out of the 2410 observations in the merged data. This can significantly interfere with our analysis. Thus, we impute the missing ABV and IBU values with their medians and then run a KNN model.
hello_NA = brewbeer[!complete.cases(brewbeer),]
dim(hello_NA)
## [1] 1005 10
head(hello_NA)
## Brew_ID Name.x Beer_ID ABV IBU Style
## 17 2 Kamen Knuddeln 2676 0.065 NA American Wild Ale
## 35 6 Blackbeard 2657 0.093 NA American Double / Imperial Stout
## 36 6 Rye Knot 2656 0.062 NA American Brown Ale
## 37 6 Dead Arm 2655 0.060 NA American Pale Ale (APA)
## 38 6 32°/50° Kölsch 2654 0.048 NA Kölsch
## 39 6 HopArt 2653 0.077 NA American IPA
## Ounces Name.y City State
## 17 16 Against the Grain Brewery Louisville KY
## 35 12 COAST Brewing Company Charleston SC
## 36 12 COAST Brewing Company Charleston SC
## 37 12 COAST Brewing Company Charleston SC
## 38 16 COAST Brewing Company Charleston SC
## 39 16 COAST Brewing Company Charleston SC
library(naniar)
gg_miss_var(brewbeer)
## Warning: It is deprecated to specify `guide = FALSE` to remove a guide. Please
## use `guide = "none"` instead.
There are over 1000 missing values in IBU, almost 100 in ABV, and 5 in Style. We impute the missing ABV and IBU values with their medians for our KNN prediction model.
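The imputation code itself is not shown in the report. A minimal sketch of median imputation, assuming numeric `ABV` and `IBU` columns, could look like this; `beer_toy`, `impute_median`, and the values are hypothetical.

```r
# Toy data with missing values (made up for illustration)
beer_toy <- data.frame(
  ABV = c(0.045, NA, 0.060, 0.070, 0.050),
  IBU = c(50, 20, NA, NA, 30)
)

# Hypothetical helper: replace each NA with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

beer_imputed <- beer_toy
beer_imputed$ABV <- impute_median(beer_toy$ABV)
beer_imputed$IBU <- impute_median(beer_toy$IBU)
```

Median imputation keeps every row available for the KNN model, but it pulls the imputed points toward the center of the distribution, which is worth keeping in mind when so many IBU values are missing.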
4. Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.
## Warning: Removed 1 rows containing missing values (position_stack).
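The chart code is hidden in the knitted output. A sketch of how per-state medians and a bar chart might be built with dplyr and ggplot2 follows; the toy data frame and the column names `median_abv` and `median_ibu` are assumptions, not the report's own.

```r
library(dplyr)
library(ggplot2)

# Toy data (made up); "OR" has no recorded IBU at all
beer_toy <- data.frame(
  State = c("TX", "TX", "TX", "CO", "CO", "OR"),
  ABV   = c(0.050, 0.060, 0.070, 0.065, 0.075, 0.055),
  IBU   = c(30, 40, NA, 60, 70, NA)
)

# Median ABV and IBU per state, ignoring missing values
state_medians <- beer_toy %>%
  group_by(State) %>%
  summarise(
    median_abv = median(ABV, na.rm = TRUE),
    median_ibu = median(IBU, na.rm = TRUE)
  )

# Bar chart of median IBU by state
p <- ggplot(state_medians, aes(x = State, y = median_ibu)) +
  geom_col(fill = "steelblue") +
  labs(x = "State", y = "Median IBU")
```

In this sketch a state with no IBU values ends up with an `NA` median, and ggplot2 drops that bar with a warning. That may be what the "Removed 1 rows" warning above reflects, though the report does not say.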
5. Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?
Colorado has the beer with the highest alcohol content, at 0.128 ABV.
Oregon has the most bitter beer, at 138 IBU.
## # A tibble: 6 × 2
## State max_alc
## <chr> <dbl>
## 1 " CO" 0.128
## 2 " KY" 0.125
## 3 " IN" 0.12
## 4 " NY" 0.1
## 5 " CA" 0.099
## 6 " ID" 0.099
## # A tibble: 6 × 2
## State max_ibu
## <chr> <dbl>
## 1 " OR" 138
## 2 " VA" 135
## 3 " MA" 130
## 4 " OH" 126
## 5 " MN" 120
## 6 " VT" 120
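The two tables above look like per-state maxima sorted in descending order. A sketch of that computation, assuming dplyr, is below; the toy values are made up, and `trimws()` is included because the `State` values printed above carry a leading space.

```r
library(dplyr)

# Toy data (made up); State values carry a leading space, as in the output above
beer_toy <- data.frame(
  State = c(" CO", " CO", " KY", " OR", " OR"),
  ABV   = c(0.100, 0.065, 0.095, 0.055, NA),
  IBU   = c(60, 70, 80, 120, 40)
)

# Strip the stray whitespace, then take the per-state maxima
state_max <- beer_toy %>%
  mutate(State = trimws(State)) %>%
  group_by(State) %>%
  summarise(
    max_alc = max(ABV, na.rm = TRUE),
    max_ibu = max(IBU, na.rm = TRUE)
  )

# Top states by strongest and by most bitter beer
top_abv <- state_max %>% arrange(desc(max_alc)) %>% head()
top_ibu <- state_max %>% arrange(desc(max_ibu)) %>% head()
```

Trimming the state codes early avoids surprises later, for example when filtering with `State == "TX"`.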
7. Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot. Make your best judgment of a relationship and EXPLAIN your answer.
library(ggplot2)
library(dplyr)  # provides the %>% pipe used below
brewbeer %>% ggplot(aes(x = ABV, y = IBU)) +
  geom_point(color = "blue")
## Warning: Removed 1005 rows containing missing values (geom_point).
scatter.smooth(x=brewbeer$ABV, y=brewbeer$IBU, main = "Mild positive linear relationship",xlab = "ABV", ylab = "IBU", col="blue")
The graph above compares the standardized ABV and IBU values, and the trend shows a mild positive linear relationship between bitterness and alcohol content (3rd-degree polynomial). Thus, there is an apparent relationship: as ABV increases, IBU also tends to increase. Looking at the spread, though, there are data points that do not fit this relationship, such as those with ABV around 0.09 and bitterness around 2. However, the majority of our data sits in the center and follows a more clearly positive linear relationship.
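A correlation coefficient can back up the visual impression. The sketch below uses made-up values and is not a result from the beer data; `use = "complete.obs"` drops rows where either variable is missing, which matters here because so many IBU values are `NA`.

```r
# Toy data (made up for illustration); one IBU value is missing
beer_toy <- data.frame(
  ABV = c(0.045, 0.050, 0.060, 0.070, 0.090, 0.055),
  IBU = c(20, 25, 40, 65, NA, 30)
)

# Pearson correlation on the rows where both ABV and IBU are present
r <- cor(beer_toy$ABV, beer_toy$IBU, use = "complete.obs")
print(r)
```

A value near 1 would indicate a strong positive linear relationship, and a value near 0 would indicate little linear relationship.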
8. Budweiser KNN model: using ABV and IBU to predict beer style
As in our scatter plot above, we standardize our continuous variables (ABV and IBU) to put them on a common scale.
## [1] 5
## [1] 0.5462379
k = 5 gives the maximum accuracy (about 0.546).
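The search over k is not shown in the report. One common approach is to fit the model for a range of k values on a train/test split and keep the k with the highest test accuracy. The sketch below does that with `class::knn` on simulated two-class data, so the split ratio, the range of k, and all values are assumptions.

```r
library(class)

set.seed(1)
# Simulated, already-scaled data (made up): two separated clusters
n <- 60
toy <- data.frame(
  ABV  = c(rnorm(n, -1, 0.5), rnorm(n, 1, 0.5)),
  IBU  = c(rnorm(n, -1, 0.5), rnorm(n, 1, 0.5)),
  Type = factor(rep(c("ALE", "IPA"), each = n))
)

# 70/30 train/test split
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
train_toy <- toy[train_idx, ]
test_toy  <- toy[-train_idx, ]

# Test accuracy for each candidate k
ks <- 1:15
accuracy <- sapply(ks, function(k) {
  pred <- knn(train_toy[, c("ABV", "IBU")], test_toy[, c("ABV", "IBU")],
              cl = train_toy$Type, k = k)
  mean(pred == test_toy$Type)
})

best_k <- ks[which.max(accuracy)]
best_k
max(accuracy)
```

Because the result depends on the random split, repeating the split several times and averaging the accuracy per k gives a more stable choice.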
##
## classifications ALE IPA Other
## ALE 154 31 147
## IPA 41 124 43
## Other 66 13 104
## Confusion Matrix and Statistics
##
##
## classifications ALE IPA Other
## ALE 154 31 147
## IPA 41 124 43
## Other 66 13 104
##
## Overall Statistics
##
## Accuracy : 0.5284
## 95% CI : (0.4912, 0.5653)
## No Information Rate : 0.4066
## P-Value [Acc > NIR] : 2.727e-11
##
## Kappa : 0.2902
##
## Mcnemar's Test P-Value : 1.872e-10
##
## Statistics by Class:
##
## Class: ALE Class: IPA Class: Other
## Sensitivity 0.5900 0.7381 0.3537
## Specificity 0.6147 0.8486 0.8159
## Pos Pred Value 0.4639 0.5962 0.5683
## Neg Pred Value 0.7263 0.9146 0.6481
## Prevalence 0.3610 0.2324 0.4066
## Detection Rate 0.2130 0.1715 0.1438
## Detection Prevalence 0.4592 0.2877 0.2531
## Balanced Accuracy 0.6024 0.7934 0.5848
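We ran the classification with k = 5; the confusion matrix and per-class statistics are shown above. Accuracy ranges from about 0.528 (the confusion-matrix run) to 0.546 (the maximum found when choosing k).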
Our KNN Model for Predictions
Now that our KNN model is running with k = 5, the value with the maximum accuracy, we can take any values of IBU and ABV and predict the beer type (ALE, IPA, or Other).
### Example:
# Input: raw IBU and ABV of the beer to classify
bitterness = 100
alcohol = 0.089
# Scale the inputs with the mean and standard deviation of the imputed data
scaled_center_bitterness = mean(brewbeerimpute$IBU)
scaled_scale_bitterness = sd(brewbeerimpute$IBU)
scaled_center_alc = mean(brewbeerimpute$ABV)
scaled_scale_alc = sd(brewbeerimpute$ABV)
y = (bitterness - scaled_center_bitterness) / scaled_scale_bitterness
x = (alcohol - scaled_center_alc) / scaled_scale_alc
# Test point; columns 1 and 2 of train are assumed to be ABV, then IBU
test1 = c(x, y)
knn(train[, c(1, 2)], test1, train$Type, prob = TRUE, k = 5)
## [1] IPA
## attr(,"prob")
## [1] 1
## Levels: ALE IPA Other
What is the best, optimized ABV and IBU for the Texas market compared to the best three markets?
From our analysis of the means for the three top breweries, we see that the best mean ABV and IBU values are 0.07107 and 54.65313.
Using these values, let's apply our KNN model to see which type of beer would be the best one to make in Texas. Note that this run uses k = 7 rather than k = 5.
# Input: best mean IBU and ABV from the three top breweries
bitterness = 54.65313
alcohol = 0.07107107
# Scale the inputs with the mean and standard deviation of the imputed data
scaled_center_bitterness = mean(brewbeerimpute$IBU)
scaled_scale_bitterness = sd(brewbeerimpute$IBU)
scaled_center_alc = mean(brewbeerimpute$ABV)
scaled_scale_alc = sd(brewbeerimpute$ABV)
y = (bitterness - scaled_center_bitterness) / scaled_scale_bitterness
x = (alcohol - scaled_center_alc) / scaled_scale_alc
# Test point; columns 1 and 2 of train are assumed to be ABV, then IBU
test1 = c(x, y)
knn(train[, c(1, 2)], test1, train$Type, prob = TRUE, k = 7)
## [1] IPA
## attr(,"prob")
## [1] 0.5714286
## Levels: ALE IPA Other
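Both predictions repeat the same scaling steps, so they could be wrapped in one function. The sketch below is one possible way to do that; `predict_style` and the toy training data are hypothetical. Unlike the code above, it takes the mean and standard deviation from the training frame passed in, not from `brewbeerimpute`.

```r
library(class)

# Hypothetical helper: scale a new (ABV, IBU) point like the training data,
# then classify it with KNN
predict_style <- function(abv, ibu, train_raw, k = 5) {
  centers <- c(mean(train_raw$ABV), mean(train_raw$IBU))
  scales  <- c(sd(train_raw$ABV), sd(train_raw$IBU))
  train_scaled <- scale(train_raw[, c("ABV", "IBU")],
                        center = centers, scale = scales)
  new_scaled <- matrix((c(abv, ibu) - centers) / scales, nrow = 1)
  knn(train_scaled, new_scaled, cl = train_raw$Type, k = k, prob = TRUE)
}

# Toy training data (made up for illustration)
train_toy <- data.frame(
  ABV  = c(0.045, 0.048, 0.050, 0.052, 0.055,
           0.070, 0.075, 0.080, 0.085, 0.090),
  IBU  = c(18, 20, 22, 25, 28, 70, 80, 85, 95, 100),
  Type = factor(rep(c("ALE", "IPA"), each = 5))
)

pred <- predict_style(abv = 0.089, ibu = 100, train_raw = train_toy, k = 3)
pred
```

The `prob` attribute of the result is the share of the k nearest neighbours that voted for the winning class, which is how a value such as 0.5714286 (4 of 7) arises with k = 7.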
6. Comment on the summary statistics and distribution of the ABV variable.
The mean of ABV is 0.05977, the maximum is 0.128, and the minimum is 0.001. According to the summary statistics, the ABV variable has a mode around 0.05 and a range of about 0.127. The histogram shows a right-skewed distribution.
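The summary and histogram code is not shown in the report. A sketch with base R is below, using made-up ABV values; comparing the mean with the median is a quick numeric check for right skew.

```r
# Toy ABV values (made up for illustration)
abv_toy <- c(0.045, 0.050, 0.050, 0.055, 0.060, 0.065, 0.070, 0.099, 0.128)

# Five-number summary plus mean, and the range as max minus min
abv_summary <- summary(abv_toy)
abv_range   <- diff(range(abv_toy))

# In a right-skewed distribution the mean usually sits above the median
right_skewed <- mean(abv_toy) > median(abv_toy)

print(abv_summary)
hist(abv_toy, main = "Distribution of ABV (toy data)", xlab = "ABV")
```

With the real data, `na.rm = TRUE` would be needed in `mean()`, `median()`, and `range()` unless the missing ABV values have already been imputed.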