Naive Bayes

Predict Attrition & Salary


Case Study 2: Attrition and Salary EDA & Analysis

Introduction

YouTube Presentation: https://www.youtube.com/watch?v=3SXL037Iprc
RShiny App: https://vochannguyen.shinyapps.io/VoShiny2/

We are working with DDSAnalytics to build a model that predicts employee turnover from employee data. We will fit multiple models to find the one that best identifies the factors that lead to attrition. We will also identify the top three factors that contribute to turnover. Additionally, we will predict salary using our given test dataset.

Case Study 2 Analysis Agenda:
1) Explore Graphs and Trends in Data for different possible factors of Attrition for Numerical and Categorical Variables
2) Determine Influential Factors in Attrition
3) Find the Best Model for Attrition - KNN or Naive Bayes
4) Run the Attrition Model using a Test Set
5) Run a Multiple Linear Regression for Salary Prediction using All Predictors, then using only the Statistically Significant Predictors

My Top Three Predictors for Attrition
1) Monthly Income
2) Job Level
3) Overtime

Salary Predictors:

Data Overview

  • Our dataset contains numerical and categorical variables. We will be using dummy variables for our categorical variables.
  • We will remove Over18 from our analysis because it has only one value, “Y”.
  • We will remove Standard Hours from our analysis because it contains only one value, 80.
  • ID is a numerical identifier for each observation, so we will not use it in our analysis.
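Single-valued columns like the ones listed above can also be found programmatically. The following is a minimal sketch on a small made-up data frame; `toy`, its values, and the helper names are illustrative assumptions, not the case-study data.

```r
# Toy data frame standing in for employeeData (values are made up)
toy <- data.frame(
  ID            = 1:4,
  Over18        = c("Y", "Y", "Y", "Y"),
  StandardHours = c(80, 80, 80, 80),
  MonthlyIncome = c(4403, 19626, 9362, 10422),
  stringsAsFactors = FALSE
)

# Count the distinct values in each column
n_distinct_values <- sapply(toy, function(col) length(unique(col)))

# Columns with a single distinct value carry no information for a model
constant_cols <- names(n_distinct_values)[n_distinct_values == 1]
print(constant_cols)

# Drop them before modeling
toy_reduced <- toy[, !(names(toy) %in% constant_cols), drop = FALSE]
print(names(toy_reduced))
```

Running the same check on the real data frame would flag any other constant column as well.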
#Needed Libraries
library(XML)
library(dplyr)
library(tidyr)
library(stringi)
library(ggplot2)
library(class)
library(caret)
library(e1071)
library(stringr)
library(naniar)
library(rmarkdown)
library(readxl)
library(GGally)
#Read Data
employeeData <- read.csv("CaseStudy2-data.csv")
#Data Overview
str(employeeData)
## 'data.frame':    870 obs. of  36 variables:
##  $ ID                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                     : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ Attrition               : chr  "No" "No" "No" "No" ...
##  $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" ...
##  $ DailyRate               : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Sales" ...
##  $ DistanceFromHome        : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education               : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EducationField          : chr  "Life Sciences" "Medical" "Life Sciences" "Marketing" ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
##  $ EnvironmentSatisfaction : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ Gender                  : chr  "Male" "Male" "Male" "Female" ...
##  $ HourlyRate              : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement          : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobRole                 : chr  "Sales Executive" "Research Director" "Manufacturing Director" "Sales Executive" ...
##  $ JobSatisfaction         : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MaritalStatus           : chr  "Divorced" "Single" "Single" "Married" ...
##  $ MonthlyIncome           : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate             : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked      : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ Over18                  : chr  "Y" "Y" "Y" "Y" ...
##  $ OverTime                : chr  "No" "No" "No" "No" ...
##  $ PercentSalaryHike       : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating       : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction: int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears       : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear   : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance         : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole      : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager    : int  3 9 2 7 3 7 3 0 0 7 ...

Checking for NA’s

There are no NAs in our dataset

#Check for NA's in each column dataset
colSums(is.na(employeeData))
##                       ID                      Age                Attrition 
##                        0                        0                        0 
##           BusinessTravel                DailyRate               Department 
##                        0                        0                        0 
##         DistanceFromHome                Education           EducationField 
##                        0                        0                        0 
##            EmployeeCount           EmployeeNumber  EnvironmentSatisfaction 
##                        0                        0                        0 
##                   Gender               HourlyRate           JobInvolvement 
##                        0                        0                        0 
##                 JobLevel                  JobRole          JobSatisfaction 
##                        0                        0                        0 
##            MaritalStatus            MonthlyIncome              MonthlyRate 
##                        0                        0                        0 
##       NumCompaniesWorked                   Over18                 OverTime 
##                        0                        0                        0 
##        PercentSalaryHike        PerformanceRating RelationshipSatisfaction 
##                        0                        0                        0 
##            StandardHours         StockOptionLevel        TotalWorkingYears 
##                        0                        0                        0 
##    TrainingTimesLastYear          WorkLifeBalance           YearsAtCompany 
##                        0                        0                        0 
##       YearsInCurrentRole  YearsSinceLastPromotion     YearsWithCurrManager 
##                        0                        0                        0

Summary of Attrition

  • There are more “No” than “Yes” in the Attrition column
  • No (730) and Yes (140)
  • Our first course of action was to compare Attrition against the numerical predictors related to money. We found that Job Level and Monthly Income had different means across the Attrition groups, which could contribute to Attrition.
#The count of Attrition of Yes and No
employeeData %>% count(Attrition)
##   Attrition   n
## 1        No 730
## 2       Yes 140
#Attrition Plot Count
employeeData %>% ggplot(aes(x=Attrition,fill=Attrition)) + 
  geom_bar()+
  ggtitle("Attrition Count") +
  xlab("Attrition")+ylab("Count")

### Pairs Plot for Attrition to Numerical Values
employeeData %>% select_if(is.numeric) %>% mutate(Attrition=employeeData$Attrition) %>% select(c(3,9,11,13,14,28)) %>% ggpairs(aes(colour = Attrition))

1st Influential Predictor: Monthly Income

Exploring Monthly Income

  • According to our exploratory data analysis, monthly income is a strong indicator of Attrition.
  • The histogram of Monthly Income by Attrition is right-skewed, and the distribution has a similar shape for both Yes and No.
  • I performed Welch’s two-sample t-test to compare the mean monthly income of the two groups. The test gives strong evidence that the difference in means is not zero (p-value = 2.412e-07).
  • In addition, the mean income of No (6702) is greater than the mean income of Yes (4765). I also created a graph to compare Attrition to the predictors that are influenced by money.
### Attrition Vs. MonthlyIncome
employeeData %>% ggplot(aes(x=MonthlyIncome,fill=Attrition))+
  geom_histogram()+
  ggtitle("Attrition Vs. MonthlyIncome")

### Mean Monthly Income of Attrition
employeeData %>% group_by(Attrition) %>% summarise(compareincomes=mean(MonthlyIncome))
## # A tibble: 2 × 2
##   Attrition compareincomes
##   <chr>              <dbl>
## 1 No                 6702 
## 2 Yes                4765.
### Welch's Two-Sample T-test to determine Difference in means for Monthly Income
t.test(employeeData$MonthlyIncome~employeeData$Attrition,data=employeeData)
## 
##  Welch Two Sample t-test
## 
## data:  employeeData$MonthlyIncome by employeeData$Attrition
## t = 5.3249, df = 228.45, p-value = 2.412e-07
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  1220.382 2654.047
## sample estimates:
##  mean in group No mean in group Yes 
##          6702.000          4764.786
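
To make the test statistic above easier to interpret, the sketch below recomputes Welch's t statistic and its degrees of freedom by hand and compares them with `t.test()`. The two income vectors are made-up values chosen for illustration, not the case-study data, and the variable names are assumptions.

```r
# Toy incomes for two groups (made-up values, not the case-study data)
income_no  <- c(5200, 6100, 7300, 8000, 6900, 7500)
income_yes <- c(3100, 4200, 5000, 4600)

# Welch's t statistic: difference in means over the unpooled standard error
se2_no  <- var(income_no)  / length(income_no)
se2_yes <- var(income_yes) / length(income_yes)
t_manual <- (mean(income_no) - mean(income_yes)) / sqrt(se2_no + se2_yes)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (se2_no + se2_yes)^2 /
  (se2_no^2 / (length(income_no) - 1) + se2_yes^2 / (length(income_yes) - 1))

# t.test() uses the Welch version by default (var.equal = FALSE)
welch <- t.test(income_no, income_yes)
print(c(manual = t_manual, t.test = unname(welch$statistic)))
print(c(manual = df_manual, t.test = unname(welch$parameter)))
```

Because the variances are not pooled, the degrees of freedom are usually fractional, which is why the output above reports df = 228.45.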

2nd Influential Predictor: Job Level

Exploring Job Level

  • Job Level has an effect on our model. The histogram is right-skewed, with more “Yes” at the lower Job Levels. This makes sense: employees at the bottom Job Level are more likely to quit, while employees who have moved up in Job Level are probably less likely to quit.
  • We made a jitter plot of Monthly Income vs. Job Level and found more “Yes” at the lower end of both Monthly Income and Job Level.
  • In addition, we performed Welch’s two-sample t-test and found strong evidence that the difference in mean Job Level is not zero (p-value = 4.042e-07).
### Attrition Vs. Job Level Histogram
employeeData %>% ggplot(aes(x=JobLevel,fill=Attrition))+
  geom_histogram()+
  ggtitle("Attrition Vs. JobLevel") 

### Monthly Income Vs. Job Level Jitter Plot
employeeData %>% ggplot(aes(x=JobLevel,y=MonthlyIncome,fill=Attrition, color=Attrition))+
  geom_jitter(stat="identity")+
  ggtitle("MonthlyIncome Vs. JobLevel") 

### Welch's Two Sample T-test for Job Level
t.test(employeeData$JobLevel~employeeData$Attrition,data=employeeData)
## 
##  Welch Two Sample t-test
## 
## data:  employeeData$JobLevel by employeeData$Attrition
## t = 5.231, df = 211.76, p-value = 4.042e-07
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  0.2995698 0.6618784
## sample estimates:
##  mean in group No mean in group Yes 
##          2.116438          1.635714

3rd Influential Predictor: OverTime

Exploring Overtime

  • Our third influential predictor is the categorical variable “OverTime.”
  • OverTime takes the value “Yes” or “No”.
  • Attrition is clearly skewed by OverTime: employees who work overtime tend to quit more often.
  • Compared with the other categorical variables, OverTime shows a larger difference in the share of Yes and No.
### Attrition Vs. OverTime
employeeData %>% 
  ggplot(aes(x=OverTime,fill=Attrition))+
  geom_bar(position="fill")+ggtitle("Attrition Vs. Overtime")+
  scale_y_continuous(labels = scales::percent)
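
Since OverTime is categorical, the comparison behind this plot is between proportions rather than means. A minimal sketch of the row-wise proportions that a `position="fill"` bar chart displays, using made-up rows (`toy` is an illustrative assumption, not the case-study data):

```r
# Toy data (made-up values, not the case-study data)
toy <- data.frame(
  OverTime  = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  Attrition = c("Yes", "Yes", "No",  "No",  "Yes", "No", "No", "No", "No", "No")
)

# Cross-tabulate, then convert each OverTime row to proportions (margin = 1)
counts <- table(OverTime = toy$OverTime, Attrition = toy$Attrition)
rates  <- prop.table(counts, margin = 1)
print(round(rates, 2))
```

Each row of `rates` sums to 1, so the "Yes" column gives the attrition rate within each OverTime group.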

EDA on Other Categorical Variables

When we graphed the rest of the categorical variables, we saw some interesting trends. Sales Representatives tend to quit more than the other job roles. Job Satisfaction is fairly even among the “Yes”. Single people tend to quit more than Married or Divorced. Relationship Satisfaction is fairly even as well. Business Travel has a larger population of “Yes” for Travel Rarely, but I believe that Overtime plays a more major role in Attrition.

### Percentage Compares for Job Role
ggplot(employeeData, aes(x = JobRole, fill = Attrition)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

### Attrition Vs. Job Satisfaction
employeeData %>% 
  ggplot(aes(x=JobSatisfaction,fill=Attrition))+
  geom_bar()+
  ggtitle("Attrition Vs. Job Satisfaction") 

### Attrition vs Marital Status
employeeData %>% 
  ggplot(aes(x=MaritalStatus,fill=Attrition))+
  geom_bar(position="fill")+
  ggtitle("Attrition Vs. Marital Status")

### Attrition Vs. RelationshipSatisfaction
# The bars show counts, so the y-axis is left as a count rather than formatted as a percent
employeeData %>% 
  ggplot(aes(x=RelationshipSatisfaction,fill=Attrition))+
  geom_histogram()+ggtitle("Attrition Vs. RelationshipSatisfaction")

### Attrition Vs. BusinessTravel
# The bars show counts, so the y-axis is left as a count rather than formatted as a percent
employeeData %>% 
  ggplot(aes(x=BusinessTravel,fill=Attrition))+
  geom_bar()+ggtitle("Attrition Vs. BusinessTravel")

Data Prep - Cleanup and Wrangling

  • Created dummy variables for the categorical variables
  • Changed OverTime to 0 or 1
  • Scaled Age, Monthly Income, Hourly Rate, Monthly Rate, Percent Salary Hike, and Daily Rate
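
As an alternative to writing one `ifelse()` call per level, base R's `model.matrix()` can generate the 0/1 indicator columns in one step. This is a sketch on made-up rows, not the approach used in this analysis; `toy` and `dummies` are illustrative names.

```r
# Toy data (made-up rows; the level names match the BusinessTravel column)
toy <- data.frame(
  BusinessTravel = factor(c("Travel_Rarely", "Travel_Frequently",
                            "Non-Travel", "Travel_Rarely"))
)

# "- 1" drops the intercept so every level gets its own 0/1 column
dummies <- model.matrix(~ BusinessTravel - 1, data = toy)
print(colnames(dummies))

# Each row has exactly one 1 across the indicator columns
print(rowSums(dummies))
```

The generated column names are the variable name followed by the level, for example `BusinessTravelTravel_Rarely`, so they would need renaming to match short names such as `BTRare`.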
# Created Dataset for Naive Bayes
employeeData3 = read.csv("CaseStudy2-data.csv")

# Make the OverTime column binary (1 = Yes, 0 = No)
employeeData3$OverTime = ifelse(employeeData3$OverTime=="Yes",1,0)

# Scaled Age, Monthly Income, Hourly Rate, Monthly Rate, Percent Salary hike, and Daily Rate
employeeData3$NAge=scale(employeeData3$Age)
employeeData3$NMonthylyIncome=scale(employeeData3$MonthlyIncome)
employeeData3$NHourlyRate=scale(employeeData3$HourlyRate)
employeeData3$NMonthlyRate=scale(employeeData3$MonthlyRate)
employeeData3$NPercentSalaryHike=scale(employeeData3$PercentSalaryHike)
employeeData3$NDailyRate=scale(employeeData3$DailyRate)

# Created Dummy Variables for Business Travel
employeeData3$BTNone = ifelse(employeeData$BusinessTravel=="Non-Travel",1,0)
employeeData3$BTRare=ifelse(employeeData$BusinessTravel=="Travel_Rarely",1,0)
employeeData3$BTFreq=ifelse(employeeData$BusinessTravel=="Travel_Frequently",1,0)

# Created Dummy Variables for Departments
employeeData3$DepHR=ifelse(employeeData$Department=="Human Resources",1,0)
employeeData3$DepSales=ifelse(employeeData$Department=="Sales",1,0)
employeeData3$DepRD=ifelse(employeeData$Department=="Research & Development",1,0)

# Created Dummy Variables for Education Field
employeeData3$EFHR=ifelse(employeeData$EducationField=="Human Resources",1,0)
employeeData3$EFLS=ifelse(employeeData$EducationField=="Life Sciences",1,0)
employeeData3$EFM=ifelse(employeeData$EducationField=="Marketing",1,0)
employeeData3$EFMed=ifelse(employeeData$EducationField=="Medical",1,0)
employeeData3$EFT=ifelse(employeeData$EducationField=="Technical Degree",1,0)
employeeData3$EFOther=ifelse(employeeData$EducationField=="Other",1,0)

# Created Dummy Variables for Gender
employeeData3$Male=ifelse(employeeData$Gender=="Male",1,0)
employeeData3$Female=ifelse(employeeData$Gender=="Female",1,0)

# Created Dummy Variables for Job Roles
employeeData3$JRHR=ifelse(employeeData$JobRole=="Healthcare Representative",1,0)
employeeData3$JRLT=ifelse(employeeData$JobRole=="Laboratory Technician",1,0)
employeeData3$JRManager=ifelse(employeeData$JobRole=="Manager",1,0)
employeeData3$JRMD=ifelse(employeeData$JobRole=="Manufacturing Director",1,0)
employeeData3$JRRD=ifelse(employeeData$JobRole=="Research Director",1,0)
employeeData3$JRRS=ifelse(employeeData$JobRole=="Research Scientist",1,0)
employeeData3$JRSE=ifelse(employeeData$JobRole=="Sales Executive",1,0)
employeeData3$JRSR=ifelse(employeeData$JobRole=="Sales Representative",1,0)

# Created Dummy Variables for Marital Status
employeeData3$Divorced=ifelse(employeeData$MaritalStatus=="Divorced",1,0)
employeeData3$Single=ifelse(employeeData$MaritalStatus=="Single",1,0)
employeeData3$Married=ifelse(employeeData$MaritalStatus=="Married",1,0)

# Created Dummy Variables for Supervisor roles vs Non-Supervisor Roles
employeeData3$JR1 = ifelse(employeeData$JobRole=="Manager"|employeeData$JobRole=="Research Director",1,0)

Attrition Model: Naive Bayes

Initial Analysis: All Predictors

In our data modeling comparisons, we found that Naive Bayes was the best model. We first fit Naive Bayes using all the predictors, with a 70-30 train/test split of our dataset. This gave a Sensitivity of 0.9264, a Specificity of 0.3469, and an Accuracy of 0.7088, which fails our requirement of at least 60% for both sensitivity and specificity. Next, we do a forward selection by hand, adding predictors one at a time to find the best set.

set.seed(13)

# Naive Bayes Model (all variables, including scaled continuous variables and dummy-coded categorical variables), excluding the variables already ruled out above
naive_data=employeeData3

model2 = naive_data %>% select(c("NAge","NDailyRate","DistanceFromHome", "EnvironmentSatisfaction", "NHourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "NMonthylyIncome", "NMonthlyRate", "NumCompaniesWorked", "OverTime", "NPercentSalaryHike", "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany", "YearsInCurrentRole","YearsSinceLastPromotion", "YearsWithCurrManager", "BTRare","BTFreq","BTNone","DepHR","DepSales","EFHR","EFLS","EFM","EFMed","EFT","EFOther","Male","Female","JRHR","JRLT" ,"JRManager","JRMD","JRRD","JRRS","JRSE","JRSR", "Divorced","Single","Married","StockOptionLevel","TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance","Attrition"))
model2$Attrition = as.factor(model2$Attrition)

trainIndices = sample(1:dim(model2)[1],round(.70 * dim(model2)[1]))
train = model2[trainIndices,]
test = model2[-trainIndices,]

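# Note: the classifier is fit on every row of model2, which includes the test rows,
# so the test-set metrics below are optimistic; fitting on train[,c(1:47)] and
# train$Attrition would keep the test rows unseen
classifier1 = naiveBayes(model2[,c(1:47)],model2$Attrition)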

pred = predict(classifier1,newdata=test)
confusionMatrix(table(test$Attrition,pred))
## Confusion Matrix and Statistics
## 
##      pred
##        No Yes
##   No  151  64
##   Yes  12  34
##                                           
##                Accuracy : 0.7088          
##                  95% CI : (0.6496, 0.7632)
##     No Information Rate : 0.6245          
##     P-Value [Acc > NIR] : 0.002628        
##                                           
##                   Kappa : 0.3057          
##                                           
##  Mcnemar's Test P-Value : 4.913e-09       
##                                           
##             Sensitivity : 0.9264          
##             Specificity : 0.3469          
##          Pos Pred Value : 0.7023          
##          Neg Pred Value : 0.7391          
##              Prevalence : 0.6245          
##          Detection Rate : 0.5785          
##    Detection Prevalence : 0.8238          
##       Balanced Accuracy : 0.6367          
##                                           
##        'Positive' Class : No              
## 
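
A note on reading this output: when given a table, caret's `confusionMatrix()` treats the rows as predictions and the columns as the reference. The call above passes `table(test$Attrition,pred)`, with the actual class in the rows, so the reported Sensitivity and Specificity appear to be computed down the prediction columns. The sketch below recomputes the four ratios by hand from the counts printed above; the variable names are illustrative.

```r
# Counts copied from the confusion matrix above: rows are the actual
# class (test$Attrition), columns are the predicted class (pred)
actual_by_pred <- matrix(c(151, 12, 64, 34), nrow = 2,
                         dimnames = list(actual = c("No", "Yes"),
                                         pred   = c("No", "Yes")))

# Share of actual "No" employees predicted "No" (recall for "No")
recall_no  <- actual_by_pred["No", "No"]   / sum(actual_by_pred["No", ])
# Share of actual "Yes" employees predicted "Yes" (recall for "Yes")
recall_yes <- actual_by_pred["Yes", "Yes"] / sum(actual_by_pred["Yes", ])

# Share of predicted "No" that were actually "No" (precision for "No")
precision_no  <- actual_by_pred["No", "No"]   / sum(actual_by_pred[, "No"])
# Share of predicted "Yes" that were actually "Yes" (precision for "Yes")
precision_yes <- actual_by_pred["Yes", "Yes"] / sum(actual_by_pred[, "Yes"])

print(round(c(recall_no = recall_no, recall_yes = recall_yes,
              precision_no = precision_no, precision_yes = precision_yes), 4))
```

The precision values (0.9264 and 0.3469) match the Sensitivity and Specificity lines above, while the recall values (0.7023 and 0.7391) match the Pos Pred Value and Neg Pred Value lines. Passing `table(pred, test$Attrition)` instead would be one way to make the labels line up with the usual definitions.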

Final Attrition Model Analysis

Best Model: Naive Bayes

In our data modeling comparisons, Naive Bayes gave us the best results: a Sensitivity of 0.8941, a Specificity of 0.8400, and an Accuracy of 0.8889, using a 70-30 train/test split of our dataset. We found the predictors for our best model by forward selection: I added one variable at a time to the model, and if it increased our Sensitivity, Specificity, and Accuracy, I kept it and went on to the next variable. Best Predictor Variables: JobLevel, NMonthylyIncome, NumCompaniesWorked, NMonthlyRate, OverTime, PerformanceRating, YearsWithCurrManager, RelationshipSatisfaction, YearsAtCompany, YearsSinceLastPromotion, BTRare

set.seed(13)

# Naive Bayes Model (forward-selected subset of the scaled continuous and dummy-coded categorical variables), excluding the variables already ruled out above
naive_data=employeeData3

model2 = naive_data %>% select(c("NAge","NDailyRate","DistanceFromHome", "EnvironmentSatisfaction", "NHourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "NMonthylyIncome", "NMonthlyRate", "NumCompaniesWorked", "OverTime", "NPercentSalaryHike", "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany", "YearsInCurrentRole","YearsSinceLastPromotion", "YearsWithCurrManager", "BTRare","BTFreq","BTNone","DepHR","DepSales","EFHR","EFLS","EFM","EFMed","EFT","EFOther","Male","Female","JRHR","JRLT" ,"JRManager","JRMD","JRRD","JRRS","JRSE","JRSR", "Divorced","Single","Married","StockOptionLevel","TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance","Attrition"))
model2$Attrition = as.factor(model2$Attrition)

trainIndices = sample(1:dim(model2)[1],round(.70 * dim(model2)[1]))
train = model2[trainIndices,]
test = model2[-trainIndices,]

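# Note: the classifier is fit on every row of model2, which includes the test rows,
# so the test-set metrics below are optimistic; fitting on the same columns of
# train with train$Attrition would keep the test rows unseen
classifier1 = naiveBayes(model2[,c(7,9,12,6,11,10,13,15,20,21,23,24,26,28)],model2$Attrition)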

pred = predict(classifier1,newdata=test)
confusionMatrix(table(test$Attrition,pred))
## Confusion Matrix and Statistics
## 
##      pred
##        No Yes
##   No  211   4
##   Yes  25  21
##                                           
##                Accuracy : 0.8889          
##                  95% CI : (0.8443, 0.9243)
##     No Information Rate : 0.9042          
##     P-Value [Acc > NIR] : 0.8290315       
##                                           
##                   Kappa : 0.5337          
##                                           
##  Mcnemar's Test P-Value : 0.0002041       
##                                           
##             Sensitivity : 0.8941          
##             Specificity : 0.8400          
##          Pos Pred Value : 0.9814          
##          Neg Pred Value : 0.4565          
##              Prevalence : 0.9042          
##          Detection Rate : 0.8084          
##    Detection Prevalence : 0.8238          
##       Balanced Accuracy : 0.8670          
##                                           
##        'Positive' Class : No              
## 
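
Positional indexes are easy to misread, so it can help to translate them back into column names. The sketch below copies the first 28 column names from the `select()` call above and looks up the positions passed to `naiveBayes()`; `model2_cols`, `idx`, and `selected` are illustrative names.

```r
# First 28 columns of model2, in the order given in the select() call above
model2_cols <- c("NAge", "NDailyRate", "DistanceFromHome", "EnvironmentSatisfaction",
  "NHourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "NMonthylyIncome",
  "NMonthlyRate", "NumCompaniesWorked", "OverTime", "NPercentSalaryHike",
  "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany",
  "YearsInCurrentRole", "YearsSinceLastPromotion", "YearsWithCurrManager",
  "BTRare", "BTFreq", "BTNone", "DepHR", "DepSales", "EFHR", "EFLS", "EFM", "EFMed")

# Positions passed to naiveBayes() above
idx <- c(7, 9, 12, 6, 11, 10, 13, 15, 20, 21, 23, 24, 26, 28)

# Names those positions refer to
selected <- model2_cols[idx]
print(selected)
```

On this reading the positions pick 14 columns (for example `JobInvolvement`, `NPercentSalaryHike`, `BTFreq`, `DepHR`, `DepSales`, `EFLS`, and `EFMed`) that differ from the 11 names in the prose list, so it may be worth confirming which set was intended. Selecting by name, as in `model2[, selected]`, avoids the ambiguity. The KNN data frame `model` includes an extra `DepRD` column, so positions 26 and 28 there refer to `EFHR` and `EFM` instead.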

2nd Attrition Model: KNN

Initial Analysis: All Predictor Variables

We fit a KNN model on our dataset, but it failed to meet the requirement of 60% for both sensitivity and specificity. Our KNN model gave us a Sensitivity of 0.9865, a Specificity of 0.1579, and an Accuracy of 0.8659. Thus, we will move on to the Naive Bayes model.

model = employeeData3 %>% select(c("NAge","NDailyRate","DistanceFromHome", "EnvironmentSatisfaction", "NHourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "NMonthylyIncome", "NMonthlyRate", "NumCompaniesWorked", "OverTime", "NPercentSalaryHike", "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany", "YearsInCurrentRole","YearsSinceLastPromotion", "YearsWithCurrManager", "BTRare","BTFreq","BTNone","DepHR","DepSales","DepRD","EFHR","EFLS","EFM","EFMed","EFT","EFOther","Male","Female","JRHR","JRLT" ,"JRManager","JRMD","JRRD","JRRS","JRSE","JRSR", "Divorced","Single","Married","StockOptionLevel","TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance","Attrition"))

set.seed(13)
iterations = 200
numks = 20
splitPerc = .70
masterAcc = matrix(nrow = iterations, ncol = numks)
for(j in 1:iterations)
{
  trainIndices = sample(1:dim(model)[1],round(splitPerc * dim(model)[1]))
  train = model[trainIndices,]
  test = model[-trainIndices,]
  for(i in 1:numks)
  {
    classifications = knn(train[,c(1:47)],test[,c(1:47)],train$Attrition, prob = TRUE, k = i)
    table(classifications,test$Attrition)
    CM = confusionMatrix(table(classifications,test$Attrition))
    masterAcc[j,i] = CM$overall[1]
  }
}
MeanAcc = colMeans(masterAcc)
plot(seq(1,numks,1),MeanAcc, type = "l")

which.max(MeanAcc)
## [1] 11
classifications = knn(train[,c(1:47)],test[,c(1:47)],train$Attrition, prob = TRUE, k = which.max(MeanAcc))
table(classifications,test$Attrition)
##                
## classifications  No Yes
##             No  220  32
##             Yes   3   6
confusionMatrix(table(classifications,test$Attrition))
## Confusion Matrix and Statistics
## 
##                
## classifications  No Yes
##             No  220  32
##             Yes   3   6
##                                           
##                Accuracy : 0.8659          
##                  95% CI : (0.8185, 0.9048)
##     No Information Rate : 0.8544          
##     P-Value [Acc > NIR] : 0.3366          
##                                           
##                   Kappa : 0.2113          
##                                           
##  Mcnemar's Test P-Value : 2.214e-06       
##                                           
##             Sensitivity : 0.9865          
##             Specificity : 0.1579          
##          Pos Pred Value : 0.8730          
##          Neg Pred Value : 0.6667          
##              Prevalence : 0.8544          
##          Detection Rate : 0.8429          
##    Detection Prevalence : 0.9655          
##       Balanced Accuracy : 0.5722          
##                                           
##        'Positive' Class : No              
## 

KNN Model

Final Analysis: The best predictor variables are: JobLevel, NMonthylyIncome, NumCompaniesWorked, NMonthlyRate, OverTime, PerformanceRating, YearsWithCurrManager, RelationshipSatisfaction, YearsAtCompany, YearsSinceLastPromotion, BTRare

model = employeeData3 %>% select(c("NAge","NDailyRate","DistanceFromHome", "EnvironmentSatisfaction", "NHourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "NMonthylyIncome", "NMonthlyRate", "NumCompaniesWorked", "OverTime", "NPercentSalaryHike", "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany", "YearsInCurrentRole","YearsSinceLastPromotion", "YearsWithCurrManager", "BTRare","BTFreq","BTNone","DepHR","DepSales","DepRD","EFHR","EFLS","EFM","EFMed","EFT","EFOther","Male","Female","JRHR","JRLT" ,"JRManager","JRMD","JRRD","JRRS","JRSE","JRSR", "Divorced","Single","Married","StockOptionLevel","TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance","Attrition"))
head(model)
##         NAge  NDailyRate DistanceFromHome EnvironmentSatisfaction NHourlyRate
## 1 -0.5409772 -1.74071116               13                       2   0.3669771
## 2  0.3552859  1.22850265               14                       3  -1.0738619
## 3 -0.2048785 -1.53378862               18                       3  -0.2789163
## 4 -0.5409772 -0.03546998                1                       3  -0.8751255
## 5 -1.4372403 -0.61884196                2                       1  -1.6700711
## 6 -1.1011416 -1.29944261               10                       4  -1.6700711
##   JobInvolvement JobLevel JobSatisfaction NMonthylyIncome NMonthlyRate
## 1              3        2               4      -0.4322305   -0.7140332
## 2              2        5               3       2.8787757    0.4527584
## 3              3        3               4       0.6463532    0.7903879
## 4              3        3               4       0.8769035    1.3654837
## 5              3        1               4      -0.5720831    0.4068970
## 6              3        3               1       0.5225956   -1.3387886
##   NumCompaniesWorked OverTime NPercentSalaryHike PerformanceRating
## 1                  2        0         -1.1427203                 3
## 2                  1        0         -0.3264915                 3
## 3                  2        0         -1.1427203                 3
## 4                  1        0          1.0338898                 3
## 5                  1        1         -0.5985678                 3
## 6                  1        0          1.5780423                 4
##   RelationshipSatisfaction YearsAtCompany YearsInCurrentRole
## 1                        3              5                  2
## 2                        1             20                  7
## 3                        3              2                  2
## 4                        3             14                 10
## 5                        3              6                  3
## 6                        3              9                  7
##   YearsSinceLastPromotion YearsWithCurrManager BTRare BTFreq BTNone DepHR
## 1                       0                    3      1      0      0     0
## 2                       4                    9      1      0      0     0
## 3                       2                    2      0      1      0     0
## 4                       5                    7      1      0      0     0
## 5                       1                    3      0      1      0     0
## 6                       1                    7      0      1      0     0
##   DepSales DepRD EFHR EFLS EFM EFMed EFT EFOther Male Female JRHR JRLT
## 1        1     0    0    1   0     0   0       0    1      0    0    0
## 2        0     1    0    0   0     1   0       0    1      0    0    0
## 3        0     1    0    1   0     0   0       0    1      0    0    0
## 4        1     0    0    0   1     0   0       0    0      1    0    0
## 5        0     1    0    0   0     0   1       0    0      1    0    0
## 6        0     1    0    1   0     0   0       0    1      0    0    0
##   JRManager JRMD JRRD JRRS JRSE JRSR Divorced Single Married StockOptionLevel
## 1         0    0    0    0    1    0        1      0       0                1
## 2         0    0    1    0    0    0        0      1       0                0
## 3         0    1    0    0    0    0        0      1       0                0
## 4         0    0    0    0    1    0        0      0       1                2
## 5         0    0    0    1    0    0        0      1       0                0
## 6         0    1    0    0    0    0        1      0       0                2
##   TotalWorkingYears TrainingTimesLastYear WorkLifeBalance Attrition
## 1                 8                     3               2        No
## 2                21                     2               4        No
## 3                10                     2               3        No
## 4                14                     3               3        No
## 5                 6                     2               3        No
## 6                 9                     4               2        No
iterations = 1
numks = 20
splitPerc = .70
masterAcc = matrix(nrow = iterations, ncol = numks)
for(j in 1:iterations)
{
  trainIndices = sample(1:dim(model)[1],round(splitPerc * dim(model)[1]))
  train = model[trainIndices,]
  test = model[-trainIndices,]
  for(i in 1:numks)
  {
    classifications = knn(train[,c(7,9,12,6,11,10,13,15,20,21,23,24,26,28)],test[,c(7,9,12,6,11,10,13,15,20,21,23,24,26,28)],train$Attrition, prob = TRUE, k = i)
    table(classifications,test$Attrition)
    CM = confusionMatrix(table(classifications,test$Attrition))
    masterAcc[j,i] = CM$overall[1]
  }
  
}
MeanAcc = colMeans(masterAcc)
plot(seq(1,numks,1),MeanAcc, type = "l")

which.max(MeanAcc)
## [1] 4
max(MeanAcc)
## [1] 0.8237548
classifications = knn(train[,c(7,9,12,6,11,10,13,15,20,21,23,24,26,28)],test[,c(7,9,12,6,11,10,13,15,20,21,23,24,26,28)],train$Attrition, prob = TRUE, k = which.max(MeanAcc))
table(classifications,test$Attrition)
##                
## classifications  No Yes
##             No  208  38
##             Yes   5  10
confusionMatrix(table(classifications,test$Attrition))
## Confusion Matrix and Statistics
## 
##                
## classifications  No Yes
##             No  208  38
##             Yes   5  10
##                                           
##                Accuracy : 0.8352          
##                  95% CI : (0.7846, 0.8781)
##     No Information Rate : 0.8161          
##     P-Value [Acc > NIR] : 0.2386          
##                                           
##                   Kappa : 0.2519          
##                                           
##  Mcnemar's Test P-Value : 1.061e-06       
##                                           
##             Sensitivity : 0.9765          
##             Specificity : 0.2083          
##          Pos Pred Value : 0.8455          
##          Neg Pred Value : 0.6667          
##              Prevalence : 0.8161          
##          Detection Rate : 0.7969          
##    Detection Prevalence : 0.9425          
##       Balanced Accuracy : 0.5924          
##                                           
##        'Positive' Class : No              
## 

Attrition Model Conclusion

Naive Bayes with the predictors JobLevel, MonthlyIncome, NumCompaniesWorked, MonthlyRate, OverTime, PerformanceRating, YearsWithCurrManager, RelationshipSatisfaction, YearsAtCompany, YearsSinceLastPromotion, and BTRare is the best predictive model of Attrition.

Top Three Factors of Attrition:

  • Monthly Income
  • Job Level
  • Overtime

Predicting Attrition using Test Set

Naive Bayes Model Predictors: JobLevel, MonthlyIncome, NumCompaniesWorked, MonthlyRate, OverTime, PerformanceRating, YearsWithCurrManager, RelationshipSatisfaction, YearsAtCompany, YearsSinceLastPromotion, BTRare

### Adjusting the No Attrition Dataset to Fit our Model
employeetestdata=read.csv("CaseStudy2CompSetNoAttrition.csv")

employeetestdata = employeetestdata

employeetestdata$NAge=scale(employeetestdata$Age)
employeetestdata$NMonthylyIncome=scale(employeetestdata$MonthlyIncome)
employeetestdata$NHourlyRate=scale(employeetestdata$HourlyRate)
employeetestdata$NMonthlyRate=scale(employeetestdata$MonthlyRate)
employeetestdata$NPercentSalaryHike=scale(employeetestdata$PercentSalaryHike)
employeetestdata$NDailyRate=scale(employeetestdata$DailyRate)

employeetestdata$OverTime = ifelse(employeetestdata$OverTime=="Yes",1,0)


employeetestdata$BTNone = ifelse(employeetestdata$BusinessTravel=="Non-Travel",1,0)
employeetestdata$BTRare=ifelse(employeetestdata$BusinessTravel=="Travel_Rarely",1,0)
employeetestdata$BTFreq=ifelse(employeetestdata$BusinessTravel=="Travel_Frequently",1,0)

employeetestdata$DepHR=ifelse(employeetestdata$Department=="Human Resources",1,0)
employeetestdata$DepSales=ifelse(employeetestdata$Department=="Sales",1,0)

employeetestdata$EFHR=ifelse(employeetestdata$EducationField=="Human Resources",1,0)
employeetestdata$EFLS=ifelse(employeetestdata$EducationField=="Life Sciences",1,0)
employeetestdata$EFM=ifelse(employeetestdata$EducationField=="Marketing",1,0)
employeetestdata$EFMed=ifelse(employeetestdata$EducationField=="Medical",1,0)
employeetestdata$EFT=ifelse(employeetestdata$EducationField=="Technical Degree",1,0)
employeetestdata$EFOther=ifelse(employeetestdata$EducationField=="Other",1,0)

employeetestdata$Male=ifelse(employeetestdata$Gender=="Male",1,0)
employeetestdata$Female=ifelse(employeetestdata$Gender=="Female",1,0)

employeetestdata$JRHR=ifelse(employeetestdata$JobRole=="Healthcare Representative",1,0)
employeetestdata$JRLT=ifelse(employeetestdata$JobRole=="Laboratory Technician",1,0)
employeetestdata$JRManager=ifelse(employeetestdata$JobRole=="Manager",1,0)
employeetestdata$JRMD=ifelse(employeetestdata$JobRole=="Manufacturing Director",1,0)
employeetestdata$JRRD=ifelse(employeetestdata$JobRole=="Research Director",1,0)
employeetestdata$JRRS=ifelse(employeetestdata$JobRole=="Research Scientist",1,0)
employeetestdata$JRSE=ifelse(employeetestdata$JobRole=="Sales Executive",1,0)
employeetestdata$JRSR=ifelse(employeetestdata$JobRole=="Sales Representative",1,0)

employeetestdata$Divorced=ifelse(employeetestdata$MaritalStatus=="Divorced",1,0)
employeetestdata$Single=ifelse(employeetestdata$MaritalStatus=="Single",1,0)
employeetestdata$Married=ifelse(employeetestdata$MaritalStatus=="Married",1,0)
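
The indicator columns above are written out one level at a time. As a minimal sketch on a toy data frame (the object `toy` and its values are made up for illustration), base R's `model.matrix()` can build the same 0/1 columns for every level of a factor in one call, though it uses its own column names rather than the `BTNone`/`BTRare`/`BTFreq` names used here.

```r
# Toy stand-in for the BusinessTravel column; values are illustrative only.
toy <- data.frame(
  BusinessTravel = c("Travel_Rarely", "Non-Travel", "Travel_Frequently", "Travel_Rarely"),
  stringsAsFactors = TRUE
)

# Manual encoding, in the same style as the code above.
toy$BTRare <- ifelse(toy$BusinessTravel == "Travel_Rarely", 1, 0)

# One indicator column per level; "- 1" drops the intercept so no level is omitted.
dummies <- model.matrix(~ BusinessTravel - 1, data = toy)

# The column for Travel_Rarely matches the manual BTRare column.
all(dummies[, "BusinessTravelTravel_Rarely"] == toy$BTRare)
```

Whichever approach is used, the column names in the new data must match the names the classifier was trained on.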

Naive Bayes Prediction Model with the No-Attrition Test Set

prednoattrition = predict(classifier1,newdata=employeetestdata)

employeetestdata$Attrition=unlist(prednoattrition)

Case2PredictionsAttritionNguyen = data.frame(c(employeetestdata$ID),c(employeetestdata$Attrition))

#Prediction Attrition Set - See CaseStudyPredictionAttrition in Github for full R code
head(Case2PredictionsAttritionNguyen)
##   c.employeetestdata.ID. c.employeetestdata.Attrition.
## 1                   1171                            No
## 2                   1172                            No
## 3                   1173                            No
## 4                   1174                            No
## 5                   1175                            No
## 6                   1176                            No
### Prediction Attrition of Test Set
employeetestdata %>% ggplot(aes(x=Attrition,fill=Attrition)) + 
  geom_bar()+
  ggtitle("Predicted Attrition Count") +
  xlab("Attrition")+ylab("Count")

###Count of "Yes" and "No"
employeetestdata %>% count(Attrition)
##   Attrition   n
## 1        No 274
## 2       Yes  26
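
As a small worked check of these counts, the predicted attrition rate in the 300-row test set is 26/300, or roughly 8.7%. The sketch below recomputes the shares from the printed counts (the vector name `pred_counts` is illustrative).

```r
# Counts copied from the output above.
pred_counts <- c(No = 274, Yes = 26)

# Share of each predicted class.
pred_share <- prop.table(pred_counts)
round(pred_share, 3)
```

For comparison, 48 of the 261 employees (about 18%) in the held-out split evaluated earlier had actually left.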

Predicting Monthly Income - Multiple Linear Regression

For the second part of our assignment, we predict salary (MonthlyIncome). We first run a multiple linear regression on all predictors to find those that are statistically significant. We then refit the model with only those predictors and add their interactions to look for further statistically significant terms.

Multiple Linear Regression Model of All Other Variables (Including Attrition)

### Multiple Linear Regression using All Predictors
lmsalary_model1=lm(MonthlyIncome~.,data=lm_salarydf)

### Summary of the Linear Model
summary(lmsalary_model1)
## 
## Call:
## lm(formula = MonthlyIncome ~ ., data = lm_salarydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3650.2  -667.7     0.8   629.1  4147.2 
## 
## Coefficients: (5 not defined because of singularities)
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               5.980e+01  7.287e+02   0.082  0.93461    
## Age                      -2.207e+00  5.593e+00  -0.395  0.69320    
## DailyRate                 1.468e-01  9.135e-02   1.607  0.10837    
## DistanceFromHome         -6.960e+00  4.568e+00  -1.524  0.12798    
## EnvironmentSatisfaction  -3.612e+00  3.367e+01  -0.107  0.91459    
## HourlyRate               -3.674e-01  1.827e+00  -0.201  0.84069    
## JobInvolvement            1.737e+01  5.327e+01   0.326  0.74449    
## JobLevel                  2.787e+03  8.351e+01  33.371  < 2e-16 ***
## JobSatisfaction           2.662e+01  3.338e+01   0.798  0.42532    
## MonthlyRate              -9.082e-03  5.144e-03  -1.765  0.07785 .  
## NumCompaniesWorked        3.117e+00  1.681e+01   0.185  0.85296    
## OverTime                 -1.294e+01  8.441e+01  -0.153  0.87822    
## PercentSalaryHike         2.467e+01  1.582e+01   1.559  0.11928    
## PerformanceRating        -3.185e+02  1.615e+02  -1.972  0.04890 *  
## RelationshipSatisfaction  1.705e+01  3.329e+01   0.512  0.60875    
## YearsAtCompany           -4.482e+00  1.363e+01  -0.329  0.74240    
## YearsInCurrentRole        5.708e+00  1.703e+01   0.335  0.73759    
## YearsSinceLastPromotion   2.983e+01  1.532e+01   1.947  0.05184 .  
## YearsWithCurrManager     -2.674e+01  1.667e+01  -1.604  0.10900    
## BTRare                    3.747e+02  1.201e+02   3.119  0.00188 ** 
## BTFreq                    1.939e+02  1.422e+02   1.364  0.17303    
## BTNone                           NA         NA      NA       NA    
## DepHR                    -1.290e+02  4.773e+02  -0.270  0.78708    
## DepSales                 -5.645e+02  3.312e+02  -1.705  0.08865 .  
## DepRD                            NA         NA      NA       NA    
## EFHR                     -8.487e+01  3.950e+02  -0.215  0.82995    
## EFLS                      5.762e+01  1.600e+02   0.360  0.71879    
## EFM                       2.342e+01  1.977e+02   0.118  0.90576    
## EFMed                    -4.998e+01  1.634e+02  -0.306  0.75973    
## EFT                       1.235e+01  1.955e+02   0.063  0.94964    
## EFOther                          NA         NA      NA       NA    
## Male                      1.111e+02  7.453e+01   1.491  0.13625    
## Female                           NA         NA      NA       NA    
## JRHR                      1.825e+02  5.150e+02   0.354  0.72315    
## JRLT                     -4.160e+02  4.960e+02  -0.839  0.40189    
## JRManager                 4.459e+03  4.943e+02   9.022  < 2e-16 ***
## JRMD                      3.585e+02  5.117e+02   0.701  0.48377    
## JRRD                      4.223e+03  5.523e+02   7.646 5.76e-14 ***
## JRRS                     -1.686e+02  4.951e+02  -0.340  0.73360    
## JRSE                      6.920e+02  5.107e+02   1.355  0.17582    
## JRSR                      2.699e+02  5.168e+02   0.522  0.60171    
## Divorced                 -6.652e+01  1.001e+02  -0.665  0.50640    
## Single                   -5.373e+01  1.028e+02  -0.523  0.60138    
## Married                          NA         NA      NA       NA    
## StockOptionLevel          3.185e+00  5.693e+01   0.056  0.95540    
## TotalWorkingYears         5.176e+01  1.098e+01   4.716 2.82e-06 ***
## TrainingTimesLastYear     2.473e+01  2.914e+01   0.849  0.39630    
## WorkLifeBalance          -3.669e+01  5.168e+01  -0.710  0.47801    
## AttritionYes              8.447e+01  1.155e+02   0.731  0.46485    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1057 on 826 degrees of freedom
## Multiple R-squared:  0.9498, Adjusted R-squared:  0.9471 
## F-statistic: 363.2 on 43 and 826 DF,  p-value: < 2.2e-16
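
The five `NA` rows in this summary (BTNone, DepRD, EFOther, Female, Married) arise because each complete set of indicator columns sums to one in every row, duplicating the intercept, so `lm()` cannot estimate the last column of each set. A minimal sketch of this on simulated data (all values below are made up; only the column names echo the report):

```r
set.seed(1)
n <- 60
travel <- rep(c("None", "Rare", "Freq"), length.out = n)

# Full set of indicators: every row has exactly one of them equal to 1.
d <- data.frame(
  BTRare = as.numeric(travel == "Rare"),
  BTFreq = as.numeric(travel == "Freq"),
  BTNone = as.numeric(travel == "None")
)
d$y <- 100 + 30 * d$BTRare + 10 * d$BTFreq + rnorm(n)

fit <- lm(y ~ BTRare + BTFreq + BTNone, data = d)
coef(fit)  # BTNone is NA: it equals 1 - BTRare - BTFreq
```

Leaving one indicator per group out of the data gives the same fit without the `NA` rows; the omitted level becomes the baseline.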

Statistically Significant Predictors (p-values < .05):

JobLevel, PerformanceRating, BTRare, JRManager, JRRD, and TotalWorkingYears had p-values below .05. MonthlyRate, YearsSinceLastPromotion, and DepSales were close, with p-values between .05 and .10.

We are now going to run a model with those variables; the formula below also includes JRMD.
Results: RMSE = 1062, Adjusted R^2 = 0.9466

lmsalary_model2=lm(MonthlyIncome~JobLevel+MonthlyRate+PerformanceRating+BTRare+YearsSinceLastPromotion+DepSales+JRMD+JRManager+JRRD+TotalWorkingYears,data=lm_salarydf)

summary(lmsalary_model2)
## 
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + MonthlyRate + PerformanceRating + 
##     BTRare + YearsSinceLastPromotion + DepSales + JRMD + JRManager + 
##     JRRD + TotalWorkingYears, data = lm_salarydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3895.0  -601.3   -50.3   640.0  4216.8 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -3.338e+02  3.483e+02  -0.958 0.338207    
## JobLevel                 2.940e+03  6.764e+01  43.466  < 2e-16 ***
## MonthlyRate             -9.172e-03  5.096e-03  -1.800 0.072232 .  
## PerformanceRating       -1.152e+02  1.010e+02  -1.141 0.254093    
## BTRare                   2.704e+02  7.981e+01   3.388 0.000735 ***
## YearsSinceLastPromotion  1.695e+01  1.283e+01   1.321 0.186922    
## DepSales                 9.721e+01  8.806e+01   1.104 0.269932    
## JRMD                     3.594e+02  1.360e+02   2.644 0.008349 ** 
## JRManager                3.943e+03  2.066e+02  19.089  < 2e-16 ***
## JRRD                     4.054e+03  2.027e+02  20.005  < 2e-16 ***
## TotalWorkingYears        4.161e+01  8.312e+00   5.006 6.75e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1062 on 859 degrees of freedom
## Multiple R-squared:  0.9472, Adjusted R-squared:  0.9466 
## F-statistic:  1541 on 10 and 859 DF,  p-value: < 2.2e-16
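
The error figure quoted for this model equals the "Residual standard error" line of the summary, which divides the residual sum of squares by the residual degrees of freedom rather than by the number of rows. A minimal sketch on R's built-in `mtcars` data (not the employee data) shows how the two quantities relate:

```r
# Illustrative fit on a built-in dataset, not the employee data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit)^2)

# Residual standard error, as printed by summary(): sqrt(RSS / residual df).
rse <- sqrt(rss / df.residual(fit))

# In-sample RMSE: sqrt(RSS / n), always slightly smaller than the RSE.
rmse <- sqrt(rss / nrow(mtcars))

c(rse = rse, sigma = sigma(fit), rmse = rmse)
```

With 870 rows and 859 residual degrees of freedom, the two figures differ by well under 1% for this model, so the distinction matters little here but grows as more terms are added.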

Now we are going to run another model with those variables and all of their pairwise interactions.
Results: RMSE = 1047, Adjusted R^2 = 0.9481

lmsalary_model3=lm(MonthlyIncome~(JobLevel+MonthlyRate+PerformanceRating+BTRare+YearsSinceLastPromotion+DepSales+JRMD+JRManager+JRRD+TotalWorkingYears)^2,data=lm_salarydf)

summary(lmsalary_model3)
## 
## Call:
## lm(formula = MonthlyIncome ~ (JobLevel + MonthlyRate + PerformanceRating + 
##     BTRare + YearsSinceLastPromotion + DepSales + JRMD + JRManager + 
##     JRRD + TotalWorkingYears)^2, data = lm_salarydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3338.8  -631.5   -70.9   607.0  4323.1 
## 
## Coefficients: (5 not defined because of singularities)
##                                             Estimate Std. Error t value
## (Intercept)                                8.990e+02  1.289e+03   0.697
## JobLevel                                   1.621e+03  7.682e+02   2.110
## MonthlyRate                                1.774e-02  4.920e-02   0.360
## PerformanceRating                         -3.266e+02  3.913e+02  -0.835
## BTRare                                    -4.122e+01  7.672e+02  -0.054
## YearsSinceLastPromotion                    5.808e+00  1.367e+02   0.042
## DepSales                                   3.387e+02  9.243e+02   0.366
## JRMD                                      -8.929e+02  1.378e+03  -0.648
## JRManager                                  6.713e+03  2.756e+03   2.436
## JRRD                                       5.981e+03  2.545e+03   2.351
## TotalWorkingYears                          6.540e+01  8.943e+01   0.731
## JobLevel:MonthlyRate                       1.292e-02  1.013e-02   1.276
## JobLevel:PerformanceRating                 2.166e+02  2.323e+02   0.932
## JobLevel:BTRare                            1.407e+01  1.578e+02   0.089
## JobLevel:YearsSinceLastPromotion           4.050e+01  2.038e+01   1.987
## JobLevel:DepSales                          3.222e+02  1.569e+02   2.053
## JobLevel:JRMD                              3.690e+02  2.651e+02   1.392
## JobLevel:JRManager                        -1.682e+02  3.141e+02  -0.535
## JobLevel:JRRD                              5.409e+01  2.759e+02   0.196
## JobLevel:TotalWorkingYears                 1.417e+01  7.890e+00   1.796
## MonthlyRate:PerformanceRating             -5.022e-03  1.438e-02  -0.349
## MonthlyRate:BTRare                         4.395e-03  1.139e-02   0.386
## MonthlyRate:YearsSinceLastPromotion        1.068e-03  1.884e-03   0.567
## MonthlyRate:DepSales                       4.719e-03  1.272e-02   0.371
## MonthlyRate:JRMD                          -4.697e-03  1.931e-02  -0.243
## MonthlyRate:JRManager                      2.319e-02  3.227e-02   0.719
## MonthlyRate:JRRD                          -3.992e-03  3.226e-02  -0.124
## MonthlyRate:TotalWorkingYears             -3.853e-03  1.236e-03  -3.118
## PerformanceRating:BTRare                   5.521e+01  2.301e+02   0.240
## PerformanceRating:YearsSinceLastPromotion -1.604e+01  3.940e+01  -0.407
## PerformanceRating:DepSales                -3.064e+02  2.735e+02  -1.120
## PerformanceRating:JRMD                     1.053e+02  3.861e+02   0.273
## PerformanceRating:JRManager               -5.558e+02  6.594e+02  -0.843
## PerformanceRating:JRRD                    -3.377e+02  7.157e+02  -0.472
## PerformanceRating:TotalWorkingYears        2.785e-01  2.686e+01   0.010
## BTRare:YearsSinceLastPromotion             1.053e+01  2.944e+01   0.358
## BTRare:DepSales                           -3.285e+01  1.980e+02  -0.166
## BTRare:JRMD                               -1.524e+02  3.057e+02  -0.498
## BTRare:JRManager                          -1.043e+02  5.415e+02  -0.193
## BTRare:JRRD                               -1.977e+02  4.685e+02  -0.422
## BTRare:TotalWorkingYears                   4.321e+00  1.898e+01   0.228
## YearsSinceLastPromotion:DepSales          -5.521e+00  3.238e+01  -0.171
## YearsSinceLastPromotion:JRMD              -5.975e+01  4.762e+01  -1.255
## YearsSinceLastPromotion:JRManager         -5.040e+00  5.422e+01  -0.093
## YearsSinceLastPromotion:JRRD              -6.130e+01  6.383e+01  -0.960
## YearsSinceLastPromotion:TotalWorkingYears -3.452e+00  2.072e+00  -1.666
## DepSales:JRMD                                     NA         NA      NA
## DepSales:JRManager                        -1.678e+03  4.535e+02  -3.701
## DepSales:JRRD                                     NA         NA      NA
## DepSales:TotalWorkingYears                 2.159e+01  2.014e+01   1.072
## JRMD:JRManager                                    NA         NA      NA
## JRMD:JRRD                                         NA         NA      NA
## JRMD:TotalWorkingYears                     4.421e+01  2.830e+01   1.562
## JRManager:JRRD                                    NA         NA      NA
## JRManager:TotalWorkingYears               -6.519e+00  3.642e+01  -0.179
## JRRD:TotalWorkingYears                    -1.721e+01  3.429e+01  -0.502
##                                           Pr(>|t|)    
## (Intercept)                               0.485908    
## JobLevel                                  0.035184 *  
## MonthlyRate                               0.718598    
## PerformanceRating                         0.404064    
## BTRare                                    0.957168    
## YearsSinceLastPromotion                   0.966118    
## DepSales                                  0.714138    
## JRMD                                      0.517133    
## JRManager                                 0.015068 *  
## JRRD                                      0.018975 *  
## TotalWorkingYears                         0.464828    
## JobLevel:MonthlyRate                      0.202446    
## JobLevel:PerformanceRating                0.351539    
## JobLevel:BTRare                           0.928972    
## JobLevel:YearsSinceLastPromotion          0.047240 *  
## JobLevel:DepSales                         0.040371 *  
## JobLevel:JRMD                             0.164304    
## JobLevel:JRManager                        0.592538    
## JobLevel:JRRD                             0.844618    
## JobLevel:TotalWorkingYears                0.072816 .  
## MonthlyRate:PerformanceRating             0.726933    
## MonthlyRate:BTRare                        0.699729    
## MonthlyRate:YearsSinceLastPromotion       0.570875    
## MonthlyRate:DepSales                      0.710742    
## MonthlyRate:JRMD                          0.807937    
## MonthlyRate:JRManager                     0.472596    
## MonthlyRate:JRRD                          0.901531    
## MonthlyRate:TotalWorkingYears             0.001886 ** 
## PerformanceRating:BTRare                  0.810458    
## PerformanceRating:YearsSinceLastPromotion 0.683960    
## PerformanceRating:DepSales                0.262959    
## PerformanceRating:JRMD                    0.785207    
## PerformanceRating:JRManager               0.399524    
## PerformanceRating:JRRD                    0.637152    
## PerformanceRating:TotalWorkingYears       0.991730    
## BTRare:YearsSinceLastPromotion            0.720688    
## BTRare:DepSales                           0.868299    
## BTRare:JRMD                               0.618334    
## BTRare:JRManager                          0.847276    
## BTRare:JRRD                               0.673202    
## BTRare:TotalWorkingYears                  0.820022    
## YearsSinceLastPromotion:DepSales          0.864653    
## YearsSinceLastPromotion:JRMD              0.209899    
## YearsSinceLastPromotion:JRManager         0.925959    
## YearsSinceLastPromotion:JRRD              0.337156    
## YearsSinceLastPromotion:TotalWorkingYears 0.096025 .  
## DepSales:JRMD                                   NA    
## DepSales:JRManager                        0.000229 ***
## DepSales:JRRD                                   NA    
## DepSales:TotalWorkingYears                0.284234    
## JRMD:JRManager                                  NA    
## JRMD:JRRD                                       NA    
## JRMD:TotalWorkingYears                    0.118583    
## JRManager:JRRD                                  NA    
## JRManager:TotalWorkingYears               0.858004    
## JRRD:TotalWorkingYears                    0.615798    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1047 on 819 degrees of freedom
## Multiple R-squared:  0.9511, Adjusted R-squared:  0.9481 
## F-statistic: 318.7 on 50 and 819 DF,  p-value: < 2.2e-16
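
In an R formula, `(a + b + c)^2` expands to all main effects plus every pairwise interaction, which is why ten predictors produce 55 terms in this summary (10 main effects and 45 interactions). Interactions between mutually exclusive indicators, such as JRMD:JRManager, are zero in every row and come back as `NA`; the DepSales:JRMD and DepSales:JRRD rows are most likely `NA` for the same reason. A minimal sketch with made-up data:

```r
# How ^2 expands: main effects plus all pairwise interactions.
attr(terms(y ~ (a + b + c)^2), "term.labels")

# Two mutually exclusive indicators (no row has both set): toy data.
set.seed(2)
role <- rep(c("MD", "Manager", "Other"), length.out = 30)
d <- data.frame(
  JRMD      = as.numeric(role == "MD"),
  JRManager = as.numeric(role == "Manager")
)
d$y <- 50 + 5 * d$JRMD + 20 * d$JRManager + rnorm(30)

# The product JRMD * JRManager is 0 in every row, so its coefficient is NA.
fit <- lm(y ~ (JRMD + JRManager)^2, data = d)
coef(fit)
```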

Now we want to run the model again with just the terms whose p-values were below .10: JobLevel, JRManager, JRRD, JobLevel:YearsSinceLastPromotion, JobLevel:DepSales, JobLevel:TotalWorkingYears, MonthlyRate:TotalWorkingYears, YearsSinceLastPromotion:TotalWorkingYears, DepSales:JRManager. Note that the fitted formula below leaves out JobLevel:YearsSinceLastPromotion and JobLevel:DepSales and adds YearsSinceLastPromotion:BTRare.
Results: RMSE of 1063

lmsalary_model4=lm(MonthlyIncome~JobLevel+JRManager+JRRD++JobLevel:TotalWorkingYears+MonthlyRate:TotalWorkingYears+YearsSinceLastPromotion:TotalWorkingYears+DepSales:JRManager+YearsSinceLastPromotion:BTRare,data=lm_salarydf)

summary(lmsalary_model4)
## 
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + JRManager + JRRD + +JobLevel:TotalWorkingYears + 
##     MonthlyRate:TotalWorkingYears + YearsSinceLastPromotion:TotalWorkingYears + 
##     DepSales:JRManager + YearsSinceLastPromotion:BTRare, data = lm_salarydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3785.1  -606.9  -106.4   663.1  4341.1 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                               -2.224e+02  1.130e+02  -1.968
## JobLevel                                   2.816e+03  7.950e+01  35.424
## JRManager                                  3.585e+03  2.415e+02  14.847
## JRRD                                       3.701e+03  1.898e+02  19.501
## JobLevel:TotalWorkingYears                 1.830e+01  3.204e+00   5.711
## TotalWorkingYears:MonthlyRate             -7.918e-04  3.420e-04  -2.315
## TotalWorkingYears:YearsSinceLastPromotion -7.530e-01  7.943e-01  -0.948
## JRManager:DepSales                        -3.999e+02  3.053e+02  -1.310
## YearsSinceLastPromotion:BTRare             5.293e+01  1.745e+01   3.034
##                                           Pr(>|t|)    
## (Intercept)                                0.04935 *  
## JobLevel                                   < 2e-16 ***
## JRManager                                  < 2e-16 ***
## JRRD                                       < 2e-16 ***
## JobLevel:TotalWorkingYears                1.55e-08 ***
## TotalWorkingYears:MonthlyRate              0.02083 *  
## TotalWorkingYears:YearsSinceLastPromotion  0.34340    
## JRManager:DepSales                         0.19059    
## YearsSinceLastPromotion:BTRare             0.00249 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1063 on 861 degrees of freedom
## Multiple R-squared:  0.947,  Adjusted R-squared:  0.9465 
## F-statistic:  1924 on 8 and 861 DF,  p-value: < 2.2e-16

Our last model fits every two-way interaction across all variables in the dataset. Results: RMSE is 981

lmsalary_model5=lm(MonthlyIncome~(.)^2,data=lm_salarydf)
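
Because every added term lowers the in-sample error, the figure for this full-interaction model is not directly comparable to those of the smaller models. One common check, sketched here on the built-in `mtcars` data with a hypothetical helper `cv_rmse()` (neither is part of the original analysis), is to compare k-fold cross-validated RMSE instead:

```r
# Hypothetical helper: k-fold cross-validated RMSE for an lm formula.
cv_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size.
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    fit  <- lm(formula, data = data[fold != i, ])        # train on k - 1 folds
    pred <- predict(fit, newdata = data[fold == i, ])    # predict the held-out fold
    sq_err <- c(sq_err, (data[fold == i, response] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Illustrative comparison on built-in data, not the employee data.
small <- cv_rmse(mpg ~ wt + hp, mtcars)
large <- cv_rmse(mpg ~ (wt + hp + disp + drat + qsec)^2, mtcars)
c(small = small, large = large)
```

A model whose cross-validated RMSE is much worse than its in-sample RMSE is fitting noise, which is a risk when the number of interaction terms approaches the number of rows.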

For my final model, I took the terms that were significant in the full-interaction model and added them to the previous reduced model. This combination gave the best RMSE, 1063.

lmsalary_model6=lm(MonthlyIncome~JobLevel+JRManager+JRRD++JobLevel:TotalWorkingYears+MonthlyRate:TotalWorkingYears+YearsSinceLastPromotion:TotalWorkingYears+DepSales:JRManager+YearsSinceLastPromotion:BTRare+EnvironmentSatisfaction:WorkLifeBalance,data=lm_salarydf)
summary(lmsalary_model6)
## 
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + JRManager + JRRD + +JobLevel:TotalWorkingYears + 
##     MonthlyRate:TotalWorkingYears + YearsSinceLastPromotion:TotalWorkingYears + 
##     DepSales:JRManager + YearsSinceLastPromotion:BTRare + EnvironmentSatisfaction:WorkLifeBalance, 
##     data = lm_salarydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3839.0  -609.7   -87.1   658.6  4323.8 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                               -1.434e+02  1.321e+02  -1.086
## JobLevel                                   2.818e+03  7.950e+01  35.447
## JRManager                                  3.584e+03  2.414e+02  14.844
## JRRD                                       3.690e+03  1.900e+02  19.424
## JobLevel:TotalWorkingYears                 1.830e+01  3.203e+00   5.713
## TotalWorkingYears:MonthlyRate             -7.857e-04  3.419e-04  -2.298
## TotalWorkingYears:YearsSinceLastPromotion -7.581e-01  7.942e-01  -0.955
## JRManager:DepSales                        -4.216e+02  3.058e+02  -1.378
## YearsSinceLastPromotion:BTRare             5.341e+01  1.745e+01   3.061
## EnvironmentSatisfaction:WorkLifeBalance   -1.100e+01  9.540e+00  -1.153
##                                           Pr(>|t|)    
## (Intercept)                                0.27797    
## JobLevel                                   < 2e-16 ***
## JRManager                                  < 2e-16 ***
## JRRD                                       < 2e-16 ***
## JobLevel:TotalWorkingYears                1.53e-08 ***
## TotalWorkingYears:MonthlyRate              0.02182 *  
## TotalWorkingYears:YearsSinceLastPromotion  0.34008    
## JRManager:DepSales                         0.16843    
## YearsSinceLastPromotion:BTRare             0.00227 ** 
## EnvironmentSatisfaction:WorkLifeBalance    0.24939    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1063 on 860 degrees of freedom
## Multiple R-squared:  0.9471, Adjusted R-squared:  0.9465 
## F-statistic:  1711 on 9 and 860 DF,  p-value: < 2.2e-16

Forward Selection Model

We used forward selection on all the statistically significant variables without interactions, then ran it again with interactions among the statistically significant variables.
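
The report does not show which tool performed the selection. As one possible sketch, base R's `step()` can run forward selection by AIC, starting from an intercept-only model and adding terms from a supplied upper scope. The example uses the built-in `mtcars` data, not the employee data:

```r
# Intercept-only starting model and the largest model we allow.
null_fit <- lm(mpg ~ 1, data = mtcars)
upper    <- ~ wt + hp + disp + drat + qsec

# Forward selection by AIC; trace = 0 suppresses the step-by-step log.
fwd <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)

formula(fwd)  # the terms that were added
```

An upper scope written with `^2` would let the search consider pairwise interactions as well, which corresponds to the second pass described above.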

train1 = employeeData %>% select(c("Age","DailyRate","DistanceFromHome", "EnvironmentSatisfaction", "HourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "MonthlyIncome", "MonthlyRate", "NumCompaniesWorked", "OverTime", "PercentSalaryHike", "PerformanceRating", "RelationshipSatisfaction", "YearsAtCompany", "YearsInCurrentRole","YearsSinceLastPromotion", "YearsWithCurrManager", "JobRole", "BusinessTravel","Department","EducationField","Gender","JobRole","StockOptionLevel","TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance","Attrition"))
fit1 = lm(MonthlyIncome~.,data=train1)

fit2 = lm(MonthlyIncome~(JobLevel + JobRole + TotalWorkingYears + BusinessTravel + Gender + DailyRate + MonthlyRate + YearsWithCurrManager + YearsSinceLastPromotion + DistanceFromHome + PerformanceRating + PercentSalaryHike + Department)^2,data=train1)

fit3 = lm(MonthlyIncome~+ JobLevel + JobLevel:JobRole + JobLevel:BusinessTravel + JobLevel:Gender + JobLevel:PerformanceRating + JobLevel:Department + JobRole:TotalWorkingYears + JobRole:Department + DailyRate + DistanceFromHome + YearsWithCurrManager + PerformanceRating + TotalWorkingYears + MonthlyRate + PercentSalaryHike + Gender + YearsSinceLastPromotion + BusinessTravel + Department + TotalWorkingYears:MonthlyRate + JobRole + YearsSinceLastPromotion:DistanceFromHome + DailyRate:DistanceFromHome + JobRole:PercentSalaryHike + TotalWorkingYears:BusinessTravel + TotalWorkingYears:YearsSinceLastPromotion + JobLevel:YearsSinceLastPromotion + DailyRate:PerformanceRating + JobLevel:MonthlyRate + JobRole:Gender + TotalWorkingYears:DailyRate + YearsSinceLastPromotion:PerformanceRating + DailyRate:MonthlyRate + BusinessTravel:PercentSalaryHike + DailyRate:PercentSalaryHike,data=train1)

summary(fit3)
## 
## Call:
## lm(formula = MonthlyIncome ~ +JobLevel + JobLevel:JobRole + JobLevel:BusinessTravel + 
##     JobLevel:Gender + JobLevel:PerformanceRating + JobLevel:Department + 
##     JobRole:TotalWorkingYears + JobRole:Department + DailyRate + 
##     DistanceFromHome + YearsWithCurrManager + PerformanceRating + 
##     TotalWorkingYears + MonthlyRate + PercentSalaryHike + Gender + 
##     YearsSinceLastPromotion + BusinessTravel + Department + TotalWorkingYears:MonthlyRate + 
##     JobRole + YearsSinceLastPromotion:DistanceFromHome + DailyRate:DistanceFromHome + 
##     JobRole:PercentSalaryHike + TotalWorkingYears:BusinessTravel + 
##     TotalWorkingYears:YearsSinceLastPromotion + JobLevel:YearsSinceLastPromotion + 
##     DailyRate:PerformanceRating + JobLevel:MonthlyRate + JobRole:Gender + 
##     TotalWorkingYears:DailyRate + YearsSinceLastPromotion:PerformanceRating + 
##     DailyRate:MonthlyRate + BusinessTravel:PercentSalaryHike + 
##     DailyRate:PercentSalaryHike, data = train1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2871.3  -609.6   -82.1   536.7  3990.7 
## 
## Coefficients: (16 not defined because of singularities)
##                                                                  Estimate
## (Intercept)                                                     4.212e+03
## JobLevel                                                        1.125e+03
## DailyRate                                                      -1.446e+00
## DistanceFromHome                                               -1.975e+01
## YearsWithCurrManager                                           -1.668e+01
## PerformanceRating                                              -8.341e+02
## TotalWorkingYears                                               1.809e+02
## MonthlyRate                                                    -2.390e-03
## PercentSalaryHike                                               1.662e+02
## GenderMale                                                     -5.995e+01
## YearsSinceLastPromotion                                         1.793e+02
## BusinessTravelTravel_Frequently                                 1.201e+03
## BusinessTravelTravel_Rarely                                     8.172e+02
## DepartmentResearch & Development                               -5.200e+03
## DepartmentSales                                                -5.381e+03
## JobRoleHuman Resources                                         -2.973e+03
## JobRoleLaboratory Technician                                    2.970e+03
## JobRoleManager                                                  5.622e+03
## JobRoleManufacturing Director                                  -4.055e+02
## JobRoleResearch Director                                        5.669e+03
## JobRoleResearch Scientist                                       2.438e+03
## JobRoleSales Executive                                          1.303e+03
## JobRoleSales Representative                                     2.818e+03
## JobLevel:JobRoleHuman Resources                                 1.126e+03
## JobLevel:JobRoleLaboratory Technician                          -1.369e+03
## JobLevel:JobRoleManager                                         2.489e+01
## JobLevel:JobRoleManufacturing Director                          2.411e+02
## JobLevel:JobRoleResearch Director                               4.501e+01
## JobLevel:JobRoleResearch Scientist                             -4.400e+02
## JobLevel:JobRoleSales Executive                                 2.749e+02
## JobLevel:JobRoleSales Representative                           -7.996e+02
## JobLevel:BusinessTravelTravel_Frequently                        2.411e+02
## JobLevel:BusinessTravelTravel_Rarely                            1.130e+02
## JobLevel:GenderMale                                             1.807e+01
## JobLevel:PerformanceRating                                      8.632e+01
## JobLevel:DepartmentResearch & Development                       1.157e+03
## JobLevel:DepartmentSales                                        1.092e+03
## JobRoleHuman Resources:TotalWorkingYears                       -7.849e+01
## JobRoleLaboratory Technician:TotalWorkingYears                 -4.032e+01
## JobRoleManager:TotalWorkingYears                               -1.217e+01
## JobRoleManufacturing Director:TotalWorkingYears                 1.797e+01
## JobRoleResearch Director:TotalWorkingYears                     -3.436e+01
## JobRoleResearch Scientist:TotalWorkingYears                    -5.840e+01
## JobRoleSales Executive:TotalWorkingYears                       -1.224e-01
## JobRoleSales Representative:TotalWorkingYears                  -3.676e+01
## JobRoleHuman Resources:DepartmentResearch & Development                NA
## JobRoleLaboratory Technician:DepartmentResearch & Development          NA
## JobRoleManager:DepartmentResearch & Development                        NA
## JobRoleManufacturing Director:DepartmentResearch & Development         NA
## JobRoleResearch Director:DepartmentResearch & Development              NA
## JobRoleResearch Scientist:DepartmentResearch & Development             NA
## JobRoleSales Executive:DepartmentResearch & Development                NA
## JobRoleSales Representative:DepartmentResearch & Development           NA
## JobRoleHuman Resources:DepartmentSales                                 NA
## JobRoleLaboratory Technician:DepartmentSales                           NA
## JobRoleManager:DepartmentSales                                         NA
## JobRoleManufacturing Director:DepartmentSales                          NA
## JobRoleResearch Director:DepartmentSales                               NA
## JobRoleResearch Scientist:DepartmentSales                              NA
## JobRoleSales Executive:DepartmentSales                                 NA
## JobRoleSales Representative:DepartmentSales                            NA
## TotalWorkingYears:MonthlyRate                                  -3.083e-03
## DistanceFromHome:YearsSinceLastPromotion                       -2.072e+00
## DailyRate:DistanceFromHome                                      2.417e-02
## JobRoleHuman Resources:PercentSalaryHike                       -1.161e+02
## JobRoleLaboratory Technician:PercentSalaryHike                 -7.485e+01
## JobRoleManager:PercentSalaryHike                               -1.120e+02
## JobRoleManufacturing Director:PercentSalaryHike                -3.940e+01
## JobRoleResearch Director:PercentSalaryHike                     -8.998e+01
## JobRoleResearch Scientist:PercentSalaryHike                    -9.996e+01
## JobRoleSales Executive:PercentSalaryHike                       -1.118e+02
## JobRoleSales Representative:PercentSalaryHike                  -1.073e+02
## BusinessTravelTravel_Frequently:TotalWorkingYears              -6.151e+01
## BusinessTravelTravel_Rarely:TotalWorkingYears                  -3.264e+01
## TotalWorkingYears:YearsSinceLastPromotion                      -5.172e+00
## JobLevel:YearsSinceLastPromotion                                2.889e+01
## PerformanceRating:DailyRate                                     6.181e-01
## JobLevel:MonthlyRate                                            1.023e-02
## JobRoleHuman Resources:GenderMale                               7.485e+01
## JobRoleLaboratory Technician:GenderMale                        -5.864e+01
## JobRoleManager:GenderMale                                      -1.833e+02
## JobRoleManufacturing Director:GenderMale                        7.371e+02
## JobRoleResearch Director:GenderMale                            -1.328e+02
## JobRoleResearch Scientist:GenderMale                            5.964e+01
## JobRoleSales Executive:GenderMale                               2.202e+02
## JobRoleSales Representative:GenderMale                          4.940e+01
## TotalWorkingYears:DailyRate                                    -1.580e-02
## PerformanceRating:YearsSinceLastPromotion                      -4.257e+01
## DailyRate:MonthlyRate                                           1.409e-05
## BusinessTravelTravel_Frequently:PercentSalaryHike              -5.677e+01
## BusinessTravelTravel_Rarely:PercentSalaryHike                  -2.621e+01
## DailyRate:PercentSalaryHike                                    -4.098e-02
##                                                                Std. Error
## (Intercept)                                                     4.354e+03
## JobLevel                                                        1.020e+03
## DailyRate                                                       9.157e-01
## DistanceFromHome                                                1.013e+01
## YearsWithCurrManager                                            1.228e+01
## PerformanceRating                                               4.308e+02
## TotalWorkingYears                                               3.654e+01
## MonthlyRate                                                     1.453e-02
## PercentSalaryHike                                               5.496e+01
## GenderMale                                                      3.884e+02
## YearsSinceLastPromotion                                         1.168e+02
## BusinessTravelTravel_Frequently                                 6.671e+02
## BusinessTravelTravel_Rarely                                     5.911e+02
## DepartmentResearch & Development                                4.167e+03
## DepartmentSales                                                 4.221e+03
## JobRoleHuman Resources                                          4.341e+03
## JobRoleLaboratory Technician                                    8.494e+02
## JobRoleManager                                                  1.666e+03
## JobRoleManufacturing Director                                   9.313e+02
## JobRoleResearch Director                                        1.248e+03
## JobRoleResearch Scientist                                       8.553e+02
## JobRoleSales Executive                                          2.267e+03
## JobRoleSales Representative                                     2.382e+03
## JobLevel:JobRoleHuman Resources                                 9.892e+02
## JobLevel:JobRoleLaboratory Technician                           3.091e+02
## JobLevel:JobRoleManager                                         4.075e+02
## JobLevel:JobRoleManufacturing Director                          3.186e+02
## JobLevel:JobRoleResearch Director                               3.193e+02
## JobLevel:JobRoleResearch Scientist                              3.247e+02
## JobLevel:JobRoleSales Executive                                 5.589e+02
## JobLevel:JobRoleSales Representative                            8.314e+02
## JobLevel:BusinessTravelTravel_Frequently                        2.305e+02
## JobLevel:BusinessTravelTravel_Rarely                            2.032e+02
## JobLevel:GenderMale                                             1.273e+02
## JobLevel:PerformanceRating                                      1.399e+02
## JobLevel:DepartmentResearch & Development                       8.859e+02
## JobLevel:DepartmentSales                                        9.014e+02
## JobRoleHuman Resources:TotalWorkingYears                        7.256e+01
## JobRoleLaboratory Technician:TotalWorkingYears                  2.930e+01
## JobRoleManager:TotalWorkingYears                                3.517e+01
## JobRoleManufacturing Director:TotalWorkingYears                 3.017e+01
## JobRoleResearch Director:TotalWorkingYears                      3.289e+01
## JobRoleResearch Scientist:TotalWorkingYears                     2.898e+01
## JobRoleSales Executive:TotalWorkingYears                        2.648e+01
## JobRoleSales Representative:TotalWorkingYears                   3.820e+01
## JobRoleHuman Resources:DepartmentResearch & Development                NA
## JobRoleLaboratory Technician:DepartmentResearch & Development          NA
## JobRoleManager:DepartmentResearch & Development                        NA
## JobRoleManufacturing Director:DepartmentResearch & Development         NA
## JobRoleResearch Director:DepartmentResearch & Development              NA
## JobRoleResearch Scientist:DepartmentResearch & Development             NA
## JobRoleSales Executive:DepartmentResearch & Development                NA
## JobRoleSales Representative:DepartmentResearch & Development           NA
## JobRoleHuman Resources:DepartmentSales                                 NA
## JobRoleLaboratory Technician:DepartmentSales                           NA
## JobRoleManager:DepartmentSales                                         NA
## JobRoleManufacturing Director:DepartmentSales                          NA
## JobRoleResearch Director:DepartmentSales                               NA
## JobRoleResearch Scientist:DepartmentSales                              NA
## JobRoleSales Executive:DepartmentSales                                 NA
## JobRoleSales Representative:DepartmentSales                            NA
## TotalWorkingYears:MonthlyRate                                   1.054e-03
## DistanceFromHome:YearsSinceLastPromotion                        1.350e+00
## DailyRate:DistanceFromHome                                      1.053e-02
## JobRoleHuman Resources:PercentSalaryHike                        6.287e+01
## JobRoleLaboratory Technician:PercentSalaryHike                  3.964e+01
## JobRoleManager:PercentSalaryHike                                5.526e+01
## JobRoleManufacturing Director:PercentSalaryHike                 4.210e+01
## JobRoleResearch Director:PercentSalaryHike                      5.387e+01
## JobRoleResearch Scientist:PercentSalaryHike                     3.933e+01
## JobRoleSales Executive:PercentSalaryHike                        3.686e+01
## JobRoleSales Representative:PercentSalaryHike                   5.184e+01
## BusinessTravelTravel_Frequently:TotalWorkingYears               2.858e+01
## BusinessTravelTravel_Rarely:TotalWorkingYears                   2.505e+01
## TotalWorkingYears:YearsSinceLastPromotion                       1.925e+00
## JobLevel:YearsSinceLastPromotion                                1.424e+01
## PerformanceRating:DailyRate                                     3.844e-01
## JobLevel:MonthlyRate                                            7.377e-03
## JobRoleHuman Resources:GenderMale                               4.918e+02
## JobRoleLaboratory Technician:GenderMale                         3.248e+02
## JobRoleManager:GenderMale                                       4.411e+02
## JobRoleManufacturing Director:GenderMale                        3.138e+02
## JobRoleResearch Director:GenderMale                             4.197e+02
## JobRoleResearch Scientist:GenderMale                            3.214e+02
## JobRoleSales Executive:GenderMale                               2.745e+02
## JobRoleSales Representative:GenderMale                          4.010e+02
## TotalWorkingYears:DailyRate                                     1.167e-02
## PerformanceRating:YearsSinceLastPromotion                       3.475e+01
## DailyRate:MonthlyRate                                           1.215e-05
## BusinessTravelTravel_Frequently:PercentSalaryHike               3.714e+01
## BusinessTravelTravel_Rarely:PercentSalaryHike                   3.240e+01
## DailyRate:PercentSalaryHike                                     3.676e-02
##                                                                t value Pr(>|t|)
## (Intercept)                                                      0.967 0.333613
## JobLevel                                                         1.103 0.270318
## DailyRate                                                       -1.580 0.114593
## DistanceFromHome                                                -1.950 0.051579
## YearsWithCurrManager                                            -1.359 0.174575
## PerformanceRating                                               -1.936 0.053172
## TotalWorkingYears                                                4.950 9.05e-07
## MonthlyRate                                                     -0.164 0.869407
## PercentSalaryHike                                                3.023 0.002581
## GenderMale                                                      -0.154 0.877384
## YearsSinceLastPromotion                                          1.535 0.125217
## BusinessTravelTravel_Frequently                                  1.801 0.072132
## BusinessTravelTravel_Rarely                                      1.382 0.167225
## DepartmentResearch & Development                                -1.248 0.212482
## DepartmentSales                                                 -1.275 0.202735
## JobRoleHuman Resources                                          -0.685 0.493632
## JobRoleLaboratory Technician                                     3.496 0.000498
## JobRoleManager                                                   3.375 0.000773
## JobRoleManufacturing Director                                   -0.435 0.663395
## JobRoleResearch Director                                         4.544 6.38e-06
## JobRoleResearch Scientist                                        2.850 0.004484
## JobRoleSales Executive                                           0.575 0.565668
## JobRoleSales Representative                                      1.183 0.237193
## JobLevel:JobRoleHuman Resources                                  1.138 0.255394
## JobLevel:JobRoleLaboratory Technician                           -4.429 1.08e-05
## JobLevel:JobRoleManager                                          0.061 0.951305
## JobLevel:JobRoleManufacturing Director                           0.757 0.449310
## JobLevel:JobRoleResearch Director                                0.141 0.887941
## JobLevel:JobRoleResearch Scientist                              -1.355 0.175840
## JobLevel:JobRoleSales Executive                                  0.492 0.622948
## JobLevel:JobRoleSales Representative                            -0.962 0.336472
## JobLevel:BusinessTravelTravel_Frequently                         1.046 0.295988
## JobLevel:BusinessTravelTravel_Rarely                             0.556 0.578188
## JobLevel:GenderMale                                              0.142 0.887177
## JobLevel:PerformanceRating                                       0.617 0.537302
## JobLevel:DepartmentResearch & Development                        1.306 0.191958
## JobLevel:DepartmentSales                                         1.212 0.226033
## JobRoleHuman Resources:TotalWorkingYears                        -1.082 0.279730
## JobRoleLaboratory Technician:TotalWorkingYears                  -1.376 0.169200
## JobRoleManager:TotalWorkingYears                                -0.346 0.729389
## JobRoleManufacturing Director:TotalWorkingYears                  0.596 0.551672
## JobRoleResearch Director:TotalWorkingYears                      -1.045 0.296466
## JobRoleResearch Scientist:TotalWorkingYears                     -2.015 0.044250
## JobRoleSales Executive:TotalWorkingYears                        -0.005 0.996312
## JobRoleSales Representative:TotalWorkingYears                   -0.962 0.336163
## JobRoleHuman Resources:DepartmentResearch & Development             NA       NA
## JobRoleLaboratory Technician:DepartmentResearch & Development       NA       NA
## JobRoleManager:DepartmentResearch & Development                     NA       NA
## JobRoleManufacturing Director:DepartmentResearch & Development      NA       NA
## JobRoleResearch Director:DepartmentResearch & Development           NA       NA
## JobRoleResearch Scientist:DepartmentResearch & Development          NA       NA
## JobRoleSales Executive:DepartmentResearch & Development             NA       NA
## JobRoleSales Representative:DepartmentResearch & Development        NA       NA
## JobRoleHuman Resources:DepartmentSales                              NA       NA
## JobRoleLaboratory Technician:DepartmentSales                        NA       NA
## JobRoleManager:DepartmentSales                                      NA       NA
## JobRoleManufacturing Director:DepartmentSales                       NA       NA
## JobRoleResearch Director:DepartmentSales                            NA       NA
## JobRoleResearch Scientist:DepartmentSales                           NA       NA
## JobRoleSales Executive:DepartmentSales                              NA       NA
## JobRoleSales Representative:DepartmentSales                         NA       NA
## TotalWorkingYears:MonthlyRate                                   -2.925 0.003546
## DistanceFromHome:YearsSinceLastPromotion                        -1.535 0.125278
## DailyRate:DistanceFromHome                                       2.296 0.021945
## JobRoleHuman Resources:PercentSalaryHike                        -1.847 0.065072
## JobRoleLaboratory Technician:PercentSalaryHike                  -1.888 0.059401
## JobRoleManager:PercentSalaryHike                                -2.027 0.042984
## JobRoleManufacturing Director:PercentSalaryHike                 -0.936 0.349682
## JobRoleResearch Director:PercentSalaryHike                      -1.670 0.095242
## JobRoleResearch Scientist:PercentSalaryHike                     -2.541 0.011232
## JobRoleSales Executive:PercentSalaryHike                        -3.034 0.002489
## JobRoleSales Representative:PercentSalaryHike                   -2.070 0.038748
## BusinessTravelTravel_Frequently:TotalWorkingYears               -2.152 0.031680
## BusinessTravelTravel_Rarely:TotalWorkingYears                   -1.303 0.192991
## TotalWorkingYears:YearsSinceLastPromotion                       -2.686 0.007376
## JobLevel:YearsSinceLastPromotion                                 2.029 0.042763
## PerformanceRating:DailyRate                                      1.608 0.108259
## JobLevel:MonthlyRate                                             1.387 0.165785
## JobRoleHuman Resources:GenderMale                                0.152 0.879056
## JobRoleLaboratory Technician:GenderMale                         -0.181 0.856783
## JobRoleManager:GenderMale                                       -0.416 0.677856
## JobRoleManufacturing Director:GenderMale                         2.349 0.019080
## JobRoleResearch Director:GenderMale                             -0.316 0.751754
## JobRoleResearch Scientist:GenderMale                             0.186 0.852812
## JobRoleSales Executive:GenderMale                                0.802 0.422586
## JobRoleSales Representative:GenderMale                           0.123 0.902001
## TotalWorkingYears:DailyRate                                     -1.354 0.176180
## PerformanceRating:YearsSinceLastPromotion                       -1.225 0.220874
## DailyRate:MonthlyRate                                            1.159 0.246603
## BusinessTravelTravel_Frequently:PercentSalaryHike               -1.529 0.126772
## BusinessTravelTravel_Rarely:PercentSalaryHike                   -0.809 0.418727
## DailyRate:PercentSalaryHike                                     -1.115 0.265265
##                                                                   
## (Intercept)                                                       
## JobLevel                                                          
## DailyRate                                                         
## DistanceFromHome                                               .  
## YearsWithCurrManager                                              
## PerformanceRating                                              .  
## TotalWorkingYears                                              ***
## MonthlyRate                                                       
## PercentSalaryHike                                              ** 
## GenderMale                                                        
## YearsSinceLastPromotion                                           
## BusinessTravelTravel_Frequently                                .  
## BusinessTravelTravel_Rarely                                       
## DepartmentResearch & Development                                  
## DepartmentSales                                                   
## JobRoleHuman Resources                                            
## JobRoleLaboratory Technician                                   ***
## JobRoleManager                                                 ***
## JobRoleManufacturing Director                                     
## JobRoleResearch Director                                       ***
## JobRoleResearch Scientist                                      ** 
## JobRoleSales Executive                                            
## JobRoleSales Representative                                       
## JobLevel:JobRoleHuman Resources                                   
## JobLevel:JobRoleLaboratory Technician                          ***
## JobLevel:JobRoleManager                                           
## JobLevel:JobRoleManufacturing Director                            
## JobLevel:JobRoleResearch Director                                 
## JobLevel:JobRoleResearch Scientist                                
## JobLevel:JobRoleSales Executive                                   
## JobLevel:JobRoleSales Representative                              
## JobLevel:BusinessTravelTravel_Frequently                          
## JobLevel:BusinessTravelTravel_Rarely                              
## JobLevel:GenderMale                                               
## JobLevel:PerformanceRating                                        
## JobLevel:DepartmentResearch & Development                         
## JobLevel:DepartmentSales                                          
## JobRoleHuman Resources:TotalWorkingYears                          
## JobRoleLaboratory Technician:TotalWorkingYears                    
## JobRoleManager:TotalWorkingYears                                  
## JobRoleManufacturing Director:TotalWorkingYears                   
## JobRoleResearch Director:TotalWorkingYears                        
## JobRoleResearch Scientist:TotalWorkingYears                    *  
## JobRoleSales Executive:TotalWorkingYears                          
## JobRoleSales Representative:TotalWorkingYears                     
## JobRoleHuman Resources:DepartmentResearch & Development           
## JobRoleLaboratory Technician:DepartmentResearch & Development     
## JobRoleManager:DepartmentResearch & Development                   
## JobRoleManufacturing Director:DepartmentResearch & Development    
## JobRoleResearch Director:DepartmentResearch & Development         
## JobRoleResearch Scientist:DepartmentResearch & Development        
## JobRoleSales Executive:DepartmentResearch & Development           
## JobRoleSales Representative:DepartmentResearch & Development      
## JobRoleHuman Resources:DepartmentSales                            
## JobRoleLaboratory Technician:DepartmentSales                      
## JobRoleManager:DepartmentSales                                    
## JobRoleManufacturing Director:DepartmentSales                     
## JobRoleResearch Director:DepartmentSales                          
## JobRoleResearch Scientist:DepartmentSales                         
## JobRoleSales Executive:DepartmentSales                            
## JobRoleSales Representative:DepartmentSales                       
## TotalWorkingYears:MonthlyRate                                  ** 
## DistanceFromHome:YearsSinceLastPromotion                          
## DailyRate:DistanceFromHome                                     *  
## JobRoleHuman Resources:PercentSalaryHike                       .  
## JobRoleLaboratory Technician:PercentSalaryHike                 .  
## JobRoleManager:PercentSalaryHike                               *  
## JobRoleManufacturing Director:PercentSalaryHike                   
## JobRoleResearch Director:PercentSalaryHike                     .  
## JobRoleResearch Scientist:PercentSalaryHike                    *  
## JobRoleSales Executive:PercentSalaryHike                       ** 
## JobRoleSales Representative:PercentSalaryHike                  *  
## BusinessTravelTravel_Frequently:TotalWorkingYears              *  
## BusinessTravelTravel_Rarely:TotalWorkingYears                     
## TotalWorkingYears:YearsSinceLastPromotion                      ** 
## JobLevel:YearsSinceLastPromotion                               *  
## PerformanceRating:DailyRate                                       
## JobLevel:MonthlyRate                                              
## JobRoleHuman Resources:GenderMale                                 
## JobRoleLaboratory Technician:GenderMale                           
## JobRoleManager:GenderMale                                         
## JobRoleManufacturing Director:GenderMale                       *  
## JobRoleResearch Director:GenderMale                               
## JobRoleResearch Scientist:GenderMale                              
## JobRoleSales Executive:GenderMale                                 
## JobRoleSales Representative:GenderMale                            
## TotalWorkingYears:DailyRate                                       
## PerformanceRating:YearsSinceLastPromotion                         
## DailyRate:MonthlyRate                                             
## BusinessTravelTravel_Frequently:PercentSalaryHike                 
## BusinessTravelTravel_Rarely:PercentSalaryHike                     
## DailyRate:PercentSalaryHike                                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 970.6 on 794 degrees of freedom
## Multiple R-squared:  0.9593, Adjusted R-squared:  0.9554 
## F-statistic: 249.4 on 75 and 794 DF,  p-value: < 2.2e-16
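
The `NA` rows in the summary above are coefficients that `lm()` could not estimate. One common cause, and a plausible one here, is that most job roles occur in only one department, so the JobRole:Department interaction columns add no information beyond the main effects. The sketch below reproduces that behaviour on a small synthetic data set; the `dept`, `role`, and `y` variables are made up for illustration and are not the case study data.

```r
# Synthetic example: every role belongs to exactly one department,
# so the dept:role interaction columns are linearly dependent.
set.seed(1)
dept <- factor(rep(c("HR", "RnD", "Sales"), each = 20))
role <- factor(c(rep("HR Rep", 20),
                 rep(c("Scientist", "Lab Tech"), 10),
                 rep(c("Sales Exec", "Sales Rep"), 10)))
y <- rnorm(60)

fit <- lm(y ~ dept * role)

# Coefficients that cannot be estimated are reported as NA
na_terms <- names(coef(fit))[is.na(coef(fit))]
print(na_terms)

# The rank of the design matrix equals the number of distinct dept/role cells
print(fit$rank)
```

Dropping the redundant interaction from the formula would give the same fitted values with a shorter coefficient table.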

Conclusion of Salary Prediction Model

The Multiple Linear Regression with interaction terms is our best salary model, with an RMSE of 1063.
The best predictors are Job Level, Job Role, Total Working Years, Business Travel, Gender, Daily Rate, Monthly Rate, Years With Current Manager, Years Since Last Promotion, Distance From Home, Performance Rating, Percent Salary Hike, and Department.
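
For readers who want to see how an RMSE like the one reported above is obtained, here is a minimal sketch for a linear model with an interaction term. The data frame `toy`, its simulated values, and the 70/30 train/test split are assumptions made for illustration; they are not the case study data or necessarily the procedure used in this report.

```r
# Simulate a small data set with an interaction between two predictors
set.seed(42)
n <- 200
toy <- data.frame(
  JobLevel = sample(1:5, n, replace = TRUE),
  TotalWorkingYears = sample(0:30, n, replace = TRUE)
)
toy$MonthlyIncome <- 1000 + 3000 * toy$JobLevel +
  50 * toy$TotalWorkingYears +
  10 * toy$JobLevel * toy$TotalWorkingYears +
  rnorm(n, sd = 500)

# Hold out 30% of the rows for evaluation
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# JobLevel * TotalWorkingYears expands to both main effects plus the interaction
fit_toy <- lm(MonthlyIncome ~ JobLevel * TotalWorkingYears, data = train)

# RMSE: square root of the mean squared prediction error on held-out rows
pred <- predict(fit_toy, newdata = test)
rmse <- sqrt(mean((test$MonthlyIncome - pred)^2))
print(rmse)
```

RMSE is in the same units as the response, so a value of 1063 would mean a typical prediction error of roughly 1063 in monthly income.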

Ending Conclusion

Attrition Model: Naive Bayes
+ Naive Bayes with the selected predictors performed better than the KNN model.
+ The predictor variables we used for Naive Bayes make sense given how they were used in our models to give the best accuracy, sensitivity, and specificity.
+ Our results may contain some error because the predictors were selected by hand, adding one variable at a time.
+ The top three predictors of attrition are Job Level, Monthly Income, and Overtime.
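
The accuracy, sensitivity, and specificity mentioned above all come from a confusion matrix of predicted versus actual attrition. The sketch below computes them in base R from a handful of made-up labels; the `actual` and `predicted` vectors are hypothetical, and treating "Yes" (the employee left) as the positive class is an assumption.

```r
# Hypothetical actual and predicted attrition labels for ten employees
actual <- factor(c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"),
                 levels = c("No", "Yes"))
predicted <- factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "No"),
                    levels = c("No", "Yes"))

# Rows are predictions, columns are the true labels
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- cm["Yes", "Yes"]  # leavers correctly flagged
tn <- cm["No", "No"]    # stayers correctly flagged
fp <- cm["Yes", "No"]   # stayers wrongly flagged as leavers
fn <- cm["No", "Yes"]   # leavers the model missed

accuracy <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)  # share of actual leavers caught
specificity <- tn / (tn + fp)  # share of actual stayers caught

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

The `confusionMatrix()` function from the caret package reports the same quantities, but by default it treats the first factor level as the positive class, so its `positive` argument should be set explicitly when comparing results.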

Salary Model: Multiple Linear Regression
+ Multiple Linear Regression using the predictors selected from the interaction terms and the overall regression produced statistical results that make sense.
+ Several interaction terms had statistically significant p-values, so we can use them in our model.
+ The top three predictors of salary are Job Level, Total Working Years, and Job Role.