Neural Networks & SVM - A Data-Driven Approach To Predict the Success Of Bank Telemarketing
Marketing spending in the banking industry is massive, so it is essential for banks to optimize their marketing strategies and improve effectiveness. Understanding customers' needs leads to more effective marketing plans, smarter product designs and greater customer satisfaction. The main objective of this project is to increase the effectiveness of the bank's telemarketing campaign. It will enable the bank to develop a more granular understanding of its customer base, predict customers' responses to its telemarketing campaign and establish a target customer profile for future marketing plans.
By analyzing customer features, such as demographics and transaction history, the bank can predict customer saving behaviours and identify which types of customers are more likely to make term deposits, and then focus its marketing efforts on those customers. This not only allows the bank to secure deposits more effectively but also increases customer satisfaction by sparing uninterested customers undesirable advertisements.
We are given data from the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a term deposit (target variable y); in other words, we model the probability of buying as a function of the customer features. You can find the code I used in my GitHub repo.
Requirements
Data set
Name | Description |
---|---|
age | Numeric - Age of the client |
job | Categorical - Type of job |
marital | Categorical - Marital status of the client |
education | Categorical - Education level of the client |
default | Categorical - Has credit in default? |
housing | Categorical - Has a housing loan? |
loan | Categorical - Has a personal loan? |
contact | Categorical - Contact communication type (cellular, telephone) |
month | Categorical - Last contact month of the year |
day_of_week | Categorical - Last contact day of the week |
duration | Numeric - Last contact duration, in seconds |
campaign | Numeric - Number of contacts performed during this campaign |
pdays | Numeric - Number of days since the client was last contacted in a previous campaign (999 means not previously contacted) |
previous | Numeric - Number of contacts performed before this campaign for this client |
poutcome | Categorical - Outcome of the previous marketing campaign |
emp.var.rate | Numeric - Employment variation rate (quarterly indicator) |
cons.price.idx | Numeric - Consumer price index (monthly indicator) |
cons.conf.idx | Numeric - Consumer confidence index (monthly indicator) |
euribor3m | Numeric - Euribor 3-month rate (daily indicator) |
nr.employed | Numeric - Number of employees (quarterly indicator) |
y | Categorical - Has the client subscribed to a term deposit? |
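The analysis below assumes the data has been loaded into a data frame. A minimal sketch, assuming the file is the UCI "bank-additional-full.csv" (semicolon-separated) and bankdata is a placeholder name:

# Load the bank marketing data; the file name/path is an assumption
bankdata <- read.csv("bank-additional-full.csv", sep = ";", stringsAsFactors = TRUE)
str(bankdata) # 41188 obs. of 21 variables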
Data Analysis
- Plot missing values for all the features in the dataset.
- Plot histograms for the numerical variables.
- Build a metrics table with summary indicators for all numerical variables, as sketched below.
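These three steps can be reproduced with the funModeling package; a minimal sketch, assuming funModeling is the tool used and bankdata is the placeholder data frame from above:

library(funModeling)
# Tabulate missing and zero values for every feature
df_status(bankdata)
# Histograms for all numerical variables
plot_num(bankdata)
# Metrics table: mean, percentiles, skewness, kurtosis, IQR per numerical variable
profiling_num(bankdata)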
variable mean std_dev variation_coef p_01 p_05
1 age 40.0240604 10.4212500 0.260374632 23.00000 26.000
2 duration 258.2850102 259.2792488 1.003849386 11.00000 36.000
3 campaign 2.5675925 2.7700135 1.078836903 1.00000 1.000
4 pdays 962.4754540 186.9109073 0.194198103 3.00000 999.000
5 previous 0.1729630 0.4949011 2.861311858 0.00000 0.000
6 emp.var.rate 0.0818855 1.5709597 19.184834048 -3.40000 -2.900
7 cons.price.idx 93.5756644 0.5788400 0.006185797 92.20100 92.713
8 cons.conf.idx -40.5026003 4.6281979 -0.114269154 -49.50000 -47.100
9 euribor3m 3.6212908 1.7344474 0.478958331 0.65848 0.797
10 nr.employed 5167.0359109 72.2515277 0.013983167 4963.60000 5017.500
p_25 p_50 p_75 p_95 p_99 skewness kurtosis iqr
1 32.000 38.000 47.000 58.000 71.000 0.7846682 3.791070 15.000
2 102.000 180.000 319.000 752.650 1271.130 3.2630224 23.245334 217.000
3 1.000 2.000 3.000 7.000 14.000 4.7623333 39.975160 2.000
4 999.000 999.000 999.000 999.000 999.000 -4.9220107 25.226619 0.000
5 0.000 0.000 0.000 1.000 2.000 3.8319027 23.106230 0.000
6 -1.800 1.100 1.400 1.400 1.400 -0.7240692 1.937352 3.200
7 93.075 93.749 93.994 94.465 94.465 -0.2308792 2.170146 0.919
8 -42.700 -41.800 -36.400 -33.600 -26.900 0.3031688 2.641340 6.300
9 1.344 4.857 4.961 4.966 4.968 -0.7091621 1.593222 3.617
10 5099.100 5191.000 5228.100 5228.100 5228.100 -1.0442244 2.996094 129.000
range_98 range_80
1 [23, 71] [28, 55]
2 [11, 1271.13] [59, 551]
3 [1, 14] [1, 5]
4 [3, 999] [999, 999]
5 [0, 2] [0, 1]
6 [-3.4, 1.4] [-1.8, 1.4]
7 [92.201, 94.465] [92.893, 94.465]
8 [-49.5, -26.9] [-46.2, -36.1]
9 [0.65848, 4.968] [1.046, 4.964]
10 [4963.6, 5228.1] [5076.2, 5228.1]
Variable Importance & Cross-Plot Against Deposit
- Plot variable importance with several metrics: entropy (en), mutual information (mi), information gain (ig) and gain ratio (gr).
- Bivariate analysis: cross-plots showing the relationship of each variable to the target variable, as sketched below.
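A minimal sketch of both steps, again assuming the funModeling package and the placeholder data frame bankdata:

library(funModeling)
# Rank variables by entropy (en), mutual information (mi),
# information gain (ig) and gain ratio (gr) against the target
var_rank_info(bankdata, "y")
# Cross-plot of a single input against the target, e.g. contact type
cross_plot(data = bankdata, input = "contact", target = "y")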
Prepare Data for Classification
- Select variables relevant to customers: based on the variable importance, we will use the pdays, poutcome, previous, duration, cons.price.idx, cons.conf.idx and contact features for further analysis.
'data.frame': 41188 obs. of 8 variables:
$ Term_Deposit : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ NumberOfDaysPassedAfterLastContact: num 999 999 999 999 999 999 999 999 999 999 ...
$ PreviousMarketingOutCome : num 2 2 2 2 2 2 2 2 2 2 ...
$ NoOfContactsPerformed : num 0 0 0 0 0 0 0 0 0 0 ...
$ LastContactDuration : num 261 149 226 151 307 198 139 217 380 50 ...
$ ContactCommunicationType : num 2 2 2 2 2 2 2 2 2 2 ...
$ ConsumerPriceIndex : num 94 94 94 94 94 ...
$ ConsumerConfidenceIndex : num -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
- Load the cleaned dataset.
- Convert categorical variables to numerical variables.
- Data slicing: the dataset is split into 80% training data and 20% test data.
- TrainingParameters: the train() method is given a repeated cross-validation resampling scheme with 10 folds repeated 3 times, as sketched below.
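A minimal sketch of the slicing and resampling setup with caret; bankdata and subTest are placeholder names, while subTrain and TrainingParameters match the training code further below:

library(caret)
set.seed(323)
# Stratified 80/20 split on the target variable
index <- createDataPartition(bankdata$Term_Deposit, p = 0.8, list = FALSE)
subTrain <- bankdata[index, ]
subTest <- bankdata[-index, ]
# Repeated cross-validation: 10 folds, repeated 3 times
TrainingParameters <- trainControl(method = "repeatedcv", number = 10, repeats = 3)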
Machine Learning: Classification using Neural Networks
- Model Training
- We can use the nnet package, driven by caret's train() function, to train a NN model and tune its parameters. We can plot the result to see which set of parameters fits our data best.
- The nnet package by default uses the logistic activation function.
- Data pre-processing with caret: the scale transform calculates the standard deviation of an attribute and divides each value by that standard deviation.
- The center transform calculates the mean of an attribute and subtracts it from each value.
- Combining the scale and center transforms standardizes your data.
- Attributes will have a mean value of 0 and a standard deviation of 1.
- Training transforms can be prepared and applied automatically during model evaluation.
- Transforms applied during training are prepared using preProcess() and passed to the train() function via the preProcess argument.
- Backpropagation algorithm is a supervised learning method for multilayer feed-forward networks from the field of Artificial Neural Networks.
- The principle of the backpropagation approach is to model a given function by modifying internal weightings of input signals to produce an expected output signal. The system is trained using a supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state.
- We use backpropagation as the training algorithm for the neural network package.
# Tuning grid: hidden-layer sizes 1-5 and weight-decay values 0.1 and 0.2
nnetGrid <- expand.grid(size = seq(from = 1, to = 5, by = 1),
                        decay = seq(from = 0.1, to = 0.2, by = 0.1))

# Train the network with standardized inputs and repeated cross-validation
nn_model <- train(Term_Deposit ~ ., subTrain,
                  method = "nnet", algorithm = "backprop",
                  trControl = TrainingParameters,
                  preProcess = c("scale", "center"),
                  na.action = na.omit,
                  #metric = "ROC",
                  tuneGrid = nnetGrid,
                  trace = FALSE,
                  verbose = FALSE)
- Based on the caret neural network model, train() tunes the hidden layer: caret picks the best neural network based on size & decay. We can see the resampled accuracy for the different hidden-layer configurations below:
size decay Accuracy Kappa AccuracySD KappaSD
1 1 0.1 0.9040427 0.4358269 0.002567681 0.01507662
2 1 0.2 0.9039820 0.4367773 0.002584615 0.01548641
3 2 0.1 0.9051791 0.4418804 0.002210548 0.02086579
4 2 0.2 0.9052600 0.4422005 0.002728089 0.01602451
5 3 0.1 0.9055163 0.4370649 0.003408263 0.04122454
6 3 0.2 0.9060626 0.4388124 0.003642564 0.04049697
7 4 0.1 0.9074754 0.4514426 0.003252861 0.02746812
8 4 0.2 0.9075294 0.4470578 0.002480409 0.03099499
9 5 0.1 0.9078261 0.4603130 0.002673938 0.02354327
10 5 0.2 0.9077991 0.4425523 0.004547698 0.07854141
- Prediction
- Now our model is trained with accuracy = 0.8889. We are ready to predict classes for our test set, as sketched below.
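A minimal sketch of the prediction step, assuming the held-out split is called subTest (a placeholder name):

# Predict classes for the test set and cross-tabulate against the truth
nn_predictions <- predict(nn_model, newdata = subTest)
table(prediction = nn_predictions, subTest$Term_Deposit)
# Full confusion matrix with accuracy, kappa, sensitivity, specificity, ...
confusionMatrix(nn_predictions, subTest$Term_Deposit)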
prediction no yes
no 284 25
yes 8 16
- Confusion matrix & Accuracy of Neural Network model:
[1] 0.9009009
Confusion Matrix and Statistics
Reference
Prediction no yes
no 284 25
yes 8 16
Accuracy : 0.9009
95% CI : (0.8636, 0.9308)
No Information Rate : 0.8769
P-Value [Acc > NIR] : 0.103011
Kappa : 0.4415
Mcnemar's Test P-Value : 0.005349
Sensitivity : 0.9726
Specificity : 0.3902
Pos Pred Value : 0.9191
Neg Pred Value : 0.6667
Prevalence : 0.8769
Detection Rate : 0.8529
Detection Prevalence : 0.9279
Balanced Accuracy : 0.6814
'Positive' Class : no
- Plotting nnet variable importance
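A minimal sketch using caret's varImp(), which for nnet models derives importance from the network weights:

# Extract and plot variable importance from the fitted model
nn_importance <- varImp(nn_model)
print(nn_importance)
plot(nn_importance)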
nnet variable importance
Overall
ConsumerConfidenceIndex 100.000
LastContactDuration 67.780
ContactCommunicationType 52.298
NumberOfDaysPassedAfterLastContact 50.969
ConsumerPriceIndex 46.995
PreviousMarketingOutCome 6.867
NoOfContactsPerformed 0.000
Machine Learning: Classification using SVM
- SVM is another classification method that can be used to predict whether a client falls into the 'yes' or 'no' class.
- The linear, polynomial and RBF (Gaussian) kernels in SVM differ in how they form the hyperplane decision boundary between the classes.
- The kernel functions map the original (linear or nonlinear) dataset into a higher-dimensional space, with a view to making the classes linearly separable.
- Usually the linear and polynomial kernels are less time-consuming but provide lower accuracy than the RBF or Gaussian kernels.
- k-fold cross-validation divides the training set into k distinct subsets; each subset is used once for validation while the other k-1 subsets are used for training, across the entire training phase. This yields a more reliable assessment of the classification task. Overall, if you are unsure which kernel would work best, a good practice is to run something like 10-fold cross-validation for each candidate and then pick the best algorithm.
SVM Classifier using Linear Kernel
The caret package provides the train() method for training our data with various algorithms; we just need to pass different parameter values for different algorithms. Before calling train(), we first use the trainControl() method.
We set 3 parameters of trainControl(). The "method" parameter holds the details of the resampling method and can take many values such as "boot", "boot632", "cv", "repeatedcv", "LOOCV", "LGOCV" etc. For this project, let's use "repeatedcv", i.e. repeated cross-validation.
The "number" parameter holds the number of resampling iterations (folds). The "repeats" parameter contains the number of complete sets of folds to compute for our repeated cross-validation. We use number = 10 and repeats = 2. trainControl() returns a list, which we pass on to our train() method.
Before training our SVM classifier, we call set.seed() for reproducibility.
To train the SVM classifier, train() is called with the "method" parameter set to "svmLinear". The formula Term_Deposit ~ . uses all attributes in our classifier with Term_Deposit as the target variable. The "trControl" parameter takes the result of our trainControl() call, and the "preProcess" parameter handles preprocessing of the training data.
As discussed earlier, preprocessing is a mandatory task for our data. We pass 2 values in the "preProcess" parameter, "center" & "scale", which center and scale the data so that the training attributes end up with a mean of approximately 0 and a standard deviation of 1. The "tuneLength" parameter holds an integer value used for tuning the algorithm.
# Repeated 10-fold cross-validation, 2 repeats
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 2)
set.seed(323)
# Candidate values for the cost parameter C
grid <- expand.grid(C = c(0.25, 0.5, 1))
svm_Linear_Grid <- train(Term_Deposit ~ ., data = subTrainSVM, method = "svmLinear",
                         trControl = trctrl,
                         preProcess = c("center", "scale"),
                         tuneGrid = grid, # tuneLength is ignored when tuneGrid is supplied
                         tuneLength = 10)
svm_Linear_Grid
Support Vector Machines with Linear Kernel
9888 samples
7 predictor
2 classes: 'no', 'yes'
Pre-processing: centered (7), scaled (7)
Resampling: Cross-Validated (10 fold, repeated 2 times)
Summary of sample sizes: 8898, 8900, 8899, 8900, 8899, 8899, ...
Resampling results across tuning parameters:
C Accuracy Kappa
0.25 0.8997786 0.2837249
0.50 0.8997786 0.2837249
1.00 0.8997786 0.2837249
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 0.25.
- The above model shows that our classifier gives its best accuracy at C = 0.25. Let's make predictions with this model for our test set and check its accuracy, as sketched below.
  + Cross-validated accuracy from train control is about 90% using C = 0.25; the test-set accuracy follows.
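A minimal sketch of the test-set evaluation, assuming the held-out split is called subTestSVM (a placeholder name):

# Predict on the test set and compute the raw accuracy
test_pred_linear <- predict(svm_Linear_Grid, newdata = subTestSVM)
mean(test_pred_linear == subTestSVM$Term_Deposit)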
[1] 0.9166667
- Final prediction accuracy on the test set is 0.9166667.
SVM Classifier using Non-Linear Kernel
- Now we will build a model using a non-linear kernel, the Radial Basis Function. To use the RBF kernel, we just need to change the train() method's "method" parameter to "svmRadial". The radial kernel requires selecting proper values for both the cost parameter "C" and the kernel width parameter "sigma".
set.seed(323)
# Grid over the RBF kernel width (sigma) and the cost (C)
grid_radial <- expand.grid(sigma = c(0.25, 0.5, 0.9),
                           C = c(0.25, 0.5, 1))
svm_Radial <- train(Term_Deposit ~ ., data = subTrainSVM, method = "svmRadial",
                    trControl = trctrl,
                    preProcess = c("center", "scale"),
                    tuneGrid = grid_radial,
                    tuneLength = 10)
svm_Radial
svm_Radial
Support Vector Machines with Radial Basis Function Kernel
9888 samples
7 predictor
2 classes: 'no', 'yes'
Pre-processing: centered (7), scaled (7)
Resampling: Cross-Validated (10 fold, repeated 2 times)
Summary of sample sizes: 8898, 8900, 8899, 8900, 8899, 8899, ...
Resampling results across tuning parameters:
sigma C Accuracy Kappa
0.25 0.25 0.9131796 0.4235344
0.25 0.50 0.9128756 0.4232314
0.25 1.00 0.9153028 0.4430922
0.50 0.25 0.9163140 0.4454370
0.50 0.50 0.9198031 0.4851975
0.50 1.00 0.9230389 0.5121697
0.90 0.25 0.9220279 0.4926883
0.90 0.50 0.9236464 0.5192152
0.90 1.00 0.9271349 0.5520005
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.9 and C = 1.
- The SVM-RBF run evaluates these combinations and reports the best values of sigma & C. Based on the output, the best values are sigma = 0.9 & C = 1. Let's check our trained model's accuracy on the test set.
[1] 0.8333333
- Final prediction accuracy on the test set is 0.8333333.
Comparison Between SVM Models
- Comparison between the SVM linear and radial models, as sketched below.
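A minimal sketch of the comparison using caret's resamples(), which pools the per-fold results of both fitted models:

# Collect resampling results from the two models and summarize them
algo_results <- resamples(list(SVM_RADIAL = svm_Radial, SVM_LINEAR = svm_Linear_Grid))
summary(algo_results)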
Call:
summary.resamples(object = algo_results)
Models: SVM_RADIAL, SVM_LINEAR
Number of resamples: 20
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
SVM_RADIAL 0.9130435 0.9221436 0.9281750 0.9271349 0.9324393 0.9372470 0
SVM_LINEAR 0.8917004 0.8935794 0.8993427 0.8997786 0.9035121 0.9129555 0
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
SVM_RADIAL 0.4714582 0.5264890 0.5552111 0.5520005 0.5850553 0.6261048 0
SVM_LINEAR 0.1811203 0.2504363 0.2763777 0.2837249 0.3243298 0.3717218 0
Conclusion
From the above implementation, the results are impressive and convincing in terms of using a machine learning algorithm to decide on the marketing campaign of the bank. The majority of the attributes in the dataset contribute significantly to the predictive model. All three ML approaches achieve a good accuracy rate (>85%) and are easy to implement.