Neural Networks & SVM: A Data-Driven Approach To Predict Success Of Bank Telemarketing

Nisha Selvarajan

Nowadays marketing spending in the banking industry is massive, so it is essential for banks to optimize their marketing strategies and improve effectiveness. Understanding customers' needs leads to more effective marketing plans, smarter product designs and greater customer satisfaction. The main objective of this project is to increase the effectiveness of the bank's telemarketing campaign. This project will enable the bank to develop a more granular understanding of its customer base, predict customers' response to its telemarketing campaign and establish a target customer profile for future marketing plans.

By analyzing customer features, such as demographics and transaction history, the bank will be able to predict customer saving behaviours and identify which types of customers are more likely to make term deposits. The bank can then focus its marketing efforts on those customers. This will not only allow the bank to secure deposits more effectively but also increase customer satisfaction by reducing undesirable advertisements for certain customers.

We are given data from the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (target variable y); in other words, we model the probability of buying as a function of the customer features. You can find the code I used in my GitHub repo.

Requirements

R==4.0.3
caret==6.0-86
gmodels==2.18.1
ggplot2==3.3.2

Data set

  • The data set utilized for this research was taken from the University of California, Irvine machine learning repository ( https://archive.ics.uci.edu/ml/datasets/Bank+Marketing?package=regsel&version=0.2 ), which is openly available to the public for research purposes. The dataset contains 41,188 marketing campaign observations with 20 input features. The 20 attributes are detailed below.
    Name             Description
    age              Numeric - Age of the client
    job              Categorical - Type of job
    marital          Categorical - Marital status of the client
    education        Categorical - Education qualification of the client
    default          Categorical - Has credit in default?
    housing          Categorical - Has a housing loan?
    loan             Categorical - Has a personal loan?
    contact          Categorical - Contact communication type (cellular, telephone)
    month            Categorical - Last contact month of the year
    day_of_week      Categorical - Last contact day of the week
    duration         Numerical - Last contact duration, in seconds
    campaign         Numerical - Number of contacts performed during this campaign
    pdays            Numerical - Number of days passed after the previous campaign contact
    previous         Numerical - Number of contacts performed before this campaign for this client
    poutcome         Categorical - Outcome of the previous marketing campaign
    emp.var.rate     Numerical - Employment variation rate (quarterly indicator)
    cons.price.idx   Numerical - Consumer price index (monthly indicator)
    cons.conf.idx    Numerical - Consumer confidence index (monthly indicator)
    euribor3m        Numerical - Euribor 3-month rate (daily indicator)
    nr.employed      Numerical - Number of employees (quarterly indicator)
    y                Categorical - Has the client subscribed to a term deposit?

    Data Analysis

    • Plot missing values for all the features in the dataset.

    • Plot histograms for the numerical variables.

    • Metric table with several indicators for all numerical variables (a code sketch follows the table):
             variable         mean     std_dev variation_coef       p_01     p_05
    1             age   40.0240604  10.4212500    0.260374632   23.00000   26.000
    2        duration  258.2850102 259.2792488    1.003849386   11.00000   36.000
    3        campaign    2.5675925   2.7700135    1.078836903    1.00000    1.000
    4           pdays  962.4754540 186.9109073    0.194198103    3.00000  999.000
    5        previous    0.1729630   0.4949011    2.861311858    0.00000    0.000
    6    emp.var.rate    0.0818855   1.5709597   19.184834048   -3.40000   -2.900
    7  cons.price.idx   93.5756644   0.5788400    0.006185797   92.20100   92.713
    8   cons.conf.idx  -40.5026003   4.6281979   -0.114269154  -49.50000  -47.100
    9       euribor3m    3.6212908   1.7344474    0.478958331    0.65848    0.797
    10    nr.employed 5167.0359109  72.2515277    0.013983167 4963.60000 5017.500
           p_25     p_50     p_75     p_95     p_99   skewness  kurtosis     iqr
    1    32.000   38.000   47.000   58.000   71.000  0.7846682  3.791070  15.000
    2   102.000  180.000  319.000  752.650 1271.130  3.2630224 23.245334 217.000
    3     1.000    2.000    3.000    7.000   14.000  4.7623333 39.975160   2.000
    4   999.000  999.000  999.000  999.000  999.000 -4.9220107 25.226619   0.000
    5     0.000    0.000    0.000    1.000    2.000  3.8319027 23.106230   0.000
    6    -1.800    1.100    1.400    1.400    1.400 -0.7240692  1.937352   3.200
    7    93.075   93.749   93.994   94.465   94.465 -0.2308792  2.170146   0.919
    8   -42.700  -41.800  -36.400  -33.600  -26.900  0.3031688  2.641340   6.300
    9     1.344    4.857    4.961    4.966    4.968 -0.7091621  1.593222   3.617
    10 5099.100 5191.000 5228.100 5228.100 5228.100 -1.0442244  2.996094 129.000
               range_98         range_80
    1          [23, 71]         [28, 55]
    2     [11, 1271.13]        [59, 551]
    3           [1, 14]           [1, 5]
    4          [3, 999]       [999, 999]
    5            [0, 2]           [0, 1]
    6       [-3.4, 1.4]      [-1.8, 1.4]
    7  [92.201, 94.465] [92.893, 94.465]
    8    [-49.5, -26.9]   [-46.2, -36.1]
    9  [0.65848, 4.968]   [1.046, 4.964]
    10 [4963.6, 5228.1] [5076.2, 5228.1]
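
    The plots and the metric table above can be produced with the funModeling package, whose profiling_num() output matches this layout; a minimal sketch, assuming that package and the UCI file name (the original code may differ):

    library(funModeling)

    # load the UCI data (semicolon-separated)
    bank <- read.csv("bank-additional-full.csv", sep = ";")

    df_status(bank)                 # missing values / zeros per feature
    plot_num(bank)                  # histograms for all numerical variables
    metrics <- profiling_num(bank)  # mean, std_dev, percentiles, skewness, kurtosis, iqr
    print(metrics)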

    Variable Importance & Cross Plot to Deposit

    • Plot variable importance with several metrics, such as entropy (en), mutual information (mi), information gain (ig) and gain ratio (gr).

    • Bivariate analysis cross plots showing the relationship of each variable to the target variable (a code sketch follows).
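
    These metrics match the output of funModeling's var_rank_info(), and its cross_plot() draws bivariate plots against the target; a minimal sketch, assuming that package:

    library(funModeling)
    library(ggplot2)

    # entropy-based importance: en, mi, ig and gr for every variable
    importance <- var_rank_info(bank, "y")
    ggplot(importance, aes(x = reorder(var, gr), y = gr)) +
      geom_bar(stat = "identity") +
      coord_flip() +
      labs(x = "Variable", y = "Gain ratio (gr)")

    # cross plots of selected inputs against the target variable
    cross_plot(bank, input = c("poutcome", "contact"), target = "y")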

    Prepare Data for Classification

    • Select variables relevant to customers: based on the variable importance, we will use the pdays, poutcome, previous, duration, cons.price.idx, cons.conf.idx and contact features for further analysis.
    'data.frame':    41188 obs. of  8 variables:
     $ Term_Deposit                      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
     $ NumberOfDaysPassedAfterLastContact: num  999 999 999 999 999 999 999 999 999 999 ...
     $ PreviousMarketingOutCome          : num  2 2 2 2 2 2 2 2 2 2 ...
     $ NoOfContactsPerformed             : num  0 0 0 0 0 0 0 0 0 0 ...
     $ LastContactDuration               : num  261 149 226 151 307 198 139 217 380 50 ...
     $ ContactCommunicationType          : num  2 2 2 2 2 2 2 2 2 2 ...
     $ ConsumerPriceIndex                : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
     $ ConsumerConfidenceIndex           : num  94 94 94 94 94 ...
    • Load the cleaned dataset: convert the categorical variables to numerical variables.
    • Data slicing:
      • The dataset is split into 80% training data and 20% test data.
    • TrainingParameters:
      • The train() method is passed a repeated cross-validation resampling method with 10 resampling iterations repeated 3 times (see the sketch below).
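
    A minimal sketch of this preparation, assuming caret and a cleaned data frame bank_clean with the renamed variables shown above (the original code may differ):

    library(caret)
    set.seed(323)

    # stratified 80/20 split on the target variable
    index    <- createDataPartition(bank_clean$Term_Deposit, p = 0.8, list = FALSE)
    subTrain <- bank_clean[index, ]
    subTest  <- bank_clean[-index, ]

    # repeated cross-validation: 10 folds, repeated 3 times
    TrainingParameters <- trainControl(method = "repeatedcv", number = 10, repeats = 3)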

    Machine Learning: Classification using Neural Networks

    • Model Training
      • We can use the train() function from caret to train a NN model and tune its parameters. We can plot the result to see which set of parameters fits our data best.
      • The nnet package by default uses the logistic activation function.
      • Data pre-processing with caret: the scale transform calculates the standard deviation of an attribute and divides each value by that standard deviation.
      • The center transform calculates the mean of an attribute and subtracts it from each value.
      • Combining the scale and center transforms will standardize the data.
      • Attributes will have a mean value of 0 and a standard deviation of 1.
      • Training transforms can be prepared and applied automatically during model evaluation.
      • Transforms applied during training are prepared using preProcess() and passed to the train() function via the preProcess argument.
      • The backpropagation algorithm is a supervised learning method for multilayer feed-forward networks from the field of artificial neural networks.
      • The principle of the backpropagation approach is to model a given function by modifying the internal weightings of input signals to produce an expected output signal. The system is trained with a supervised learning method, where the error between the system's output and a known expected output is presented to the system and used to modify its internal state.
      • We use backpropagation as the algorithm in the neural network package.
    nnetGrid <- expand.grid(size  = seq(from = 1, to = 5, by = 1),       # hidden units
                            decay = seq(from = 0.1, to = 0.2, by = 0.1)) # weight decay
    nn_model <- train(Term_Deposit ~ ., subTrain,
                      method = "nnet", algorithm = 'backprop',
                      trControl = TrainingParameters,
                      preProcess = c("scale", "center"),  # standardize predictors
                      na.action = na.omit,
                      #metric = "ROC",
                      tuneGrid = nnetGrid,
                      trace = FALSE,
                      verbose = FALSE)
    • Based on the caret neural network model, train() tunes the hidden layer size and the weight decay, and picks the best neural network based on them. We can see the accuracy for the different settings below:
       size decay  Accuracy     Kappa  AccuracySD    KappaSD
    1     1   0.1 0.9040427 0.4358269 0.002567681 0.01507662
    2     1   0.2 0.9039820 0.4367773 0.002584615 0.01548641
    3     2   0.1 0.9051791 0.4418804 0.002210548 0.02086579
    4     2   0.2 0.9052600 0.4422005 0.002728089 0.01602451
    5     3   0.1 0.9055163 0.4370649 0.003408263 0.04122454
    6     3   0.2 0.9060626 0.4388124 0.003642564 0.04049697
    7     4   0.1 0.9074754 0.4514426 0.003252861 0.02746812
    8     4   0.2 0.9075294 0.4470578 0.002480409 0.03099499
    9     5   0.1 0.9078261 0.4603130 0.002673938 0.02354327
    10    5   0.2 0.9077991 0.4425523 0.004547698 0.07854141

    • Prediction
      • Now our model is trained with accuracy = 0.8889. We are ready to predict classes for our test set (a code sketch follows the confusion matrix below).
              
    prediction  no yes
           no  284  25
           yes   8  16
    • Confusion matrix & Accuracy of Neural Network model:
    [1] 0.9009009
    Confusion Matrix and Statistics
    
              Reference
    Prediction  no yes
           no  284  25
           yes   8  16
                                              
                   Accuracy : 0.9009          
                     95% CI : (0.8636, 0.9308)
        No Information Rate : 0.8769          
        P-Value [Acc > NIR] : 0.103011        
                                              
                      Kappa : 0.4415          
                                              
     Mcnemar's Test P-Value : 0.005349        
                                              
                Sensitivity : 0.9726          
                Specificity : 0.3902          
             Pos Pred Value : 0.9191          
             Neg Pred Value : 0.6667          
                 Prevalence : 0.8769          
             Detection Rate : 0.8529          
       Detection Prevalence : 0.9279          
          Balanced Accuracy : 0.6814          
                                              
           'Positive' Class : no              
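
    The prediction table and confusion matrix above can be reproduced with something like the following sketch, assuming the subTest split from the preparation step:

    # predict classes for the held-out test set and evaluate
    nn_pred <- predict(nn_model, newdata = subTest)
    table(prediction = nn_pred, subTest$Term_Deposit)
    confusionMatrix(nn_pred, subTest$Term_Deposit)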
    
    • Plotting nnet variable importance
    nnet variable importance
    
                                       Overall
    ConsumerConfidenceIndex            100.000
    LastContactDuration                 67.780
    ContactCommunicationType            52.298
    NumberOfDaysPassedAfterLastContact  50.969
    ConsumerPriceIndex                  46.995
    PreviousMarketingOutCome             6.867
    NoOfContactsPerformed                0.000
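
    The ranking above comes from caret's varImp(), which can also be plotted; a short sketch:

    # scaled variable importance of the tuned nnet model
    nn_importance <- varImp(nn_model)
    plot(nn_importance)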

    Machine Learning: Classification using SVM

    • SVM is another classification method that can be used to predict whether a client falls into the 'yes' or the 'no' class.
      • The linear, polynomial and RBF (Gaussian) kernels in SVM simply differ in how the hyperplane decision boundary between the classes is formed.
      • The kernel functions map the original (linear or nonlinear) dataset into a higher-dimensional space, with a view to making the classes linearly separable.
      • Usually linear and polynomial kernels are less time consuming but provide lower accuracy than the RBF or Gaussian kernels.
      • k-fold cross-validation divides the training set into k distinct subsets; each subset is used once for validation while the other k-1 are used for training, over the entire training phase. This gives better training of the classification task. Overall, if you are unsure which kernel method would be best, a good practice is to use something like 10-fold cross-validation for each training set and then pick the best algorithm.

    SVM Classifier using Linear Kernel

    • The caret package provides the train() method for training our data with various algorithms; we just need to pass different parameter values for different algorithms. Before calling train(), we will first use the trainControl() method.

      • We set 3 parameters of trainControl(). The "method" parameter holds the details of the resampling method; it can take many values, such as "boot", "boot632", "cv", "repeatedcv", "LOOCV", "LGOCV", etc. For this project, let's use "repeatedcv", i.e. repeated cross-validation.

      • The "number" parameter holds the number of resampling iterations (folds), and the "repeats" parameter the number of complete sets of folds to compute for our repeated cross-validation. Here we set number = 10 and repeats = 2, matching the code below. trainControl() returns a list, which we pass on to our train() method.

      • Before training our SVM classifier, we call set.seed() for reproducibility.

      • For training the SVM classifier, train() is passed the "method" parameter "svmLinear". The formula Term_Deposit ~ . uses all attributes in our classifier as predictors, with Term_Deposit as the target variable. The "trControl" parameter is passed the result of our trainControl() call, and the "preProcess" parameter is for pre-processing our training data.

      • As discussed earlier, pre-processing is a mandatory task for our data. We pass 2 values in our "preProcess" parameter, "center" & "scale". These center and scale the data, so that after pre-processing the training attributes have a mean value of approximately 0 and a standard deviation of 1. The "tuneLength" parameter holds an integer value used for tuning our algorithm.
    trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 2)
    set.seed(323)
    grid <- expand.grid(C = c(0.25, 0.5, 1))  # cost values to evaluate
    svm_Linear_Grid <- train(Term_Deposit ~ ., data = subTrainSVM,
                             method = "svmLinear",
                             trControl = trctrl,
                             preProcess = c("center", "scale"),
                             tuneGrid = grid,
                             tuneLength = 10)
    svm_Linear_Grid
    Support Vector Machines with Linear Kernel 
    
    9888 samples
       7 predictor
       2 classes: 'no', 'yes' 
    
    Pre-processing: centered (7), scaled (7) 
    Resampling: Cross-Validated (10 fold, repeated 2 times) 
    Summary of sample sizes: 8898, 8900, 8899, 8900, 8899, 8899, ... 
    Resampling results across tuning parameters:
    
      C     Accuracy   Kappa    
      0.25  0.8997786  0.2837249
      0.50  0.8997786  0.2837249
      1.00  0.8997786  0.2837249
    
    Accuracy was used to select the optimal model using the largest value.
    The final value used for the model was C = 0.25.
    • The above results show that our classifier gives its best accuracy at C = 0.25. Let's make predictions using this model on our test set and check its accuracy (see the sketch after the result).

    • Cross-validated accuracy is about 90% using C = 0.25; on the held-out test set we obtain:

    [1] 0.9166667
    • Final prediction accuracy on the test set is 0.9166667.
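
    A minimal sketch of this test-set evaluation, assuming a held-out split subTestSVM:

    # predict on the test set and compute accuracy
    svm_pred <- predict(svm_Linear_Grid, newdata = subTestSVM)
    mean(svm_pred == subTestSVM$Term_Deposit)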

    SVM Classifier using Non-Linear Kernel

    • Now we will build a model using a non-linear kernel, the Radial Basis Function (RBF). To use the RBF kernel, we just change the train() method's "method" parameter to "svmRadial". The radial kernel requires proper values for both the cost parameter "C" and the kernel width "sigma".
    set.seed(323)
    grid_radial <- expand.grid(sigma = c(0.25, 0.5, 0.9),  # kernel width
                               C     = c(0.25, 0.5, 1))    # cost
    svm_Radial <- train(Term_Deposit ~ ., data = subTrainSVM,
                        method = "svmRadial",
                        trControl = trctrl,
                        preProcess = c("center", "scale"),
                        tuneGrid = grid_radial,
                        tuneLength = 10)
    
    svm_Radial
    Support Vector Machines with Radial Basis Function Kernel 
    
    9888 samples
       7 predictor
       2 classes: 'no', 'yes' 
    
    Pre-processing: centered (7), scaled (7) 
    Resampling: Cross-Validated (10 fold, repeated 2 times) 
    Summary of sample sizes: 8898, 8900, 8899, 8900, 8899, 8899, ... 
    Resampling results across tuning parameters:
    
      sigma  C     Accuracy   Kappa    
      0.25   0.25  0.9131796  0.4235344
      0.25   0.50  0.9128756  0.4232314
      0.25   1.00  0.9153028  0.4430922
      0.50   0.25  0.9163140  0.4454370
      0.50   0.50  0.9198031  0.4851975
      0.50   1.00  0.9230389  0.5121697
      0.90   0.25  0.9220279  0.4926883
      0.90   0.50  0.9236464  0.5192152
      0.90   1.00  0.9271349  0.5520005
    
    Accuracy was used to select the optimal model using the largest value.
    The final values used for the model were sigma = 0.9 and C = 1.
    • Tuning the SVM-RBF kernel over the grid presents us with the best values of sigma & C; based on the output, these are sigma = 0.9 and C = 1. Let's check our trained model's accuracy on the test set.
    [1] 0.8333333
    • Final prediction accuracy on the test set is 0.8333333.

    Comparison between SVM Models

    • Comparison between the SVM linear and radial models (the summary below can be produced as sketched next).
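
    A short sketch using caret's resamples(), which collects the cross-validation results of both tuned models for a side-by-side summary:

    # gather resampling results for a side-by-side comparison
    algo_results <- resamples(list(SVM_RADIAL = svm_Radial,
                                   SVM_LINEAR = svm_Linear_Grid))
    summary(algo_results)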
    
    Call:
    summary.resamples(object = algo_results)
    
    Models: SVM_RADIAL, SVM_LINEAR 
    Number of resamples: 20 
    
    Accuracy 
                    Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
    SVM_RADIAL 0.9130435 0.9221436 0.9281750 0.9271349 0.9324393 0.9372470    0
    SVM_LINEAR 0.8917004 0.8935794 0.8993427 0.8997786 0.9035121 0.9129555    0
    
    Kappa 
                    Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
    SVM_RADIAL 0.4714582 0.5264890 0.5552111 0.5520005 0.5850553 0.6261048    0
    SVM_LINEAR 0.1811203 0.2504363 0.2763777 0.2837249 0.3243298 0.3717218    0

    Conclusion

    From the above implementation, the results are convincing in terms of using a machine learning algorithm to decide on the bank's marketing campaign. The majority of the attributes in the dataset contribute significantly to the predictive model. All three ML approaches achieve a good accuracy rate (>85%) and are easy to implement.