EDA on Telecom Churn Data

The objectives of this project are:-
1. Perform exploratory analysis and extract insights from the dataset.
2. Split the dataset into train/test sets and explain your reasoning.
3. Build a predictive model to predict which customers are going to churn and discuss the reason why you choose a particular algorithm.
4. Establish metrics to evaluate model performance.
5. Discuss the potential issues with deploying the model into production

Import the required libraries

# python version # 3.8.2
import pandas as pd 
import numpy as np 
import os 
from pandas_profiling import ProfileReport
import warnings
warnings.filterwarnings('ignore')
# option to display all columns
pd.set_option('display.max_columns', None)
# Read the data
telecom_churn = pd.read_csv('../data/telecom_data/telecom.csv')
telecom_churn.head(10)
state account length area code phone number international plan voice mail plan number vmail messages total day minutes total day calls total day charge total eve minutes total eve calls total eve charge total night minutes total night calls total night charge total intl minutes total intl calls total intl charge customer service calls churn
0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 197.4 99 16.78 244.7 91 11.01 10.0 3 2.70 1 False
1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 195.5 103 16.62 254.4 103 11.45 13.7 3 3.70 1 False
2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 121.2 110 10.30 162.6 104 7.32 12.2 5 3.29 0 False
3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 61.9 88 5.26 196.9 89 8.86 6.6 7 1.78 2 False
4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 148.3 122 12.61 186.9 121 8.41 10.1 3 2.73 3 False
5 AL 118 510 391-8027 yes no 0 223.4 98 37.98 220.6 101 18.75 203.9 118 9.18 6.3 6 1.70 0 False
6 MA 121 510 355-9993 no yes 24 218.2 88 37.09 348.5 108 29.62 212.6 118 9.57 7.5 7 2.03 3 False
7 MO 147 415 329-9001 yes no 0 157.0 79 26.69 103.1 94 8.76 211.8 96 9.53 7.1 6 1.92 0 False
8 LA 117 408 335-4719 no no 0 184.5 97 31.37 351.6 80 29.89 215.8 90 9.71 8.7 4 2.35 1 False
9 WV 141 415 330-8173 yes yes 37 258.6 84 43.96 222.0 111 18.87 326.4 97 14.69 11.2 5 3.02 0 False

Check the Shape and Column types of the Dataframe

telecom_churn.shape
(3333, 21)
telecom_churn.dtypes
state                      object
account length              int64
area code                   int64
phone number               object
international plan         object
voice mail plan            object
number vmail messages       int64
total day minutes         float64
total day calls             int64
total day charge          float64
total eve minutes         float64
total eve calls             int64
total eve charge          float64
total night minutes       float64
total night calls           int64
total night charge        float64
total intl minutes        float64
total intl calls            int64
total intl charge         float64
customer service calls      int64
churn                        bool
dtype: object

Exploratory Analysis

# Format the column names, remove space and special characters in column names
telecom_churn.columns =  telecom_churn.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
telecom_churn
state account_length area_code phone_number international_plan voice_mail_plan number_vmail_messages total_day_minutes total_day_calls total_day_charge total_eve_minutes total_eve_calls total_eve_charge total_night_minutes total_night_calls total_night_charge total_intl_minutes total_intl_calls total_intl_charge customer_service_calls churn
0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 197.4 99 16.78 244.7 91 11.01 10.0 3 2.70 1 False
1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 195.5 103 16.62 254.4 103 11.45 13.7 3 3.70 1 False
2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 121.2 110 10.30 162.6 104 7.32 12.2 5 3.29 0 False
3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 61.9 88 5.26 196.9 89 8.86 6.6 7 1.78 2 False
4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 148.3 122 12.61 186.9 121 8.41 10.1 3 2.73 3 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3328 AZ 192 415 414-4276 no yes 36 156.2 77 26.55 215.5 126 18.32 279.1 83 12.56 9.9 6 2.67 2 False
3329 WV 68 415 370-3271 no no 0 231.1 57 39.29 153.4 55 13.04 191.3 123 8.61 9.6 4 2.59 3 False
3330 RI 28 510 328-8230 no no 0 180.8 109 30.74 288.8 58 24.55 191.9 91 8.64 14.1 6 3.81 2 False
3331 CT 184 510 364-6381 yes no 0 213.8 105 36.35 159.6 84 13.57 139.2 137 6.26 5.0 10 1.35 2 False
3332 TN 74 415 400-4344 no yes 25 234.4 113 39.85 265.9 82 22.60 241.4 77 10.86 13.7 4 3.70 0 False

3333 rows × 21 columns

profile = ProfileReport(telecom_churn, title = "Telecom Churn Report")
profile.to_notebook_iframe()