We will use a synthetic dataset for this exercise. The dataset contains the following columns:
- CustomerID: A unique identifier for each customer.
- Age: The age of the customer.
- MonthlyCharge: The monthly bill amount for the customer.
- CustomerServiceCalls: The number of times the customer contacted customer service.
- Churn: The target variable, indicating whether the customer churned (Yes) or not (No).
Supervised Learning Code
Predicting churn from labeled examples is a supervised learning task. Below is the Python code to set it up and execute it:
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

warnings.filterwarnings('ignore')
We create a synthetic dataset:
data = {
'CustomerID': range(1, 101),
'Age': [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]*10,
'MonthlyCharge': [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]*10,
'CustomerServiceCalls': [1, 2, 3, 4, 0, 1, 2, 3, 4, 0]*10,
'Churn': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes']*10
}
df = pd.DataFrame(data)
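Before modeling, it is worth sanity-checking the synthetic data. A minimal sketch using plain Python on the same repeated lists (no pandas required) confirms the row count and that the churn classes are evenly balanced:

```python
# Rebuild the repeating Churn pattern used in the synthetic dataset
churn = ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'] * 10

n_rows = len(churn)          # total rows
n_yes = churn.count('Yes')   # churned customers
n_no = churn.count('No')     # retained customers
print(n_rows, n_yes, n_no)   # 100 50 50
```

A balanced target like this means a trivial majority-class guesser would score only 50% accuracy, giving a useful baseline for the model.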
We split the data into features (X) and the target variable (y):
X = df[['Age', 'MonthlyCharge', 'CustomerServiceCalls']]
y = df['Churn']
We then split the dataset into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
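With `test_size=0.3` on 100 rows, the split holds out 30 rows for testing and leaves 70 for training. A quick sketch of the arithmetic (scikit-learn rounds the test count up when the fraction is not exact):

```python
import math

n_samples = 100   # rows in the synthetic dataset
test_size = 0.3   # fraction held out for evaluation

n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test
print(n_train, n_test)  # 70 30
```

The `random_state=42` argument fixes the shuffle seed, so the same rows land in the same split on every run.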
We use Scikit-learn to create and train a DecisionTreeClassifier:
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
We make predictions on the test set and calculate the accuracy of the model:
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Model Accuracy: {accuracy}')
Using Matplotlib, we visualize how the decision tree makes decisions:
plt.figure(figsize=(12,8))
tree.plot_tree(clf, filled=True, feature_names=['Age', 'MonthlyCharge', 'CustomerServiceCalls'], class_names=['No Churn', 'Churn'])
plt.title('Decision Tree for Predicting Customer Churn')
plt.show()
Model Accuracy
The accuracy score tells us the fraction of test customers the model classified correctly. Because our synthetic dataset follows a simple repeating pattern, the score here may not reflect how the model would perform on real-world data.
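Accuracy is simply correct predictions divided by total predictions. A small sketch with hypothetical label lists (standing in for `y_test` and `y_pred`) shows the calculation `accuracy_score` performs:

```python
# Hypothetical true and predicted labels for five test customers
y_true = ['No', 'Yes', 'No', 'No', 'Yes']
y_hat  = ['No', 'Yes', 'Yes', 'No', 'Yes']

# Count matching positions, then divide by the total
correct = sum(t == p for t, p in zip(y_true, y_hat))
accuracy = correct / len(y_true)
print(accuracy)  # 0.8 — four of five predictions match
```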
Decision Tree Interpretation
The decision tree visualization helps us understand the rules used by the model to make predictions. For example, it might show that customers with a high number of service calls and high monthly charges are more likely to churn.
- Gini: A measure of impurity. Lower values indicate higher purity.
- Samples: The number of samples reaching the node.
- Value: The distribution of samples in different classes at the node.
- Class: The predicted class at the node.
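The Gini value shown at each node can be computed by hand: it is 1 minus the sum of the squared class proportions. A short sketch, using hypothetical node counts for illustration:

```python
def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions at a node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical node with 30 'No' and 10 'Yes' samples
print(gini([30, 10]))  # 0.375 — mixed node
print(gini([40, 0]))   # 0.0   — pure node, all one class
```

A pure node scores 0, and for two classes the worst case is 0.5 (a 50/50 split); the tree chooses splits that drive these values down.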
Results
In this exercise, we built a decision tree model to predict customer churn for a telecom company using AWS SageMaker. We generated synthetic data, trained the model, evaluated its performance, and visualized the decision tree.
Final Steps
After completing the exercise, remember to delete the notebook instance from AWS SageMaker to avoid unnecessary charges.
Summary
By following the steps outlined in this blog post, you have gained hands-on experience in building and evaluating a decision tree model for churn prediction. This approach provides actionable insights that can help businesses retain customers and improve their services.
For further details, you can check out the complete project documentation, demonstration video, and the source code available on GitHub.
🎥 Watch the demonstration video: https://youtu.be/fNhWejM7EqY
📂 Check out the GitHub documentation: https://github.com/Pratik-Khose/AWS-Machine-Mearning-projects
Let’s connect and discuss more about decision trees, customer churn prediction, and their applications in various industries. Always eager to learn and collaborate on innovative projects! 🌟
#MachineLearning #DecisionTree #CustomerChurn #AWS #SageMaker #DataScience #Telecom #PredictiveModeling