Getting Started with Machine Learning: My Day 1 with scikit-learn | by Surbi Karki | May, 2025


I began learning machine learning with the Python library scikit-learn. I am currently taking the DataCamp Machine Learning Scientist with Python course, and through its interactive code and guided exercises I came to understand the basic idea of supervised learning, focusing specifically on classification problems.

My first implementation was a K-Nearest Neighbors (KNN) model, which I used to predict customer churn from features like account length and customer service calls. Along the way, I built up a working understanding of scikit-learn’s utilities and of how a real machine learning pipeline fits together.

At this stage of my training, I learned the fundamentals of classification with scikit-learn. I started with binary classification, where the goal is to predict one of two outcomes (e.g., churn or no churn). I learned to train a k-Nearest Neighbors (k-NN) model and make predictions with it, and to split data into training and test sets properly using train_test_split. I evaluated the model with accuracy and saw how model complexity affects it: a very small k tends to overfit, while a very large k tends to underfit. Plotting accuracy against different values of k helped me choose the best model. I also used key evaluation tools such as accuracy_score, .score(), and classification_report. I practiced all of these skills in an interactive churn prediction task that covered the whole supervised learning process.
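A minimal sketch of that train, predict, and evaluate loop might look like the following. It is not the course code: the data comes from make_classification as a synthetic stand-in for the churn dataset, and the choices of n_neighbors=6, test_size=0.3, and random_state=42 are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (assumption: stands in for churn data).
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=42
)

# Hold out 30% of the rows; stratify keeps the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit on the training set only, then predict labels for the unseen test set.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# .score() and accuracy_score report the same number for a classifier.
test_accuracy = knn.score(X_test, y_test)
same_accuracy = accuracy_score(y_test, y_pred)

# classification_report adds per-class precision, recall, and F1.
report = classification_report(y_test, y_pred)
print(f"accuracy: {test_accuracy:.3f}")
print(report)
```

The two accuracy values agree because .score() on a classifier predicts internally and then computes accuracy against the labels it is given.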

Project: Customer Churn Classification

Using a real-world Telco Churn dataset, I applied KNN and logistic regression to:

– Clean and preprocess the data (handle missing values, encode features)
– Select relevant features
– Train, tune, and evaluate classification models
– Compare model performance and mitigate overfitting
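The steps listed above can be sketched roughly as a scikit-learn pipeline. This is an illustration under stated assumptions, not the project's actual code: the DataFrame is randomly generated, the contract column and the rule that produces the churn label are hypothetical, and median imputation with one-hot encoding is just one reasonable preprocessing choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data (assumption): column names and values are illustrative only.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "account_length": rng.integers(1, 200, size=n).astype(float),
    "customer_service_calls": rng.integers(0, 10, size=n).astype(float),
    "contract": rng.choice(["monthly", "yearly"], size=n),
})
# Hypothetical labelling rule so the models have something to learn.
calls_high = df["customer_service_calls"] > 4
new_monthly = (df["contract"] == "monthly") & (df["account_length"] < 50)
df["churn"] = (calls_high | new_monthly).astype(int)
# Blank out a few values to imitate missing data.
df.loc[df.sample(frac=0.05, random_state=0).index, "account_length"] = np.nan

numeric = ["account_length", "customer_service_calls"]
categorical = ["contract"]

# Impute and scale numeric columns; one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["churn"],
    test_size=0.25, stratify=df["churn"], random_state=0,
)

# Train both models behind the same preprocessing and compare test accuracy.
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)
print(scores)
```

Putting the preprocessing inside the pipeline means the imputer, scaler, and encoder are fitted on the training rows only, so nothing about the test set leaks into training.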

Tools & Libraries Used

– scikit-learn: Machine learning models and utilities
– pandas: Data manipulation
– NumPy: Numerical operations
– matplotlib/seaborn: Visualization
– Python: Programming logic and structure

Challenges Faced

– Confusing the core methods at the start (fit() vs. predict() vs. score())
– Interpreting error messages and debugging shape issues in model inputs
– Choosing the correct evaluation metric for the task
– Handling categorical data during preprocessing
– Interpreting model complexity plots
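For the first two challenges, a tiny example can make the difference between the three methods and the usual shape error concrete. The numbers below are invented for illustration: a single hypothetical feature (customer service calls) and a churn label.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumption): one feature and a binary label.
calls = np.array([0, 1, 1, 2, 6, 7, 8, 9])   # shape (8,), one-dimensional
churn = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# scikit-learn expects X as (n_samples, n_features), so reshape to (8, 1).
X = calls.reshape(-1, 1)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, churn)                           # fit(): learn from features + labels

new_customers = np.array([[1], [8]])        # also 2-D: two rows, one column
predictions = knn.predict(new_customers)    # predict(): features in, labels out

accuracy = knn.score(X, churn)              # score(): predict, then compare to labels

# Passing the 1-D array instead of the reshaped one raises a ValueError.
try:
    KNeighborsClassifier(n_neighbors=3).fit(calls, churn)
    shape_error = ""
except ValueError as err:
    shape_error = str(err)

print(predictions, accuracy)
print(shape_error.splitlines()[0])
```

In short, fit() needs both features and labels, predict() needs only features, and score() needs both again because it compares predictions with the true labels.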

Learning Outcomes

– Gained hands-on experience with real-world datasets.
– Learned to balance model complexity using visualization techniques and cross-validation.
– Improved my understanding of data science pipelines by implementing them with scikit-learn.
– Gained the confidence to explore more advanced algorithms (e.g., Decision Trees, Logistic Regression).
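As a rough illustration of balancing model complexity with cross-validation, the sketch below compares training accuracy with 5-fold cross-validated accuracy for several values of k. The synthetic data, the range of k values, and cv=5 are all assumptions, not settings taken from the course.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption), split so the test set stays untouched.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

neighbors = range(1, 26, 2)
train_acc, cv_acc = {}, {}
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds of the training data.
    cv_acc[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()
    # Accuracy on the same data the model was fitted on.
    train_acc[k] = knn.fit(X_train, y_train).score(X_train, y_train)

# Pick k by cross-validated accuracy, then check it once on the test set.
best_k = max(cv_acc, key=cv_acc.get)
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)

for k in neighbors:
    print(f"k={k:2d}  train={train_acc[k]:.3f}  cv={cv_acc[k]:.3f}")
print(f"best k: {best_k}, test accuracy: {test_acc:.3f}")
```

With k=1 every training point is its own nearest neighbor, so training accuracy is perfect while cross-validated accuracy is usually lower, which is the overfitting signature a model complexity plot makes visible. The two dictionaries can be passed to matplotlib to draw that plot.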

To support my learning, I referred to:

“Machine Learning with PyTorch and Scikit-Learn”

Sebastian Raschka, Yuxi Liu, Vahid Mirjalili (Packt Publishing)

This book provides modern, hands-on guidance for applying machine learning using both traditional scikit-learn and deep learning tools like PyTorch.
