Getting Started with Machine Learning: My Day 1 with scikit-learn | by Surbi Karki | May, 2025

I began learning machine learning with the Python library scikit-learn. I am currently working through DataCamp's Machine Learning Scientist with Python course, and through its interactive code and guided exercises I came to understand the basic idea of supervised learning, focusing specifically on classification problems.

My first implementation was a K-Nearest Neighbors (KNN) model that predicted customer churn from features such as account length and number of customer service calls. Along the way, I gradually built a working understanding of scikit-learn’s utilities and of how real machine learning pipelines fit together.

At this stage of my training, I learned the fundamentals of classification with scikit-learn. I learned about binary classification, where the goal is to predict one of two outcomes (e.g., churn or no churn). I learned to train a K-Nearest Neighbors (KNN) model and make predictions with it, and to split data into training and test sets properly using train_test_split. I evaluated model performance with accuracy and saw how model complexity affects it, along with the risks of overfitting and underfitting. Plotting accuracy against different values of k helped me choose the best model. I also used key evaluation tools such as accuracy_score, .score(), and classification_report. I practiced all of these skills in an interactive churn prediction task that covered the whole supervised learning process.
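A minimal sketch of this workflow is shown below. It is not the course exercise: the data is synthetic (generated with make_classification as a stand-in for a churn table), and the split size, random seeds, and range of k are assumptions chosen for illustration. It splits the data, fits KNN for several values of k, and records training and test accuracy for each.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (assumption: a stand-in for churn data).
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# Hold out 30% for testing; stratify keeps the class ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

train_acc, test_acc = {}, {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_acc[k] = knn.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc[k] = knn.score(X_test, y_test)     # accuracy on held-out data

# The k with the highest held-out accuracy in this particular split.
best_k = max(test_acc, key=test_acc.get)
print(f"best k = {best_k}, test accuracy = {test_acc[best_k]:.3f}")
```

A small k tends to score very well on the training set and worse on the test set (overfitting), while a very large k can score poorly on both (underfitting). Plotting train_acc and test_acc against k with matplotlib gives the kind of model complexity curve described above.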

Project: Customer Churn Classification

Using a real-world Telco Churn dataset, I applied KNN and logistic regression to:

– Clean and preprocess data (handle missing values, encode features)
– Select relevant features
– Train, tune, and evaluate classification models
– Compare model performance and mitigate overfitting
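The preprocessing and training steps above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the project's actual code: the tiny table and its values are made up, the "contract" column is hypothetical, and the imputation strategies are one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only (not the Telco Churn dataset).
df = pd.DataFrame({
    "account_length": [12, 48, np.nan, 5, 60, 24, 3, 36, 72, 8, 18, 54],
    "customer_service_calls": [5, 0, 1, 6, 1, 2, 7, np.nan, 0, 4, 3, 1],
    "contract": ["monthly", "yearly", "yearly", "monthly", "yearly", np.nan,
                 "monthly", "yearly", "yearly", "monthly", "monthly", "yearly"],
    "churn": [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
})

numeric_cols = ["account_length", "customer_service_calls"]
categorical_cols = ["contract"]  # hypothetical categorical feature

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

X = df[numeric_cols + categorical_cols]
y = df["churn"]
model.fit(X, y)
preds = model.predict(X)
print(preds)
```

Wrapping preprocessing and the classifier in one Pipeline means the imputers, scaler, and encoder are fitted only on the data passed to fit(), which helps avoid leaking test-set information once a train/test split is added. Swapping LogisticRegression() for KNeighborsClassifier() would reuse the same preprocessing.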

Tools & Libraries Used

– scikit-learn: Machine learning models and utilities
– pandas: Data manipulation
– NumPy: Numerical operations
– matplotlib/seaborn: Visualization
– Python: Programming logic and structure

Challenges Faced

– Confusion at the start about which method to use (fit() vs. predict() vs. score())
– Interpreting error messages and debugging model shape issues
– Choosing the correct evaluation metric depending on the task
– Handling categorical data during preprocessing
– Interpreting model complexity plots
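For the first two challenges, a small sketch may help. The arrays below are made up for illustration, and the example only assumes the standard estimator interface: fit() learns from data, predict() returns labels, and score() predicts and then compares against the true labels. It also shows one common shape error, passing a single sample as a 1D array.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Toy data with shape (n_samples, n_features) = (6, 2); values are made up.
X = np.array([[1, 0], [2, 1], [3, 0], [10, 5], [11, 6], [12, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)             # learns from the data; returns the fitted estimator
y_pred = knn.predict(X)   # returns one predicted label per row
acc = knn.score(X, y)     # predicts internally, then returns mean accuracy

# score() and accuracy_score() on the same predictions give the same number.
same_result = accuracy_score(y, y_pred) == acc

# A single sample as a 1D array has shape (2,), but predict() expects 2D input.
new_customer = np.array([2, 1])
try:
    knn.predict(new_customer)
    raised = False
except ValueError:
    raised = True

# Reshaping to (1, n_features) makes it a one-row table.
new_pred = knn.predict(new_customer.reshape(1, -1))
print(acc, same_result, raised, new_pred)
```

When a shape error appears, printing X.shape and y.shape before calling fit() or predict() is usually the quickest way to find the mismatch.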

Learning Outcomes

– Developed hands-on experience with real-world datasets.
– Developed the skill to balance model complexity through visualization techniques and cross-validation.
– Improved understanding of data science pipelines through the implementation of scikit-learn.
– Developed confidence to explore more advanced algorithms (e.g., Decision Trees, Logistic Regression).
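As one way to balance model complexity with cross-validation, the sketch below searches over k with GridSearchCV. The synthetic data, the candidate values of k, and the choice of 5 folds are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption: a stand-in for a real dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Try odd values of k from 1 to 15, scoring each with 5-fold cross-validation.
param_grid = {"n_neighbors": list(range(1, 16, 2))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```

Because each candidate k is scored on several held-out folds rather than a single test split, the chosen value depends less on one lucky or unlucky split.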

To support my learning, I referred to:

“Machine Learning with PyTorch and Scikit-Learn”

Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili (Packt Publishing)

This book provides modern, hands-on guidance for applying machine learning using both traditional scikit-learn and deep learning tools like PyTorch.
