Image by Author | Created on Canva
I started teaching myself data science in 2019, when I began grad school. Six months later, I landed a research internship with a machine learning company. So I must have done something right, yes? Well, not really.
When I started learning data science, I made quite a few mistakes. Some were necessary growing pains, but others could have been avoided if I’d had a clearer roadmap.
If I had the chance to start over in 2025, I’d follow a much more structured and intentional path. Here’s exactly what I would do.
Note: With the likes of ChatGPT and Claude AI, learning anything can be made more interesting and effective. You can use such AI tools to come up with learning schedules, simplify complex topics, debug errors, brainstorm project ideas, and more. But I suggest you do this according to your personal preferences and in whichever steps you see fit.
1. Start with the Basics: Programming First, Data Science Later
In my early days, I jumped straight into machine learning because I felt fairly comfortable with Python, without first becoming proficient in both Python and SQL. Big mistake (no surprise there!).
If I could start fresh, I’d spend the first few months focusing on Python and SQL skills, and on becoming familiar with Python libraries for data analysis. Remember, a strong foundation in programming simplifies everything else. You’ll write cleaner code and debug more effectively.
What to learn:
- Programming fundamentals with Python and SQL
- NumPy and pandas for data manipulation
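To make the Python-plus-SQL point concrete, here is a minimal sketch that computes the same per-customer total twice: once with a SQL `GROUP BY` and once with a plain Python loop. The `orders` table and its rows are made up for illustration, and it uses only Python's built-in `sqlite3` module, so nothing needs to be installed.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount) rows for an "orders" table.
rows = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5), ("bob", 2.5)]

# SQL version: load the rows into an in-memory SQLite database and aggregate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
)
conn.close()

# Pure-Python version of the same GROUP BY, for comparison.
py_totals = defaultdict(float)
for customer, amount in rows:
    py_totals[customer] += amount

print(sql_totals)
print(dict(py_totals))
```

Writing the same logic both ways is a cheap exercise that shows where each tool is the more natural fit.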
Don’t rush this phase. Spend time solving problems on platforms like HackerRank or LeetCode to reinforce your skills.
2. Build Math Skills Concurrently
Mathematics often feels intimidating, but it’s essential for data science. I’d make it a priority to develop math skills alongside programming, breaking it down into two key areas:
Linear Algebra and Calculus: Focus on matrix operations, vector spaces, derivatives, and optimization. These are the building blocks of machine learning algorithms.
Statistics and Probability: Understand distributions, hypothesis testing, Bayes’ theorem, and the central limit theorem. These are crucial for making sense of data.
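For the linear algebra and calculus side, one exercise I find useful is fitting a line two ways: with a closed-form least-squares solve (matrix operations) and with gradient descent (derivatives and optimization). This is only a sketch on synthetic data; the true slope of 2 and intercept of 1, the learning rate, and the iteration count are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 2*x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Linear algebra: closed-form least-squares solution.
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Calculus: minimize mean squared error by following the negative gradient.
w = np.zeros(2)
learning_rate = 0.1
for _ in range(2000):
    residual = X @ w - y
    gradient = 2.0 * X.T @ residual / len(y)  # gradient of mean((Xw - y)^2) w.r.t. w
    w -= learning_rate * gradient

print("closed form:     ", w_closed)
print("gradient descent:", w)
```

Both approaches should land on nearly the same slope and intercept, which is a nice sanity check that the derivative was worked out correctly.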
Practice applying these concepts. Solve as many math problems as you can in Python, using libraries like NumPy and SciPy, to reinforce your understanding.
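As an example of what that practice can look like, the sketch below works through Bayes' theorem with made-up numbers and then checks the central limit theorem by simulation. The prevalence, test rates, sample sizes, and choice of an exponential distribution are all assumptions picked for illustration.

```python
import numpy as np

# Bayes' theorem with made-up numbers: a test for a condition with 1% prevalence,
# 95% sensitivity, and a 5% false-positive rate.
p_condition = 0.01
p_pos_given_condition = 0.95
p_pos_given_healthy = 0.05

p_pos = p_pos_given_condition * p_condition + p_pos_given_healthy * (1 - p_condition)
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(f"P(condition | positive) = {p_condition_given_pos:.3f}")

# Central limit theorem: means of skewed data still look approximately normal.
rng = np.random.default_rng(42)
# Exponential(1) is heavily skewed, with mean 1 and standard deviation 1.
samples = rng.exponential(scale=1.0, size=(10_000, 50))
sample_means = samples.mean(axis=1)

# CLT prediction: means of n=50 draws have mean 1 and standard deviation 1/sqrt(50).
print("mean of sample means:", sample_means.mean())
print("std of sample means: ", sample_means.std(ddof=1))
print("CLT prediction:      ", 1 / np.sqrt(50))
```

With these numbers, a positive result implies only about a 16% chance of having the condition, which is exactly the kind of counterintuitive result that sticks once you compute it yourself.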
3. Focus on Data Wrangling Early
A huge part of data science isn’t super fun: it’s cleaning messy data. I wish I’d spent more time getting comfortable with data cleaning instead of treating it as a chore to get out of the way.
What to learn:
Hands-on practice:
- Work with real-world datasets from Kaggle or government open data portals
- Spin up sample datasets from scratch and analyze them
Document your data cleaning process and the different approaches you took to make the data ready for analysis.
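Here is one way that could look in practice: a small, deliberately messy dataset built from scratch, cleaned with pandas, with each step recorded in a simple log. The column names, the values, and the `log_step` helper are all hypothetical, and filling missing ages with the median is just one of several reasonable choices.

```python
import pandas as pd

# A small, deliberately messy dataset built from scratch (all values are made up).
raw = pd.DataFrame(
    {
        "name": [" Alice", "bob", "bob", "Carol ", None],
        "age": ["34", "n/a", "n/a", "29", "41"],
        "city": ["NYC", "nyc", "nyc", "Boston", "boston"],
    }
)

cleaning_log = []  # plain-language record of each step and its effect


def log_step(description, before, after):
    """Hypothetical helper: note how many rows a step kept, then return the result."""
    cleaning_log.append(f"{description}: {len(before)} -> {len(after)} rows")
    return after


df = raw.copy()

# 1. Normalize text columns so equivalent values compare equal.
df["name"] = df["name"].str.strip().str.title()
df["city"] = df["city"].str.upper()

# 2. Convert age to numeric; unparseable entries such as "n/a" become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# 3. Drop exact duplicate rows, then rows with no name.
df = log_step("drop duplicates", df, df.drop_duplicates())
df = log_step("drop rows missing a name", df, df.dropna(subset=["name"]))

# 4. Fill remaining missing ages with the median, and record the value used.
median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age)
cleaning_log.append(f"filled missing age with median {median_age}")

print(df)
print("\n".join(cleaning_log))
```

The log doubles as documentation: anyone reading it can see what was dropped, what was imputed, and why the row count changed.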
4. Dive into Machine Learning with Context
When I started learning machine learning, I approached it backward. I focused on using Python libraries like scikit-learn and TensorFlow without first analyzing the problem I was trying to solve and the various approaches to solving it.
I was so focused on building a machine learning model that I didn’t consider whether a non-ML solution would have sufficed.
So, understand the problem before building an ML model. Also try to learn the math behind algorithms like linear regression, logistic regression, and decision trees.
- Start with scikit-learn for simple implementation
- Gradually explore TensorFlow or PyTorch for deep learning
Start by working on small projects like predicting housing prices, classifying images, or clustering customer data.
Focus on interpretability. Understand why your model performs well (or poorly) instead of just chasing high accuracy.
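A habit that would have helped me here is comparing every model against a trivial baseline and then looking at what the model actually learned. The sketch below does both with scikit-learn on synthetic data; the dataset parameters are arbitrary, and the "always predict the most frequent class" baseline is just one simple stand-in for a non-ML solution.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for a real problem).
X, y = make_classification(
    n_samples=1000,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Trivial baseline: always predict the most frequent class in the training set.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression().fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")

# Inspect which features drive the predictions.
for i, coef in enumerate(model.coef_[0]):
    print(f"feature {i}: coefficient {coef:+.2f}")
```

If the model barely beats the baseline, that is a signal to revisit the problem framing before reaching for something more complex. The coefficients give a first, rough view of which features matter.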
5. Focus on Data Visualization and Storytelling
Great insights mean little if you can’t communicate them effectively. Visualizing data and crafting compelling stories are essential skills I’d prioritize.
- Static visualizations: Become comfortable with Matplotlib and Seaborn
- Interactive visualizations: Learn Plotly or Tableau
Turn raw numbers and results from analysis into stories that resonate with technical and non-technical audiences alike.
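As a small illustration with Matplotlib, the sketch below puts the takeaway in the chart title and annotates the one point the audience should notice. The monthly signup numbers and the "April launch" event are invented for the example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Made-up monthly signups, used only to illustrate the charting pattern.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 128, 190, 210, 230]
positions = list(range(len(months)))

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.plot(positions, signups, marker="o")
ax.set_xticks(positions)
ax.set_xticklabels(months)

# Lead with the takeaway instead of a generic "Signups by month" title.
ax.set_title("Signups rose about 80% from March to June")
ax.set_ylabel("New signups")

# Point at the event that explains the change.
ax.annotate(
    "April launch",
    xy=(3, signups[3]),
    xytext=(0.3, 215),
    arrowprops={"arrowstyle": "->"},
)

fig.tight_layout()
fig.savefig("signups.png", dpi=150)
```

The same data with a neutral title and no annotation would leave the reader to find the story on their own.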
Practice presenting your findings to someone unfamiliar with data science. If they understand it, you’ve nailed the explanation.
6. Work on Real Projects Early
I wasted (quite a bit of) time passively consuming content instead of building real projects. Real-world projects are where you truly learn.
So just start coding already. Work on a handful of projects across different areas:
- Simple regression and classification problems
- Working with text and image datasets
- Time series data
- Recommendation systems for books or movies
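To show how small a first version of one of these projects can be, here is a minimal sketch of an item-based recommender that uses cosine similarity between movie rating columns. The ratings matrix, the movie names, and the `recommend` function are all hypothetical; a real project would start from an actual ratings dataset.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies, 0 = not rated.
movies = ["Sci-Fi A", "Sci-Fi B", "Rom-Com A", "Rom-Com B"]
ratings = np.array(
    [
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [1, 0, 5, 4],
        [0, 1, 4, 5],
        [5, 0, 0, 0],  # target user: has only rated "Sci-Fi A"
    ],
    dtype=float,
)

# Cosine similarity between movie columns (item-item similarity).
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)


def recommend(user_index, top_n=1):
    """Return the top_n unrated movies most similar to what the user rated highly."""
    user = ratings[user_index]
    scores = similarity @ user   # weight each movie by similarity to rated ones
    scores[user > 0] = -np.inf   # never recommend something already rated
    best = np.argsort(scores)[::-1][:top_n]
    return [movies[i] for i in best]


print(recommend(4))
```

From here, natural next steps are handling missing ratings properly, evaluating on held-out data, and writing up what worked and what didn't.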
Showcase your creativity, problem-solving skills, and technical expertise. Publish your projects on GitHub and write blog posts explaining your approach. It’ll help you stand out in the job market when you’re actively looking for internships and full-time roles.
7. Stay Updated and Network
In data science, staying relevant is as important as building skills. I’d dedicate time to staying informed and connecting with the community.
To stay updated:
- Follow high-quality technical blogs and keep up with recent advances
- Engage in online communities (LinkedIn, Reddit, Slack, or Discord)
To network (beyond LinkedIn):
- Attend webinars, meetups, and conferences that interest you
- Reach out to professionals (say, through local chapters) for discussions and informational interviews
Share your learning journey on platforms like LinkedIn. Sharing what you learned as blog posts can also help you build a personal brand.
Wrapping Up
In summary, if I could start over in 2025, this would be my roadmap: a mix of foundational learning, hands-on projects, and continuous engagement with the data science community.
The journey isn’t linear, and there’s no shortcut, but with a structured approach, you can avoid common pitfalls and make steady progress.
Keep learning and growing!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.