Image by Author | Canva
Having a clear picture of how to launch your data science career has always been important. Now, with the job market cooling off, it’s even more so. Is the effort still worth it? Data science still promises high salaries and an interesting career, but finding a job has become harder in the last couple of years, especially for beginners, who often don’t know where to start.
To help you, I’ll provide a step-by-step roadmap.
What Is Data Science?
Data science is a field that uses data to extract insights. It typically does so through statistical techniques and, where appropriate, machine learning (ML) models.
Who Can Become a Data Scientist?
You don’t need a special education level or field of study, such as a computer science degree.
However, if you enjoy solving problems, working with data, dissecting numbers, and presenting insights, you’ll enjoy data science much more than the average Joe.
In addition, as data science is an ever-evolving field, you will have to continuously learn to stay competitive.
Step-by-Step Roadmap to Begin
Here is our roadmap.

Image by Author
Step #1: Learn the Fundamentals
Because data science blends several disciplines, you’ll need solid knowledge of many fields.

Image by Author
1. Learn a Programming Language
You need a programming language for virtually every stage of a data science workflow: pulling data, cleaning and analyzing it, building ML models, creating visualizations, and automating reporting.
While R is popular, especially in academia, Python is the industry standard. It’s a very flexible programming language used for basically every data science task. There are many libraries that significantly extend Python’s built-in capabilities.
What to Learn:
- Basics: variables, loops, conditionals, functions, error handling
- Data structures: lists, dictionaries, arrays
- Libraries:
  - pandas – for data manipulation
  - NumPy – for numerical computing
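To make the list above concrete, here is a minimal sketch that touches each item once: a function with error handling, a loop with a conditional, a list of dictionaries, and one call each to pandas and NumPy. The `orders` data and the `parse_price` helper are made up for illustration.

```python
import numpy as np
import pandas as pd


def parse_price(raw):
    """Convert a raw price string to a float; return None if it cannot be parsed."""
    try:
        return float(raw.replace("$", "").strip())
    except (AttributeError, ValueError):
        # AttributeError: raw is not a string (e.g. None)
        # ValueError: the text is not a number (e.g. "n/a")
        return None


# A list of dictionaries: a common shape for small, hand-made datasets
# (hypothetical values, for illustration only)
orders = [
    {"customer": "ann", "price": "$12.50"},
    {"customer": "bob", "price": "7"},
    {"customer": "ann", "price": "n/a"},
    {"customer": "cy", "price": None},
]

# Loop + conditional: keep only the rows whose price parses cleanly
clean_rows = []
for order in orders:
    price = parse_price(order["price"])
    if price is not None:
        clean_rows.append({"customer": order["customer"], "price": price})

# pandas: tabular manipulation (group by customer and sum)
df = pd.DataFrame(clean_rows)
total_by_customer = df.groupby("customer")["price"].sum()

# NumPy: numerical computing on the same values as an array
prices = df["price"].to_numpy()
mean_price = np.mean(prices)

print(total_by_customer)
print(mean_price)
```

Two of the four rows survive the cleaning step; the other two are dropped because their prices can’t be parsed.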
Resources:
- Python: freeCodeCamp, Codecademy, StrataScratch, DataCamp, Python Data Science Handbook
- R: swirl, R Studio Education, Codecademy, StrataScratch, DataCamp, R for Data Science, Hands-On Programming With R
2. Understand the Math and Stats Behind the Models
You don’t necessarily need a maths degree, but you should have a strong foundation in mathematics and statistics. This will help you understand how the machine learning models work, what they can do, and what they can’t. With that, you’ll be able to choose the right model for a particular problem and interpret results accurately.
What to Learn:
- Descriptive statistics: mean, median, mode, standard deviation, and percentiles – for summarizing and exploring datasets
- Probability theory and distributions: normal, binomial, Poisson, and uniform distributions – for understanding uncertainty and variability in data
- Hypothesis testing and confidence intervals: p-values, t-tests, z-tests – for A/B testing and interpreting model performance
- Linear algebra and calculus basics: vectors, matrices, dot products, derivatives, gradients – for understanding algorithms
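As a rough illustration of the first three items, the sketch below uses only Python’s standard library to summarize a dataset, build a confidence interval, and run a two-proportion z-test of the kind used in A/B testing. All numbers are invented, and the interval uses a normal approximation; with a sample this small, a t-distribution would give a somewhat wider interval.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical daily order counts, made up for illustration
daily_orders = [12, 15, 11, 15, 18, 14, 15, 13, 16, 21]

# Descriptive statistics
mean = statistics.mean(daily_orders)
median = statistics.median(daily_orders)
mode = statistics.mode(daily_orders)
stdev = statistics.stdev(daily_orders)  # sample standard deviation (n - 1)
quartiles = statistics.quantiles(daily_orders, n=4)  # three quartile cut points

# 95% confidence interval for the mean (normal approximation)
n = len(daily_orders)
margin = NormalDist().inv_cdf(0.975) * stdev / math.sqrt(n)
ci_95 = (mean - margin, mean + margin)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical A/B test: variant B converts 250 of 2,000 users, A converts 200 of 2,000
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)

print(mean, median, mode, round(stdev, 2), quartiles)
print(ci_95)
print(round(z, 2), round(p_value, 4))
```

A p-value below your chosen threshold (0.05 is a common convention) suggests the difference in conversion rates is unlikely to be due to chance alone.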
Resources: Khan Academy, StatQuest, Brilliant.org, Mathematics for Machine Learning
3. Get Fluent in SQL and Data Wrangling
You’ll work with databases, and SQL is the standard language for retrieving data from them. For data wrangling – dealing with missing values, inconsistent formats, and duplicates – you’ll mostly use Python or R.
What to Learn:
- SELECT, WHERE, GROUP BY, HAVING, and JOIN – for retrieving and combining data
- Subqueries and Common Table Expressions (CTEs) – for complex, modular queries
- Aggregate functions and window functions – for data summarization
- Data wrangling skills: handling missing values, data type conversions, feature engineering, merging, and reshaping datasets in pandas
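The SQL concepts above fit into a single small query. The sketch below runs it against an in-memory SQLite database through Python’s built-in `sqlite3` module, so you can try it without installing a database server. The `customers` and `orders` tables are hypothetical, and the window function assumes SQLite 3.25 or newer, which recent Python builds typically bundle.

```python
import sqlite3

# In-memory database with two small, made-up tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES
    (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 3, 90.0), (5, 3, 10.0);
""")

query = """
WITH customer_totals AS (              -- CTE: one row per customer
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 0                 -- row-level filter
    GROUP BY c.name
    HAVING SUM(o.amount) >= 50         -- filter on the aggregate
)
SELECT name,
       total,
       RANK() OVER (ORDER BY total DESC) AS spend_rank   -- window function
FROM customer_totals
ORDER BY spend_rank;
"""

rows = conn.execute(query).fetchall()
for name, total, spend_rank in rows:
    print(spend_rank, name, total)
```

Note the difference between `WHERE`, which filters individual rows before grouping, and `HAVING`, which filters groups after aggregation. Here, Bob is dropped by `HAVING` because his total is below 50.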
Resources: SQLBolt, Mode SQL, Khan Academy, StrataScratch, pandas official documentation, Real Python
4. Learn and Apply Machine Learning Techniques
Machine learning enables systems to learn patterns from data and make predictions or decisions without being explicitly programmed for every scenario. Start simple. The most important thing is that you understand what problems ML can solve and how to apply algorithms effectively.
What to Learn:
- A must-know – scikit-learn (for building and testing models)
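Here is a minimal sketch of what building and testing a model looks like in scikit-learn. It uses the iris dataset that ships with the library and a logistic regression classifier; both are my choices for illustration, not the only sensible starting point.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements (X) and species labels (y)
X, y = load_iris(return_X_y=True)

# Hold out part of the data so the model is evaluated on examples it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit a simple baseline classifier
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out data
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

The same split, fit, predict, and evaluate pattern carries over to most other scikit-learn models, so it’s worth practicing until it feels routine.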
Resources: Machine Learning by Andrew Ng, Machine Learning Crash Course, Machine Learning Mastery, StatQuest, scikit-learn documentation
5. Understand the Role of AI
Artificial intelligence (AI) has become an essential data science skill in recent years. While not every job requires you to build large-scale models yourself, it’s now practically a standard requirement to use AI APIs, prompt large language models (LLMs), or incorporate them into ML pipelines.
What to Learn:
- Deep learning basics: neural networks, backpropagation, activation functions
- LLM application in data science
- Tools: OpenAI API, Anthropic Claude, Google Gemini API, Mistral AI (LLMs and APIs), LangChain, LlamaIndex, Haystack (frameworks), Hugging Face, Replicate, NVIDIA NGC (model hubs)
- Prompt engineering: summarization, classification, code generation
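To demystify the deep learning basics in the list above, here is a small sketch of a neural network written in plain NumPy: one hidden layer, sigmoid activation functions, and backpropagation done by hand. The layer size, learning rate, and number of steps are arbitrary choices for illustration. In practice, you would use a framework that computes the gradients for you.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny problem that a single linear layer cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Randomly initialized weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []

for _ in range(5000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    losses.append(float(np.mean((output - y) ** 2)))  # mean squared error

    # Backward pass: apply the chain rule from the loss back to each layer
    d_output = 2 * (output - y) / len(X) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    # Gradient descent step
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

print(f"Loss at start: {losses[0]:.4f}, at end: {losses[-1]:.4f}")
```

The loss should fall as training proceeds. How closely the network ends up matching XOR depends on the random initialization and the settings above.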
Resources: ChatGPT Prompt Engineering for Developers, HuggingFace Courses, Google’s Generative AI Learning Path, FastAI Practical Deep Learning, OpenAI API Docs
6. Visualize Data and Communicate Results
You must be able to visualize data so that your insights are understandable to people without a technical background.
What to Learn:
- Chart types: bar, line, scatter, histogram, box plots
- Design principles: choosing the chart type, limiting the number of elements, color use, labeling, and Tufte’s principles
- Storytelling with data: creating a narrative, posing a question, using annotations, ordering charts logically, commenting on visuals, and explaining the impact
- Tools:
  - BI platforms – Tableau or Power BI (dashboards and business reporting; optional interactivity)
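As a small illustration of the design and storytelling points above, the sketch below draws a bar chart with Matplotlib (one of the libraries covered in the resources for this section). The signup numbers are invented, and the styling choices are just one reasonable option.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical monthly signups, made up for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
signups = [120, 135, 160, 210]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(months, signups, color="steelblue")

# Storytelling: state the takeaway in the title instead of just naming the chart
ax.set_title("Signups grew 75% from January to April")
ax.set_xlabel("Month")
ax.set_ylabel("New signups")

# Annotation: label the final bar with its value
ax.annotate("210", xy=(3, 210), xytext=(0, 4),
            textcoords="offset points", ha="center")

# Limit the number of elements: drop the top and right borders
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
fig.savefig("signups.png", dpi=150)
```

The title does most of the work here: a reader without a technical background gets the message before they look at a single bar.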
Resources: Python Plotting With Matplotlib, seaborn tutorial, Plotly documentation, DataCamp, Data Visualization With Python by IBM, Storytelling with Data, Fundamentals of Data Visualisation
7. Build Domain Knowledge and Business Thinking
Data science isn’t about writing code and training models in a vacuum – it’s about solving business problems. So, you must be able to connect your technical work with business outcomes and communicate your insights in ways that matter to stakeholders.
What to Learn:
- Key performance indicators (KPIs) in different industries
- Defining clear problem statements from vague business objectives
- Asking the right questions before analyzing data
- Communicating insights clearly
- Particularities of a specific industry
Resources:
Step #2: Use Your Skills in Practice
It’s crucial that you can demonstrate to potential employers you know how to solve real-world problems using your technical skills.

Image by Author
1. Create a Portfolio
With a portfolio, you can show that you’re able to work with real data, solve real problems end to end, and communicate the solutions. It’s as close to a real job as you can get.
What to Include in Each Project:
- A short business context
- Your data cleaning process
- Exploratory data analysis (EDA)
- Final results
- Code repository (GitHub)
- Blog post (optional)
Tools:
Resources:
- Projects: StrataScratch, DataWars, thecleverprogrammer, 22 Machine Learning Projects
- Datasets: Kaggle, UCI Machine Learning Repository, Data.gov, Google Dataset Search, Awesome Public Datasets, World Bank Open Data, Inside Airbnb, Yelp Open Dataset
2. Get Experience (Even Without a Job)
This will help you bridge the gap between theory and practice. Employers don’t much care where you learned something; they care more about how you’ve used it. The following options offer opportunities for gaining actual experience:
Step #3: Apply for Jobs
There’s no need to wait until you’ve “mastered everything” because that’s an impossible goal. No one knows “all” of data science, so don’t let that delay the start of your data science career. Applying for jobs as you go helps you understand how hiring works, build interview experience, and get feedback you can use in further learning.
Apply for:
- Data analyst jobs – if you’re still learning ML
- Entry-level data scientist jobs – if you’re confident with end-to-end projects
Conclusion
Breaking into data science is doable with consistent, focused effort. Start by building core skills, practicing with real data, and documenting your projects. You don’t need to learn everything at once; just start.
You’ll be surprised how far you can go in a few focused months.
Nate Rosidi is a data scientist who works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.