Becoming a Data Scientist: What I Wish I Knew Before Starting


Breaking into Data Science: The Good, the Bad, and the Python Bugs

Towards Data Science

13 hours ago

Photo by Markus Spiske on Unsplash

Martin Luther King Jr. is famous for his speech "I Have a Dream." He delivered it at the Lincoln Memorial in Washington, D.C., on August 28, 1963, in front of approximately 250,000 people. It is considered one of the most important speeches of the 20th century, and it played a crucial role in the civil rights movement for Black Americans.

During this speech, he said that he dreamed of a day when his four children would live in a nation where they would not be judged by the color of their skin but by the content of their character.

I also had a dream several years ago. It was not as glorious as Martin Luther King’s, nor did it reshape the course of history. I aspired to become a data scientist.

It wasn’t for the prestige or because it was trendy (and still is) but because I genuinely love working with data, solving complex problems, and leveraging insights to drive business results. Becoming a data scientist was where my unique skills and passions met. You know, that sweet spot that leads to a fulfilling career.

My journey wasn’t straightforward. I didn’t know where to start, nor did I know what to do next. I took various courses, many of which turned out to be unhelpful. I also read countless articles about data science. While becoming a data scientist requires hard work, I spent a lot of effort on things that ultimately weren’t necessary.

I wish someone had given me the guidance I’m about to share with you. That is the purpose of this article. The good news? While following these steps won’t guarantee you a job as a data scientist, it will significantly improve your chances, even without a PhD! I know several professionals who have excelled as data scientists without doctorates. Success in this field is mainly about persistence and practical experience.

“The beginning is the most important part of any work.”

— Plato

Research shows that a toddler takes about 14,000 steps and experiences 100 falls per day over 2–3 months before mastering walking. Yet, they persist, never considering giving up.

In contrast, as adults, we often do the opposite. We tend to give up as soon as we encounter obstacles. Where an adult might see 100 failures, a baby sees 100 learning opportunities. The baby doesn’t overanalyze its failures or overestimate the risks. It simply starts, tries, falls, and tries again!

Consider the story of Justin Kan, the co-founder of Twitch. His entrepreneurial journey didn’t start with a blockbuster success. It began with what he called a “shitty first startup” named Kiko, an online calendar app. Kiko could not keep up with giants like Google Calendar, yet it was eventually sold on eBay for $258,100!

Next, he launched Justin.tv, a platform where he live-streamed his life 24/7. Justin.tv eventually became Twitch, a live-streaming platform focused on gaming. In 2014, Amazon acquired Twitch for $970 million!

As Justin Kan stated, “Don’t wait. Go build your first shitty startup now.”

This advice applies to your journey into data science as well. Start somewhere. Begin your learning process now. Even if your first attempt feels “shitty” and you’re unsure of where to start, it’s okay. You can build upon your initial efforts, and nothing prevents you from adjusting your direction as you progress. What matters is that you start.

Photo by Vlad Bagacian on Unsplash

The Cathedral of Beauvais in France was intended to be the tallest cathedral in the world during the 13th century. Its ambitious design pushed the limits of Gothic architecture. However, the choir vault collapsed in 1284 because of insufficient foundations and structural support. The cathedral remains unfinished to this day.

This serves as a strong analogy for your journey into data science. You may be tempted (we all are) to dive directly into the exciting parts, such as deep learning models, LLMs, or the latest machine learning frameworks. But like the Cathedral of Beauvais, your ambitious plan could fail without a solid foundation. Learning the basics first is crucial to ensure your knowledge is robust enough to support more advanced concepts.

Mathematics: Your Universal Language

Think of mathematics as the language of patterns. There is mathematics everywhere. And honestly, if you don’t like mathematics, perhaps a career in data science isn’t the right choice for you.

You don’t need to become a mathematician, but you do need to understand the following key concepts:

  • Linear algebra (matrices, vectors, etc.): Think of matrices and vectors as the language in which data communicates. Understanding these concepts allows you to manipulate data structures for machine learning algorithms.
  • Calculus (differentiation, integration, gradient, etc.): These concepts are essential for optimizing models. For example, gradients tell a neural network how to adjust its weights during training.
  • Statistics (distributions, descriptive statistics, etc.): This is where you learn to interpret the stories data tells. Understanding concepts like distributions and descriptive statistics allows you to make informed decisions based on patterns in data.
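To make these three areas a little more concrete, here is a minimal sketch that touches each of them: a dot product (linear algebra), gradient descent on a mean squared error (calculus), and a mean and standard deviation (statistics). The data, the function names `dot` and `fit_line`, and the learning rate are all made up for illustration. It uses only the Python standard library, so treat it as a teaching toy rather than how you would do this in practice.

```python
import statistics

# Toy data, invented for this example: y is roughly 2*x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]


def dot(u, v):
    """Linear algebra: dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Calculus: fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * dot(errors, xs)
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(xs, ys)

# Statistics: describe the target values.
mean_y = statistics.mean(ys)
stdev_y = statistics.stdev(ys)

print(f"slope={w:.2f}, intercept={b:.2f}")
print(f"mean={mean_y:.2f}, stdev={stdev_y:.2f}")
```

The fitted slope and intercept should land close to the ordinary least-squares solution for this toy data. Libraries such as NumPy do the same arithmetic far more efficiently, but writing it by hand once shows where the math actually appears.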

Diving into Programming

With your mathematical foundation in place, programming will bring your ideas to life. While some argue for learning R, Python stands out for its versatility and widespread use in the industry. Furthermore, most people I know use Python. It will be more than good enough for most use cases. Focus on:

  • Basic syntax and functions: understand how Python works at a fundamental level. It’s like learning an alphabet before writing stories.
  • Data structures: lists, dictionaries, and tuples. Know how to use them, because they are crucial for handling real-world data.
  • Control flow statements: master “if statements,” “for loops,” and “while loops.” These allow you to implement logic that can solve complex problems. With simple statements, you can accomplish much more than you think!
  • Object-oriented programming: understand the concepts of classes, methods, and objects. This allows you to write well-organized, reusable code. It also facilitates collaboration with others.
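Here is a small sketch that combines the items above: tuples and lists for raw data, a dictionary for aggregation, a `for` loop with an `if` statement, and a simple class. The `Order` class, the `total_by_customer` function, and the sample rows are hypothetical names and data that I made up for this example.

```python
class Order:
    """A tiny class that bundles order data with behavior."""

    def __init__(self, customer, amount):
        self.customer = customer
        self.amount = amount

    def is_large(self, threshold=100.0):
        """Return True when the order amount reaches the threshold."""
        return self.amount >= threshold


def total_by_customer(orders):
    """Sum order amounts per customer using a dictionary."""
    totals = {}
    for order in orders:
        if order.customer not in totals:
            totals[order.customer] = 0.0
        totals[order.customer] += order.amount
    return totals


# Raw data as a list of (customer, amount) tuples.
raw_rows = [("alice", 120.0), ("bob", 35.5), ("alice", 15.0)]

# Turn each tuple into an Order object.
orders = [Order(customer, amount) for customer, amount in raw_rows]

totals = total_by_customer(orders)
large_orders = [order for order in orders if order.is_large()]

print(totals)
print(len(large_orders))
```

Nothing here is specific to data science, and that is the point: a few simple building blocks already let you load, group, and filter data.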

SQL: Your Database Language

Data is often stored in databases that you need to access and manipulate. SQL is your language to interact with this data.

  • Interacting with databases: Learn basic SQL commands to retrieve, update, and manage data.
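If you want to practice without installing a database server, one option is Python's built-in `sqlite3` module. The sketch below creates a throwaway in-memory database, then inserts, updates, and aggregates a few rows. The `sales` table and its values are invented for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a small table (hypothetical schema for this example).
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Insert rows with parameter placeholders instead of string formatting.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# Update: double every amount in the south region.
conn.execute("UPDATE sales SET amount = amount * 2 WHERE region = ?", ("south",))

# Retrieve: total amount per region, sorted by region name.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

conn.close()
print(rows)
```

The same `SELECT`, `UPDATE`, and `GROUP BY` statements carry over to most relational databases with little or no change, so time spent on SQLite is not wasted.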

Machine Learning: Turning Data into Insights

Once you understand mathematics, programming, and data handling, you can move on to machine learning. Focus on:

  • Understanding algorithms: start by learning algorithms like linear regression, decision trees, and clustering methods. These are the basics for more complex models.
  • Supervised vs unsupervised learning: understand the difference between these two core types of machine learning. Supervised learning involves training models with labeled data, whereas unsupervised learning involves unlabeled data.
  • Model evaluation: Learn how to assess the performance of your models using metrics like F1 score for classification models, word error rate for speech recognition, or RMSE for regression and forecasting.
  • Feature engineering: It’s the art of transforming your raw data so your models can understand it. Often, this makes more of a difference than using a fancy algorithm.
  • Libraries and frameworks: Familiarize yourself with popular Python libraries for machine learning, such as scikit-learn, TensorFlow, and PyTorch.
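To demystify the evaluation metrics mentioned above, here is a minimal sketch that computes an F1 score and an RMSE by hand in plain Python. The function names mirror common usage but are my own implementations, and the labels and predictions are invented. In real projects you would more likely rely on a library such as scikit-learn, which ships its own `f1_score` in `sklearn.metrics`.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented classification labels and predictions.
labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 0, 1, 1, 1]
f1 = f1_score(labels, predicted_labels)

# Invented regression targets and predictions.
actual = [3.0, 5.0, 2.0]
forecast = [2.0, 5.0, 4.0]
error = rmse(actual, forecast)

print(f"F1={f1:.2f}, RMSE={error:.2f}")
```

Working through the counts of true positives, false positives, and false negatives once by hand makes it much easier to judge which metric fits the problem you are solving.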

Remember, machine learning is not just about applying algorithms. It’s about understanding the problem you’re trying to solve and choosing the right approach.

Business Sense: Turning Technical Skill into Business Impact

Many people contact me about starting a career in data science. They typically have impressive qualifications, such as PhDs and a strong background in mathematics. However, even with these credentials, many struggle to break into the field. The reason? They lack business sense.

Technical skills are essential. However, here’s the truth: the best AI model is worth $0 if it doesn’t solve a business problem. I’ve seen brilliant data scientists fail because they built sophisticated models that no one used. The key? Learn to think like a business owner.

For instance:

  • Translating business problems: Instead of just building a predictive model, you should ask, “How does this model support decision-making within the business?”
  • Prioritizing impact: Focus on problems where data science can generate the most value rather than pursuing complex solutions that don’t solve a business problem.
