Is Data Science dead? The Rebuke. Addressing some of the hot topics in… | by Rosaria Silipo | Low Code for Data Science | Jun, 2024



Photo by Joshua Sortino on Unsplash

My previous article “Is Data Science dead?” caused quite a stir.

The title was provocative, but the question is on the minds of many data scientists, so it was worth addressing. The article generated quite a number of comments and reactions. If I had to summarize them, I would group them under the following topics:

  • Data science is more than coding.
  • How will the job market change?
  • Is AI future-proof?
  • Are data scientists future-proof?
  • Open-source as a competitive advantage.

Each of them makes a valid point, which I would like to discuss here.

Most of the comments stated that data science is not just coding, and that coding actually makes up perhaps 5–10% of a data scientist's work. “The rest of a data scientist work is taken up with defining the problem to be solved, identifying a suitable methodology for solving the problem, creating high level solution designs and breaking that down to components, gathering data and assuring it, a bit of coding, uncertainty estimation, verification and validation,” according to David Plummer’s response.

Indeed, data science is more than coding. A successful project also needs domain knowledge, design, analytics, and communication skills. If anybody had doubts, the introduction of AI has dispelled them. Did AI accelerate this acceptance process, or has it always been clear to everybody? In my opinion, AI did make it evident that Python coding is not the only thing that data scientists do.

As a colleague of mine puts it, data science requires (now, or perhaps always) “more thinking and less tinkering”, where the tinkering in this case is coding. Data scientists should indeed focus on how the data flows from different sources through a series of transformation operations and analysis modules, all the way to writing back data, exporting a report, or deploying a model.

So, what is left for a data scientist to do? And will this affect the job market?

While AI will make our job faster, results need to be verified. Incorrect or unacceptable AI-generated results are a common experience. As we speak, methodologies and libraries are being developed to double-check AI outputs. However, so far, the common feeling is that output checking and result interpretation are the sole responsibility of expert data scientists.
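As a minimal sketch of what such output checking can look like, the snippet below recomputes a summary statistic independently and compares it with the value an AI assistant reported. The function name `verify_reported_mean`, the sample data, and the reported value are all hypothetical and chosen only for illustration; real checks would cover far more than a single number.

```python
import math
from statistics import mean


def verify_reported_mean(values, reported, rel_tol=1e-6):
    """Return True if a reported mean matches an independent recomputation.

    `reported` stands in for a number produced by an AI assistant
    (hypothetical); `values` is the trusted source data.
    """
    expected = mean(values)
    return math.isclose(expected, reported, rel_tol=rel_tol)


# Hypothetical source data and AI-reported results, for illustration only.
sales = [120.0, 95.5, 130.25, 101.0]
correct_report = 111.6875
wrong_report = 115.0

print(verify_reported_mean(sales, correct_report))
print(verify_reported_mean(sales, wrong_report))
```

The point of the sketch is that the check itself is simple; deciding what to check, and what tolerance is acceptable, still requires a data scientist who understands the problem.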

Another area that will bloom as a consequence of the introduction of AI is data engineering. AI models need data, lots of data: organized, structured, clean data. Thus, data engineering skills will grow in importance to satisfy this new need of the job market.

A third group of comments addressed complex data science applications. Currently, AI can easily recreate simple data science applications, but it cannot design all the steps of a more complex solution. While AI will improve and raise the complexity bar a bit higher, I doubt that it will ever be able to create new complex solutions. Complex data science applications will still need expert data scientists.

In conclusion, data professionals will still be needed for data engineering, prompt engineering, security, output checking, and more complex application design.

Another group of comments relates to AI and the future. Currently, AI relies on all the knowledge available on the web and posted over many years by many smart users. However, if this knowledge changes, how long will AI take to adapt? It will need new documents and new examples and will need to wait till such new documents and examples are published.

This question is being addressed as we speak: new and advanced flavors of RAG and fine-tuning, for example, are being proposed every day to inject custom knowledge into the models. Still, questions about the ethics of AI and the objectivity of its answers remain unresolved.

So, how future-proof is AI?

The same question should be asked about data scientists. Are data scientists future-proof? Will what college students learn in their data science courses be enough to exploit and control this new AI trend in data science?

Some respondents lamented that by just feeding prompts into AI, data scientists will not learn the fundamentals of data science. How can we expect them to control the data science process then?

It is a legitimate question. It may, however, underestimate the capability of students and juniors to adapt. I do see a lot of conceptual mistakes even now when working with students. However, new knowledge is created and absorbed precisely by correcting these mistakes. I am confident that, with good teaching of the fundamentals in data science courses, proper training can be achieved even with AI around.

I found this last note quite insightful. One comment underlines the pivotal role of the open-source strategy in the success of AI models.

Open-source code earlier on, and open-source models now, seem to have a competitive advantage over other strategies, both in access to cutting-edge algorithms and in speed of development.

Finally, a thank-you note to all the readers who responded to my article with stimulating insights, knowledgeable comments, and interesting observations. I have learned, reflected, and in some cases refined my thoughts on the topic.

You can find the whole article “Is Data Science dead?” and its comments in the Medium journal “Low Code for Data Science”.
