An Open Source Database for Eclipse Chasers | by Rohit Pandey | Apr, 2024

Towards Data Science

Image created with Midjourney

At the risk of stating the obvious, the biggest weakness of a data scientist is that they can’t practice their craft without high-quality data, and creating a high-quality dataset isn’t trivial. This makes data the most obvious blocker to adding any kind of value through the discipline. Unlike engineering, where you can roll up your sleeves and start building on day one, a data scientist can’t do much without first having the data.

In a medium-to-large organization, this problem is typically addressed by investing in data engineering first, getting the data flowing so that data scientists can then work on top of it and bring their skills to bear. An important feature of these datasets is that they are not static but animate: as the business churns, data keeps flowing in and the datasets keep evolving. The data science products built on top of them can then evolve as well. This creates a positive feedback loop: once people see the value the data science products bring, it drives further investment in data engineering and the collection of even richer data, which in turn enables more powerful data science applications, and so on.

While this story repeats many times over behind the closed doors of various organizations, I haven’t seen it unfold in the realm of…
