An Open Source Database for Eclipse Chasers | by Rohit Pandey | Apr, 2024

At the risk of stating the obvious, the biggest weakness of a data scientist is that they can’t practice their craft without high-quality data, and creating a high-quality dataset isn’t exactly trivial. Missing data is therefore the most obvious blocker to adding any value through the discipline. Unlike engineers, who can roll up their sleeves and start building on day one, a data scientist can’t do much without first having the data.

In a medium-sized or large organization, this problem is typically addressed by investing in data engineering first: getting the data flowing so that data scientists can then work on top of it and bring their skills to bear. An important feature of these datasets is that they are not static but live. As the business churns, new data keeps flowing into them, so they keep evolving, and the data science products built on top of them can evolve too. This creates a positive feedback loop. Once people see the value the data science products bring, they invest further in data engineering and in collecting even richer data, which in turn enables more powerful data science applications, and so on.

While this story repeats many times over behind the closed doors of various organizations, I haven’t seen it unfold in the realm of…