Data Engineering — ORM and ODM with Python | by Marcello Politi | Jan, 2025



Manipulate database data using an object-oriented programming paradigm

When working on data science projects, one fundamental pipeline to set up is the one for data collection. Real-world machine learning differs from Kaggle-like problems mainly because the data is not static: we need to scrape websites, gather data from APIs, and so on. This way of collecting data might look chaotic, and it is! That’s why we need to structure our code following best practices to bring some order to the mess.

Once you have identified the sources from which you want to gather your data, you need to collect it in a structured way so that you can store it in your database. For example, you might decide that, to train your LLM, you need data sources that contain three fields: author, content, and link.
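To make the three-field idea concrete, here is a minimal sketch of how such a record could be represented in Python before it reaches the database. The names `Document` and `from_raw` are hypothetical, chosen only for illustration, and the sketch assumes every field arrives as a string.

```python
from dataclasses import dataclass

# Hypothetical record type: the three fields come from the example above,
# the class and function names are assumptions made for this sketch.
REQUIRED_FIELDS = ("author", "content", "link")


@dataclass
class Document:
    author: str
    content: str
    link: str


def from_raw(raw: dict) -> Document:
    """Build a Document from a raw scraped record, rejecting incomplete ones."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Strip surrounding whitespace, which scraped text often carries.
    return Document(**{name: raw[name].strip() for name in REQUIRED_FIELDS})
```

Validating at this boundary means that whatever source the data came from, everything downstream can rely on the same shape.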

One option is to download the data and then write SQL queries to store it in, and retrieve it from, your database. More commonly, you will want to implement the full set of queries needed for CRUD operations. CRUD stands for create, read, update, and delete: the four basic functions of persistent storage.
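As a rough illustration of what hand-written CRUD queries look like, the sketch below uses Python's built-in `sqlite3` module. The `documents` table, its columns, and the function names are assumptions made for this example; a real project may use a different database and schema.

```python
import sqlite3
from typing import Optional, Tuple

# Assumed schema for illustration: one table holding author, content, link.
def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "author TEXT NOT NULL, content TEXT NOT NULL, link TEXT NOT NULL)"
    )


def create_document(conn: sqlite3.Connection, author: str, content: str, link: str) -> int:
    """Create: insert a row and return its generated id."""
    cursor = conn.execute(
        "INSERT INTO documents (author, content, link) VALUES (?, ?, ?)",
        (author, content, link),
    )
    conn.commit()
    return cursor.lastrowid


def read_document(conn: sqlite3.Connection, doc_id: int) -> Optional[Tuple]:
    """Read: fetch one row by id, or None if it does not exist."""
    cursor = conn.execute(
        "SELECT id, author, content, link FROM documents WHERE id = ?", (doc_id,)
    )
    return cursor.fetchone()


def update_content(conn: sqlite3.Connection, doc_id: int, content: str) -> None:
    """Update: replace the content of an existing row."""
    conn.execute("UPDATE documents SET content = ? WHERE id = ?", (content, doc_id))
    conn.commit()


def delete_document(conn: sqlite3.Connection, doc_id: int) -> None:
    """Delete: remove a row by id."""
    conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    create_table(conn)
    doc_id = create_document(conn, "Jane Doe", "Some text", "https://example.com/post")
    print(read_document(conn, doc_id))
    update_content(conn, doc_id, "Edited text")
    delete_document(conn, doc_id)
    conn.close()
```

Each operation needs its own SQL string and parameter handling, and that repetition grows with every new table.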
