Data Engineering — ORM and ODM with Python | by Marcello Politi | Jan, 2025

Manipulate database data leveraging an object-oriented programming paradigm

When working on data science projects, one fundamental pipeline to set up is data collection. Real-world machine learning differs from Kaggle-like problems mainly because the data is not static: we need to scrape websites, gather data from APIs, and so on. This way of collecting data might look chaotic, and it is! That’s why we need to structure our code following best practices to bring some order to the mess.

Once you have identified the sources from which you want to gather your data, you need to collect it in a structured way so you can store it in your database. For example, you might decide that to train your LLM you need data sources that contain three fields: author, content, and link.
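As a minimal sketch of what such a structured record could look like in Python, the three fields can be grouped in a dataclass. The class name `Document` and the sample values are assumptions made for illustration; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """One collected item with the three example fields (hypothetical name)."""
    author: str
    content: str
    link: str


# Illustrative values only; real records would come from a scraper or an API.
doc = Document(
    author="Jane Doe",
    content="Some scraped text",
    link="https://example.com/post",
)
print(doc.author)
```

Giving the record an explicit type means every source, whatever its origin, has to be mapped onto the same three fields before it reaches the database.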

You could download the data and then write SQL queries by hand to store it in and retrieve it from your database. More commonly, you will want to implement queries for all the CRUD operations. CRUD stands for create, read, update, and delete, the four basic functions of persistent storage.
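To make the four operations concrete, here is a rough sketch of hand-written CRUD queries using Python's built-in `sqlite3` module. The table name `documents`, the function names, and the in-memory database are assumptions chosen for the example; the article does not prescribe a particular database or schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database with a hypothetical "documents" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, author TEXT, content TEXT, link TEXT)"
)


def create_document(conn, author, content, link):
    """Create: insert a row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO documents (author, content, link) VALUES (?, ?, ?)",
        (author, content, link),
    )
    conn.commit()
    return cur.lastrowid


def read_document(conn, doc_id):
    """Read: return (author, content, link), or None if the id is unknown."""
    return conn.execute(
        "SELECT author, content, link FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()


def update_content(conn, doc_id, content):
    """Update: replace the content of an existing row."""
    conn.execute(
        "UPDATE documents SET content = ? WHERE id = ?", (content, doc_id)
    )
    conn.commit()


def delete_document(conn, doc_id):
    """Delete: remove the row with the given id."""
    conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()


doc_id = create_document(conn, "Jane Doe", "Some text", "https://example.com/post")
print(read_document(conn, doc_id))
update_content(conn, doc_id, "Edited text")
delete_document(conn, doc_id)
```

Every operation here is a raw SQL string with positional parameters, which is the kind of repetitive code that an object-oriented mapping layer is meant to hide.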
