Data preparation isn’t just a part of the ML engineering process — it’s the heart of it.
To set the stage, let’s look at how research-phase data differs from production-phase data.
That contrast clarifies what “data” means in this post: production data. Data is a key differentiator in ML projects (more on this in my blog post below).
Here, I’ll focus on preparing that data to reach the quality real-world ML systems require. This post walks through the key preparation steps, and each phase includes practical tips to keep your process streamlined and effective.
Data ingestion ensures that all relevant data is aggregated, documented, and traceable. It involves the following core…