Landing a Data Engineer Role: Free Courses and Certifications



People say you should consider value for money when buying things. However, the best value for money is getting something good for free. But do such things exist? Supposedly not, if we go by the saying, “No such thing as a free lunch.”

I claim there is a free lunch, and I’m about to prove it! I dug out 10 educational ‘free lunches’ – free data engineering courses that also provide quality knowledge. Admittedly, there’s much more variety and choice if you can or want to pay tens, hundreds, sometimes even thousands of dollars.

Many such paid courses show up as ‘free’ on other free course lists. Paying $90 one-off or $45/month may count as free to some people. But many people don’t have that money for a ‘free’ course, despite being very willing to learn data engineering. (Also, let’s get real! Free literally means, well, free! Not ‘cheap’, not ‘very little money’, or ‘affordable’. Free!)

From what I researched, these courses really are free. Many are from edX. If you choose free access to the course, you must complete it in a certain time, usually around six months. But that should be enough to complete every course comfortably. Also, free access means you don’t get lifetime access to all the materials (they are deleted once you finish) and don’t get a certificate. Despite this, you should be able to use these courses to learn about data engineering.

Before I talk about the courses, let’s briefly overview the data engineer’s role. That way, knowing what to look for in courses will be easier.

 

Understanding the Role of a Data Engineer

 

Very simply, data engineers are in charge of making data available to data team members and other stakeholders. In doing so, they wrangle data and build and maintain data infrastructure, e.g., ETL processes, data pipelines, and data storage.
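To make the ETL idea concrete, here is a minimal sketch of an extract-transform-load run using only Python’s standard library. Everything in it is an assumption for illustration: the sample data, the `orders` table, and the function names are hypothetical, and a real pipeline would read from files, APIs, or databases rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, inlined so the sketch is self-contained.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


# Run the three steps against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Downstream users can now query the cleaned data.
total_by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(total_by_customer)
```

The row with the missing amount is dropped during the transform step, so only the two complete orders reach the table.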


Naturally, the courses should cover all or some of those skills. Let’s take a closer look at the courses – pun intended – that will comprise your educational free lunch.

 

Free Data Engineering Courses

 

1. Data Engineering by ASU

Platform and link to the course: edX

Duration: 5 weeks at 1-9 hours/week; learn at your own pace

Description: This introductory-level course by Arizona State University focuses on working with databases in data engineering and how to interact with them using SQL. You will learn about database structure, the star schema, and joining data from multiple tables. In the final stage, you will learn how to create reports with SQL and write scripts for data processing.
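For a rough idea of what a star schema and a multi-table join look like, here is a small sketch using Python’s built-in `sqlite3` module. It is not taken from the course; the table names (`fact_sales`, `dim_product`, `dim_date`) and the data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES
    (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0), (2, 11, 25.0);
""")

# A simple report: join the fact table to both dimensions and aggregate.
report = conn.execute("""
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for year, category, total in report:
    print(year, category, total)
```

The fact table holds the measurements (revenue), while the dimension tables hold the descriptive attributes you group and filter by.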

 

2. Python and Pandas for Data Engineering by Pragmatic AI Labs

Platform and link to the course: edX

Duration: 4 weeks at 3-6 hours/week; learn at your own pace

Description: In yet another introductory edX course, you’ll learn Python and pandas for data engineering. The introduction to Python covers topics such as simple statements, if statements, while loops, and functions. Then, you’ll learn about data manipulation in pandas (particularly DataFrames) and its alternatives, such as NumPy, Spark, and PySpark. In the last module, you’ll learn about Python development environments and version control.

 

3. Scripting with Python and SQL for Data Engineering by Pragmatic AI Labs

Platform and link to the course: edX

Duration: 4 weeks at 3-6 hours/week; learn at your own pace

Description: If you want to learn SQL and Python for data engineering simultaneously, this is the course for you. You’ll use Python’s built-in data structures to manipulate data and write Python scripts for data task automation. The course also teaches you web scraping and using SQLite to store and query data in Python. Regarding SQL, you’ll learn how to import and export data from a MySQL database and how to execute MySQL queries in VS Code.

 

4. Cloud Data Engineering by Pragmatic AI Labs

Platform and link to the course: edX

Duration: 4 weeks at 3-6 hours/week; learn at your own pace

Description: This course will teach you data engineering in the cloud. You’ll learn about methodologies in data engineering, develop distributed systems, serverless data engineering systems, and cloud ETL pipelines, and learn about data governance. In the process, you’ll get in touch with technologies such as:

  • CUDA
  • Numba
  • ASICs
  • Colab Pro
  • Colab API
  • Google BigQuery
  • AWS
  • Databricks SQL
  • Click
  • Python
  • Rust

This is also an introductory course with no prerequisites needed.

 

5. Building ETL and Data Pipelines with Bash, Airflow and Kafka by IBM

Platform and link to the course: edX

Duration: 5 weeks at 2-4 hours/week; learn at your own pace

Description: This data engineering course focuses on building ETL and data pipelines. During the course, you’ll learn what ETL and ELT processes are, create ETL using Bash shell scripts, use Apache Airflow to create batch data pipelines, and Apache Kafka for streaming data pipelines.

This is an introductory course to these topics but requires experience working with relational databases, SQL, and Bash shell scripting.

 

6. Data Warehousing and BI Analytics by IBM

Platform and link to the course: edX

Duration: 6 weeks at 2-3 hours/week; learn at your own pace

Description: This intermediate course by IBM teaches you the essentials of data warehouses, data marts, and data lakes. You will learn how to design, model, and implement data warehouses. More specifically, you will use CUBEs, ROLLUPs, materialized views, and tables. You’ll also learn about facts and dimensional modeling, data modeling with star and snowflake schemas, staging areas for data warehouses, data quality, and populating a data warehouse with data. In the third module, you’ll work on data warehouse analytics in Cognos Analytics.

The course requires experience with SQL and relational databases.

 

7. Apache Spark for Data Engineering and Machine Learning by IBM

Platform and link to the course: edX

Duration: 3 weeks at 2-3 hours/week; learn at your own pace

Description: Yet another intermediate course, this one focuses on Apache Spark, an important tool in data engineering. You’ll learn about Spark Structured Streaming, GraphFrames, ETL processes, and ML pipelines. In addition, you’ll learn ML fundamentals, such as regression, classification, and clustering.

The course requires foundational Apache Spark knowledge. It’s also suggested that you complete the Big Data, Hadoop and Spark Basics course by IBM.

 

8. DE Zoomcamp

Platform and link to the course: DataTalks.Club

Duration: 10 weeks; learn at your own pace

Description: Finally, a course from a different platform! This online boot camp will provide you with comprehensive data engineering knowledge. It’ll teach you containerization and infrastructure, workflow orchestration, data warehousing, analytics engineering, batch processing, and streaming. You’ll be introduced to technologies such as Google Cloud Platform, Terraform, Docker, SQL, Mage, dbt, Apache Spark, and Apache Kafka.

The only prerequisite for this boot camp is basic SQL. It’s also preferable that you have experience with Python or, if not, some other programming language.

 

9. DE End-to-End Projects

Platform and link to the course: DE Academy

Duration: No info.

Description: This is a project-based course in which you’ll learn how to use AWS, Snowflake, Python, Kafka, Azure, Databricks, Airflow, and Tableau. You will analyze and transform data, migrate it, and streamline workflows.

 

10. Scala Programming for Data Science

Platform and link to the course: Cognitive Class AI

Duration: 20 hours; learn at your own pace

Description: This learning path consists of three courses. The first is Scala 101, which will teach you the basics of object-oriented programming, case objects & classes, collections, and idiomatic Scala. In the second course, Spark Overview for Scala Analytics, you will be introduced to Apache Spark, RDDs, DataFrames for large-scale data science, and advanced Spark topics (e.g., Hive with Spark, Spark streaming). The third course is about Scala in data science, where you will learn basic statistics and data types, how to prepare data, engineer features, fit a model, build a pipeline, and perform grid search.

 

Conclusion

 

No surprise that it’s easier when you have money – you get access to more courses that are more diverse. Yeah, it sucks not having money! But this doesn’t mean you must say goodbye to your dream of landing a data engineer role.

Good free courses are much harder to find, but they exist, and they can teach you both basic and more advanced data engineering. I found ten of them. Other free resources, such as blogs or YouTube videos, can also help you reach the required level of knowledge.

If you’re industrious enough, dedicated, and persistent, I’m sure you can land a data engineering role for free.

 

Nate Rosidi is a data scientist who also works in product strategy. He’s also an adjunct professor teaching analytics and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.


