Image by Author
Airflow was created to resolve the complexity of managing multiple pipelines and workflows. Before Airflow, many organizations relied on cron jobs, custom scripts, and other ad hoc tools to handle the big data generated by millions of users. These solutions were hard to maintain and inflexible, and they lacked visibility: there was no way to see the status of running workflows, monitor failure points, or debug errors.
Apache Airflow, as it is popularly known today, was started as Airflow by Maxime Beauchemin at Airbnb in October 2014. It has been open source from the outset and was officially announced under the Airbnb GitHub organization in June 2015. In March 2016, the project joined the Apache Software Foundation incubation program and has since been known as Apache Airflow.
Here is the list of the project contributors.
Many data professionals (data engineers, machine learning engineers) and top companies, such as Airbnb and Netflix, use Apache Airflow daily. That is why you will learn how to install and use Apache Airflow in this article.
Prerequisites
A good working knowledge of the Python programming language is needed to get the most out of this article, since the code snippets and the Airflow framework itself are written in Python. This article will familiarize you with the Apache Airflow platform and teach you how to install it and carry out simple tasks.
What is Apache Airflow?
The Apache Airflow official documentation defines Apache Airflow as “an open-source platform for developing, scheduling, and monitoring batch-oriented workflows”.
The platform’s Python framework allows users to build workflows that connect with virtually any technology. Airflow can be deployed as a single unit on your laptop or across a distributed system to support workflows as large as you can imagine.
At the core of Airflow’s design is its programmatic nature: workflows are defined and represented as Python code.
Key Components in Apache Airflow
1. DAG
A DAG (Directed Acyclic Graph) is the collection of tasks you intend to run, arranged in a way that shows their relationships and dependencies. It represents the workflow as a graph structure in which each task is a node and the edges are the dependencies between tasks.
“Directed” ensures that tasks are executed in a certain order, and “Acyclic” prevents circular dependencies, so tasks never loop back and repeat indefinitely. DAGs are written as Python scripts and placed in Airflow’s DAG_FOLDER.
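Here is a minimal sketch of this graph structure (the DAG ID and task IDs are placeholders); the >> operator declares the directed edges between tasks:

from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator

# A tiny three-node graph: extract -> transform -> load
with DAG(dag_id="minimal_graph", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    extract = EmptyOperator(task_id="extract")
    transform = EmptyOperator(task_id="transform")
    load = EmptyOperator(task_id="load")

    # ">>" draws the directed edges; Airflow rejects any cycle in this graph
    extract >> transform >> load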
2. Tasks
These are the individual activities or units of work performed in a DAG. Examples include running an SQL query, reading from a database, and so on.
3. Operators
Operators are the building blocks used to create tasks within a DAG. Each operator defines the type of work a task performs; this could be sending an email, executing a bash script, and so on. While the DAG is responsible for ordering tasks, an operator defines the specific task to be executed. Some common operators in Airflow are BashOperator (for executing bash commands), EmailOperator (for sending emails), and PythonOperator (for calling an arbitrary Python function).
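As an illustrative sketch (the DAG ID, task IDs, and the notify function are made up for this example), here is how a BashOperator and a PythonOperator are instantiated inside a DAG:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def notify():
    # Placeholder Python callable for the PythonOperator
    print("pipeline finished")

with DAG(dag_id="operator_examples", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    # BashOperator runs a shell command
    extract = BashOperator(task_id="extract", bash_command="echo extracting data")

    # PythonOperator calls an arbitrary Python function
    report = PythonOperator(task_id="report", python_callable=notify)

    extract >> report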
4. Scheduling
Scheduling in Airflow is handled by the scheduler. It monitors all available DAGs and tasks and triggers task instances once their dependencies (upstream tasks that must complete first) are met. The scheduler works behind the scenes, continually inspecting active tasks to determine whether they can be triggered.
5. XComs
XCom is an abbreviation for “cross-communication.” XComs enable communication between tasks. An XCom consists of a key, a value, a timestamp, and a reference to the task/DAG that created it.
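As a small sketch of the idea (the DAG ID, task names, and values are illustrative), with the TaskFlow API a value returned by one task is stored as an XCom and passed to a downstream task:

from datetime import datetime
from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="xcom_example", start_date=datetime(2025, 1, 1), schedule=None) as dag:

    @task()
    def extract():
        # The return value is pushed to XCom under the key "return_value"
        return 42

    @task()
    def report(value):
        # Airflow pulls the XCom and passes it in as the argument
        print(f"Received {value} from the upstream task")

    report(extract())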
6. Hooks
A hook can be thought of as an abstraction layer or interface to external platforms or resource locations. It enables tasks to connect to these platforms easily without having to go through the rigors of authentication and what would have been a complicated communication process.
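For example, assuming the Postgres provider package (apache-airflow-providers-postgres) is installed and a connection with the ID my_postgres has been configured in Airflow, a hook hides the connection and authentication details behind a simple interface (the connection ID, table, and function name below are placeholders):

from airflow.providers.postgres.hooks.postgres import PostgresHook

def fetch_row_count():
    # The hook reads credentials from the Airflow connection "my_postgres",
    # so the task code never handles authentication details itself
    hook = PostgresHook(postgres_conn_id="my_postgres")
    records = hook.get_records("SELECT COUNT(*) FROM my_table")
    print(records)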
7. Web UI
The Web UI provides a user-friendly interface for visually monitoring and troubleshooting data pipelines. See the image below:

Photo from Apache Airflow Documentation
A Guide on How to Run Apache Airflow on Your Machine
Setting up Apache Airflow on your machine typically entails creating the Airflow environment, initializing the database, and starting the Airflow webserver and scheduler.
Follow the steps below:
Step 1: Set up a Python virtual environment for the project
python3 -m venv airflow_tutorial
Step 2: Activate the created virtual environment
On Mac/Linux
source airflow_tutorial/bin/activate
On Windows
airflow_tutorial\Scripts\activate
Step 3: Install Apache Airflow
Run the following command in your terminal inside your activated virtual environment.
pip install apache-airflow
Step 4: Set up the Airflow directory and configure the database
Initialize the Airflow metadata database by running:
airflow db init
This generates the necessary tables and configuration in the ~/airflow directory by default.
Step 5: Create Airflow user
Creating an admin user enables you to access the Airflow web interface. In your terminal, run:
airflow users create \
--username admin \
--firstname FirstName \
--lastname LastName \
--role Admin \
--email admin@example.com
After running this command in your terminal, you will be prompted to enter an admin password of your choice.
Step 6: Start the Airflow webserver
Starting the webserver grants you access to the Airflow UI. Run this command in your terminal:
airflow webserver --port 8080
Open the URL shown in your console and log in with the credentials you created in Step 5.
Step 7: Start the Airflow Scheduler
The scheduler handles task execution. Open a new terminal window and activate the same virtual environment as in Step 2. Then start the scheduler by running this command in your terminal:
airflow scheduler
Step 8: Create and run a DAG of choice
Remember, in Step 4 we set up our airflow directory, which by default lives in your home folder. Create a dags folder inside the airflow directory and place your DAG files there, for example ~/airflow/dags/dags_tutorial.py.
In your dags_tutorial.py file, write the following code:
from datetime import datetime
from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator

# A DAG represents a workflow, a collection of tasks
with DAG(dag_id="demo", start_date=datetime(2025, 1, 5), schedule="0 0 * * *") as dag:

    # Tasks are represented as operators
    hello = BashOperator(task_id="hello", bash_command="echo hello")

    @task()
    def airflow():
        print("airflow")

    # Set dependencies between tasks
    hello >> airflow()
Shortly after running this code, the available DAGs will automatically appear on the web UI, as shown below.

Image by Author
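If you prefer the command line, you can also confirm that Airflow has picked up the DAG and trigger a one-off test run directly from the terminal (run these commands in the same activated virtual environment):
airflow dags list
airflow dags test demo 2025-01-05
The second command performs a single local run of the demo DAG for the given logical date without involving the scheduler, which is handy for debugging.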
Conclusion
Apache Airflow is an amazing open-source platform that simplifies the management of multiple workflows and pipelines. It combines a programmatic, Python-based approach with a web UI for monitoring and troubleshooting tasks.
In this article, we have learned about this awesome technology and used it to create a simple DAG. I recommend incorporating Airflow into your routine to quickly become familiar with the technology. Thanks for reading.
Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.