How to Write Efficient Dockerfiles for Your Python Applications




 

Docker has simplified how we deploy Python applications. But poorly optimized containers can lead to bloated images, slow builds, and security issues.

This article focuses on practical techniques that experienced Python and Docker developers can implement to streamline their containerization workflow.

Let’s cut through the basics and focus on techniques that will make a difference in your build times and image sizes.

 

1. Use Specific Base Images for Your Needs

 
Choose your base image carefully based on your specific requirements.

The standard python image includes many development tools you likely don’t need in production. The slim variant strikes a good balance between size and compatibility, while alpine is extremely small but may require additional work for packages with C extensions.

# For most applications
FROM python:3.11-slim

# To pin a specific Debian release for reproducible builds
FROM python:3.11-slim-bullseye

# For smallest possible image (but potential compatibility issues)
FROM python:3.11-alpine

 

Don’t just use the default image out of habit—evaluate which variant best suits your application’s needs. The choice of base image can have a greater impact on your final image size than almost any other optimization.
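
If you’re unsure, pull the candidates and compare their sizes locally. A quick check, assuming the tags shown above:

docker pull python:3.11-slim
docker pull python:3.11-alpine
docker images python --format "{{.Repository}}:{{.Tag}} -> {{.Size}}"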

 

2. Use Non-root Users for Security

 
Avoid running containers as the root user. If a container running as root is compromised, the attacker could potentially gain access to the host system.

By creating and using a non-privileged user, you can reduce this risk. This is a security best practice that should be standard in all production containers.

# Create a non-privileged user and give it ownership of the app directory
# (assumes your code has already been copied into /app)
RUN addgroup --system appgroup && \
    adduser --system --ingroup appgroup appuser && \
    chown -R appuser:appgroup /app

# Switch to that user for everything that follows
USER appuser

CMD ["python3", "app.py"]

 

If your application needs to bind to privileged ports, consider using a reverse proxy or adjusting the host port mapping instead of running as root.
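
For example, you can have the app listen on an unprivileged port inside the container and map host port 80 onto it at run time (my-python-app is a placeholder image tag):

# App listens on 8000 inside the container; host traffic on 80 is mapped to it
docker run -p 80:8000 my-python-app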

 

3. Order Your Commands for Cache Efficiency

 
One of the most effective ways to speed up your Docker builds is to leverage the layer caching system. Docker caches each layer in your build and reuses a layer on subsequent builds as long as it, and every layer before it, is unchanged. By ordering your Dockerfile commands from least to most frequently changing, you can maximize this benefit.

Here’s an example Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Copy and install dependencies first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code last (changes most frequently)
COPY . .

CMD ["python3", "app.py"]

 

This approach ensures that your dependencies are installed in a separate layer from your application code.

Since your code changes much more frequently than your dependencies, Docker will reuse the cached dependency layer on subsequent builds, significantly reducing build times.

 

4. Minimize Image Size

 
Every megabyte matters in container images, especially if you’re deploying many instances or updating frequently.

Using the --no-cache-dir flag prevents pip from storing downloaded packages. The cleanup commands remove temporary files and apt package lists.

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    # Belt and braces: clear any pip cache left despite --no-cache-dir
    rm -rf /root/.cache/pip

# Remove packages only needed at build time
# (curl is just an example; purge whatever you installed earlier)
RUN apt-get update && \
    apt-get purge -y --auto-remove curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY . .

CMD ["python3", "app.py"]

 

Look for opportunities to remove any packages you don’t need at runtime. Remember, smaller images mean lower storage costs and a reduced attack surface.
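
To see which layers contribute most to the total, inspect the image layer by layer (my-python-app is a placeholder tag):

docker history --format "table {{.Size}}\t{{.CreatedBy}}" my-python-app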

 

5. Implement Multi-stage Builds for Complex Dependencies

 
If your application requires compilation tools or build dependencies that aren’t needed at runtime, you can use multi-stage builds as shown.

# Build stage
FROM python:3.11 AS builder

WORKDIR /build
COPY requirements.txt .

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app
# Copy only wheels from builder
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/*

COPY . .

CMD ["python3", "app.py"]

 

This allows you to build complex packages with all their build dependencies in the first stage, then copy only the built wheels to your final image. The result? A lean runtime image.

 

6. Prune Unnecessary Python Dependencies

 
Dependencies can quickly bloat your image size. The snippet below sketches one way to trim them with pipdeptree: collect the set of packages your requirements actually pull in, direct and transitive, then uninstall everything outside that set. It assumes requirements.txt pins exact package==version lines.

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    … \
    # Prune anything that is not part of your dependency tree
    pip install pipdeptree && \
    # Collect your requirements plus everything they pull in
    # (assumes pinned "package==version" lines in requirements.txt)
    pipdeptree --warn silence -p "$(cut -d'=' -f1 requirements.txt | paste -sd, -)" \
        | sed -E 's/^[^A-Za-z0-9]*//; s/[ =].*//' | sort -u > /tmp/keep && \
    # Uninstall everything else, including pipdeptree itself
    pip freeze | cut -d'=' -f1 | grep -vixF -f /tmp/keep | xargs -r pip uninstall -y

 

Consider maintaining separate requirements files for development and production to avoid installing test frameworks and linters in your production image in the first place.
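
One common layout, sketched here with hypothetical file names and pins, is a requirements-dev.txt that includes the production file:

# requirements.txt (production only)
flask==3.0.0
gunicorn==21.2.0

# requirements-dev.txt (everything above, plus tooling)
-r requirements.txt
pytest==8.0.0
ruff==0.4.0

Your Dockerfile then installs only requirements.txt, while developers run pip install -r requirements-dev.txt locally.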

 

7. Use a .dockerignore File

 
Before your build even starts, you can optimize what gets sent to the Docker daemon by creating a thoughtful .dockerignore file:

# Version control
.git/
.gitignore

# Python artifacts
__pycache__/
*.py[cod]
*$py.class
*.so
.pytest_cache/
.coverage

# Development environments
.env
.venv
# …

# Build artifacts
dist/
build/
*.egg-info/

# Local development files
data/
logs/
*.log

 

This file works like .gitignore but for Docker builds. Excluding these files speeds up the build process (by sending less data to the Docker daemon). It also prevents potential leakage of sensitive information or local development artifacts into your image.
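
If you want to verify what actually reaches the daemon, one trick is a throwaway build that lists the context (context-check is a scratch tag; --no-cache and --progress=plain make the RUN output visible under BuildKit):

docker build --no-cache --progress=plain -f- -t context-check . <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -type f | sort
EOF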

 

8. Leverage BuildKit’s Advanced Features

 
Docker BuildKit introduces powerful features worth exploring. The cache mount feature creates a persistent cache across builds, speeding up package installation.

# Persist pip's download cache across builds
# (skip --no-cache-dir here, or there will be nothing to cache)
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

 

The secret mount allows you to use sensitive data during the build without it becoming part of the image layers. Note that anything you write to the filesystem in that step still persists in the layer, so use the secret transiently.

# Mount a secret for this build step only; it never lands in a layer
RUN --mount=type=secret,id=db_password,dst=/run/secrets/db_password \
    python -c 'pw = open("/run/secrets/db_password").read().strip(); print("db_password mounted:", bool(pw))'

 

BuildKit is the default builder in Docker 23.0 and later. On older versions, enable it by setting the DOCKER_BUILDKIT=1 environment variable or in your Docker daemon configuration.
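
For example, assuming the secret value lives in a local db_password.txt:

# Force BuildKit on older Docker versions
DOCKER_BUILDKIT=1 docker build -t my-python-app .

# Supply the secret for the --mount=type=secret example above
docker build --secret id=db_password,src=db_password.txt -t my-python-app .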

 

Wrapping Up

 
By implementing these specific techniques, you’ll not only reduce your image sizes and build times but also create more maintainable and secure containerized Python applications.

Remember that containerization is an iterative process. Regularly revisit your Dockerfiles as your application code changes, and don’t be afraid to test different approaches to find what works best for your specific use case.

 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


