How to Write Efficient Dockerfiles for Your Python Applications




 

Docker has simplified how we deploy Python applications. But poorly optimized containers can lead to bloated images, slow builds, and security issues.

This article focuses on practical techniques that experienced Python and Docker developers can implement to streamline their containerization workflow.

Let’s cut through the basics and focus on techniques that will make a difference in your build times and image sizes.

 

1. Use Specific Base Images for Your Needs

 
Choose your base image carefully based on your specific requirements.

The standard python image includes many development tools you likely don’t need in production. The slim variant strikes a good balance between size and compatibility, while alpine is extremely small but may require additional work for packages with C extensions.

# For most applications
FROM python:3.11-slim

# To pin a specific Debian release for reproducible builds
FROM python:3.11-slim-bullseye

# For smallest possible image (but potential compatibility issues)
FROM python:3.11-alpine

 

Don’t just use the default image out of habit—evaluate which variant best suits your application’s needs. The choice of base image can have a greater impact on your final image size than almost any other optimization.
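
If you’re unsure, pull the candidates and compare their sizes locally. A quick check, assuming the tags shown above:

docker pull python:3.11-slim
docker pull python:3.11-alpine
docker images python --format "{{.Repository}}:{{.Tag}} -> {{.Size}}"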

 

2. Use Non-root Users for Security

 
Avoid running containers as the root user. If a container running as root is compromised, the attacker could potentially gain access to the host system.

By creating and using a non-privileged user, you can reduce this risk. This is a security best practice that should be standard in all production containers.

# Create a non-privileged user and give it ownership of the app directory
# (assumes your code has already been copied into /app)
RUN addgroup --system appgroup && \
    adduser --system --ingroup appgroup appuser && \
    chown -R appuser:appgroup /app

# Switch to that user for everything that follows
USER appuser

CMD ["python3", "app.py"]

 

If your application needs to bind to privileged ports, consider using a reverse proxy or adjusting the host port mapping instead of running as root.
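
For example, you can have the app listen on an unprivileged port inside the container and map host port 80 onto it at run time (my-python-app is a placeholder image tag):

# App listens on 8000 inside the container; host traffic on 80 is mapped to it
docker run -p 80:8000 my-python-app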

 

3. Order Your Commands for Cache Efficiency

 
One of the most effective ways to speed up your Docker builds is to leverage the layer caching system. Docker caches each layer in your build and reuses a layer on subsequent builds as long as it, and every layer before it, is unchanged. By ordering your Dockerfile commands from least to most frequently changing, you can maximize this benefit.

Here’s an example Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Copy and install dependencies first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code last (changes most frequently)
COPY . .

CMD ["python3", "app.py"]

 

This approach ensures that your dependencies are installed in a separate layer from your application code.

Since your code changes much more frequently than your dependencies, Docker will reuse the cached dependency layer on subsequent builds, significantly reducing build times.

 

4. Minimize Image Size

 
Every megabyte matters in container images, especially if you’re deploying many instances or updating frequently.

Using the --no-cache-dir flag prevents pip from storing downloaded packages. The cleanup commands remove temporary files and apt package lists.

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    # Belt and braces: clear any pip cache left despite --no-cache-dir
    rm -rf /root/.cache/pip

# Remove packages only needed at build time
# (curl is just an example; purge whatever you installed earlier)
RUN apt-get update && \
    apt-get purge -y --auto-remove curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY . .

CMD ["python3", "app.py"]

 

Look for opportunities to remove any packages you don’t need at runtime. Remember, smaller images mean lower storage costs and a reduced attack surface.
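
To see which layers contribute most to the total, inspect the image layer by layer (my-python-app is a placeholder tag):

docker history --format "table {{.Size}}\t{{.CreatedBy}}" my-python-app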

 

5. Implement Multi-stage Builds for Complex Dependencies

 
If your application requires compilation tools or build dependencies that aren’t needed at runtime, you can use multi-stage builds as shown.

# Build stage
FROM python:3.11 AS builder

WORKDIR /build
COPY requirements.txt .

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app
# Copy only wheels from builder
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/*

COPY . .

CMD ["python3", "app.py"]

 

This allows you to build complex packages with all their build dependencies in the first stage, then copy only the built wheels to your final image. The result? A lean runtime image.

 

6. Prune Unnecessary Python Dependencies

 
Dependencies can quickly bloat your image size. The snippet below sketches one way to trim them with pipdeptree: collect the set of packages your requirements actually pull in, direct and transitive, then uninstall everything outside that set. It assumes requirements.txt pins exact package==version lines.

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    … \
    # Prune anything that is not part of your dependency tree
    pip install pipdeptree && \
    # Collect your requirements plus everything they pull in
    # (assumes pinned "package==version" lines in requirements.txt)
    pipdeptree --warn silence -p "$(cut -d'=' -f1 requirements.txt | paste -sd, -)" \
        | sed -E 's/^[^A-Za-z0-9]*//; s/[ =].*//' | sort -u > /tmp/keep && \
    # Uninstall everything else, including pipdeptree itself
    pip freeze | cut -d'=' -f1 | grep -vixF -f /tmp/keep | xargs -r pip uninstall -y

 

Consider maintaining separate requirements files for development and production to avoid installing test frameworks and linters in your production image in the first place.
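
One common layout, sketched here with hypothetical file names and pins, is a requirements-dev.txt that includes the production file:

# requirements.txt (production only)
flask==3.0.0
gunicorn==21.2.0

# requirements-dev.txt (everything above, plus tooling)
-r requirements.txt
pytest==8.0.0
ruff==0.4.0

Your Dockerfile then installs only requirements.txt, while developers run pip install -r requirements-dev.txt locally.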

 

7. Use a .dockerignore File

 
Before your build even starts, you can optimize what gets sent to the Docker daemon by creating a thoughtful .dockerignore file:

# Version control
.git/
.gitignore

# Python artifacts
__pycache__/
*.py[cod]
*$py.class
*.so
.pytest_cache/
.coverage

# Development environments
.env
.venv
# …

# Build artifacts
dist/
build/
*.egg-info/

# Local development files
data/
logs/
*.log

 

This file works like .gitignore but for Docker builds. Excluding these files speeds up the build process (by sending less data to the Docker daemon). It also prevents potential leakage of sensitive information or local development artifacts into your image.
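
If you want to verify what actually reaches the daemon, one trick is a throwaway build that lists the context (context-check is a scratch tag; --no-cache and --progress=plain make the RUN output visible under BuildKit):

docker build --no-cache --progress=plain -f- -t context-check . <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -type f | sort
EOF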

 

8. Leverage BuildKit’s Advanced Features

 
Docker BuildKit introduces powerful features worth exploring. The cache mount feature creates a persistent cache across builds, speeding up package installation.

# Persist pip's download cache across builds
# (skip --no-cache-dir here, or there will be nothing to cache)
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

 

The secret mount allows you to use sensitive data during the build without it becoming part of the image layers. Note that anything you write to the filesystem in that step still persists in the layer, so use the secret transiently.

# Mount a secret for this build step only; it never lands in a layer
RUN --mount=type=secret,id=db_password,dst=/run/secrets/db_password \
    python -c 'pw = open("/run/secrets/db_password").read().strip(); print("db_password mounted:", bool(pw))'

 

BuildKit is the default builder in Docker 23.0 and later. On older versions, enable it by setting the DOCKER_BUILDKIT=1 environment variable or in your Docker daemon configuration.
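
For example, assuming the secret value lives in a local db_password.txt:

# Force BuildKit on older Docker versions
DOCKER_BUILDKIT=1 docker build -t my-python-app .

# Supply the secret for the --mount=type=secret example above
docker build --secret id=db_password,src=db_password.txt -t my-python-app .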

 

Wrapping Up

 
By implementing these specific techniques, you’ll not only reduce your image sizes and build times but also create more maintainable and secure containerized Python applications.

Remember that containerization is an iterative process. Regularly revisit your Dockerfiles as your application code changes, and don’t be afraid to test different approaches to find what works best for your specific use case.

 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


