BentoML: MLOps for Beginners



Image by Author

 

As a data scientist, have you ever found yourself bogged down by DevOps tasks like creating Docker containers, learning Kubernetes, or managing cloud deployments? These challenges can feel overwhelming, especially for beginners in MLOps. That’s where BentoML comes in.

BentoML is a powerful yet beginner-friendly tool that simplifies MLOps workflows. It allows you to build model endpoints, create Docker images, and deploy models to the cloud—all with just a few CLI commands. No need to dive deep into complex DevOps processes; BentoML handles it for you, making it an ideal choice for those new to MLOps.

In this tutorial, we will explore BentoML by building a Text-to-Speech application, deploying it to BentoCloud, testing model inference, and monitoring its performance.

 

What is BentoML?

 
BentoML is an open-source framework designed for model serving and deployment. It automates key tasks such as building Docker images, provisioning infrastructure and environments, scaling applications on demand, and securing endpoints with API keys. This lets data scientists quickly build production-ready AI systems without needing to understand everything happening behind the scenes.

BentoML is not just a tool; it is an ecosystem that includes BentoCloud, OpenLLM, the OCI Image Builder, vLLM, and many more integrations. 

 

Setting Up the TTS Project

 
We will set up the project first by installing the BentoML Python package using pip, as shown below.
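
pip install bentoml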

 

After that, we will create the `app.py` file, which will contain all the code for model serving. We are building a text-to-speech (TTS) service for deployment using the Bark model via BentoML.

  1. We configure the BentoML service with one GPU (an NVIDIA Tesla T4) and a 300-second timeout for API requests.
  2. The BentoBark class initializes the processor and model by loading them from the Hugging Face Hub.
  3. It processes the user's text with AutoProcessor and generates audio with BarkModel, using the default voice preset unless one is provided.
  4. It saves the generated audio as `output.wav` and returns its file path.

app.py:

from __future__ import annotations

import os
import typing as t
from pathlib import Path

import bentoml

SAMPLE_TEXT = "♪ Jingle bells, jingle bells, jingle all the way ♪"


@bentoml.service(
    resources={
        "gpu": 1,
        "gpu_type": "nvidia-tesla-t4",
    },
    traffic={"timeout": 300},
)
class BentoBark:
    def __init__(self) -> None:
        import torch
        from transformers import AutoProcessor, BarkModel

        # Load the Bark processor and model from the Hugging Face Hub,
        # moving the model to the GPU when one is available.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.processor = AutoProcessor.from_pretrained("suno/bark")
        self.model = BarkModel.from_pretrained("suno/bark").to(self.device)

    @bentoml.api
    def generate(
        self,
        context: bentoml.Context,
        text: str = SAMPLE_TEXT,
        voice_preset: t.Optional[str] = None,
    ) -> t.Annotated[Path, bentoml.validators.ContentType("audio/*")]:
        import scipy.io.wavfile

        # Treat an empty string as "no preset" so Bark falls back to its default voice.
        voice_preset = voice_preset or None

        # Tokenize the text, generate the waveform, and move it back to the CPU.
        output_path = os.path.join(context.temp_dir, "output.wav")
        inputs = self.processor(text, voice_preset=voice_preset).to(self.device)
        audio_array = self.model.generate(**inputs)
        audio_array = audio_array.cpu().numpy().squeeze()

        # Write the waveform to a WAV file at the model's native sample rate.
        sample_rate = self.model.generation_config.sample_rate
        scipy.io.wavfile.write(output_path, rate=sample_rate, data=audio_array)

        return Path(output_path)

 

We will now create a `bentofile.yaml` file that contains all the configuration for building the infrastructure and environment. 

  1. Service: the Python file name and the service class name (`app:BentoBark`).
  2. Labels: the owner and project name.
  3. Include: only Python files.
  4. Python: install the necessary Python packages listed in the `requirements.txt` file.
  5. Docker: build the Docker image with the specified Python version and system packages.

bentofile.yaml:

service: "app:BentoBark"
labels:
  owner: Abid
  project: Bark-TTS
include:
  - "*.py"
python:
  requirements_txt: requirements.txt
docker:
  python_version: "3.11"
  system_packages:
    - ffmpeg
    - git

 

The `requirements.txt` file lists all the Python packages needed to build the environment in the cloud.

requirements.txt:

bentoml
nltk
scipy
suno-bark @ git+https://github.com/suno-ai/bark.git
torch
transformers
numpy

 

Deploying the TTS Service

 
To deploy this application to the cloud, we will log in to BentoCloud using the CLI command below. It will redirect you to the browser to create an account and an API token. 
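
# Log in to BentoCloud (opens the browser to authenticate and create an API token)
bentoml cloud login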

 

Then, type the following command in the terminal to deploy your text-to-speech application. 
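
# Deploy the project in the current directory (reads bentofile.yaml)
bentoml deploy .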

 

It will build and push the Bento, containerize the application, download the model, and then start the AI service. 

 


 

You can go directly to your BentoCloud dashboard to see the deployment status. 

 


 

You can also use the Events tab to check the deployment status. Our service is successfully running. 

 


 

Testing the TTS Service

 
We will test our service using the Playground provided by BentoCloud. Just type the text and click on the Submit button. It will generate the WAV file containing the audio within a few seconds. 

 


 

You can also access the API endpoint from your terminal using the `curl` command. 

curl -s -X POST \
    'https://bento-bark-bpaq-39800880.mt-guc1.bentoml.ai/generate' \
    -H 'Content-Type: application/json' \
    -d '{
        "text": "For vnto euery one that hath shall be giuen, and he shall haue abundance: but from him that hath not, shal be takē away, euen that which he hath.",
        "voice_preset": ""
    }' \
    -o output.wav

 

We successfully created the WAV file from the text provided, and it sounds perfect. 
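
You can also call the endpoint from Python using BentoML's HTTP client. The snippet below is a minimal sketch; it assumes the same deployment URL as the curl example and uses an illustrative input text:

import bentoml

# Assumes the deployment URL from the curl example above.
with bentoml.SyncHTTPClient("https://bento-bark-bpaq-39800880.mt-guc1.bentoml.ai") as client:
    # Calls the `generate` API; the WAV response is downloaded
    # and a local file path is returned.
    audio_path = client.generate(
        text="Testing the Bark text-to-speech service.",
        voice_preset="",
    )
    print(audio_path)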

 


 

Monitoring the TTS Service

 

The best part of BentoCloud is that you don’t have to set up monitoring services like Prometheus or Grafana. Simply go to the Monitoring tab and scroll down to view metrics covering the service, the underlying machine, and model performance.

 


 

Final Thoughts

 

I am absolutely in love with the BentoML ecosystem. It provides a simple and efficient solution to most of my challenges. What makes it even more impressive is that I don’t need to learn complex concepts like cloud computing or Kubernetes to deploy a fully functional AI application. All it takes is writing a few lines of code and running a single CLI command to deploy the AI service seamlessly.

If you are having trouble running or deploying the TTS service, check out the GitHub repository kingabzpro/TTS-BentoML. All you have to do is clone the repository and run the commands above.
 
 

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
