Load testing Self-Hosted LLMs | Towards Data Science


Do you need more GPUs, or a more modern GPU? How do you make infrastructure decisions?

[Image: a man pulling an elephant with his bare hands — created by the author using DALL-E]

How does it feel when a group of users suddenly starts using an app that only you and your dev team have used before?

That’s the million-dollar question of moving from prototype to production.

As far as LLMs are concerned, there are a few dozen tweaks you can make to run your app within budget and at acceptable quality. For instance, you can choose a quantized model for lower memory usage. Or you can fine-tune a tiny model and beat the performance of giant LLMs.

You can even tweak your infrastructure to achieve better outcomes. For example, you may want to double the number of GPUs you use or choose the latest-generation GPU.

But how can you say that Option A performs better than Options B and C?

This is an important question to ask ourselves at the earliest stages of going into production. All these options have their costs…
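To compare options concretely, you need numbers: how many requests per second each setup sustains, and at what latency. Below is a minimal sketch of such a load test using only the Python standard library. The `fake_llm_call` function is a hypothetical stand-in for a real request to your LLM endpoint (in practice this would be an HTTP POST); dedicated tools like Locust do the same thing at scale, but the core idea fits in a few lines.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(send_request, concurrency: int, total_requests: int) -> dict:
    """Fire `total_requests` calls with `concurrency` workers; collect latency stats."""

    def timed_call(i):
        start = time.perf_counter()
        send_request(i)
        return time.perf_counter() - start

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - t0

    return {
        "requests": total_requests,
        "throughput_rps": total_requests / wall_time,
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * len(latencies)) - 1],
    }


# Hypothetical stand-in for a real LLM call; simulates ~50 ms of server work.
def fake_llm_call(_):
    time.sleep(0.05)


if __name__ == "__main__":
    stats = load_test(fake_llm_call, concurrency=8, total_requests=40)
    print(stats)
```

Running the same test against each candidate setup (more GPUs, a newer GPU, a quantized model) gives you comparable throughput and latency numbers, which is the basis for the cost trade-off discussed next.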
