Inference cost prize-giving:
- Google takes gold for slashing Flash prices by 80% 💥.
- GPT-4o takes silver, falling 50%.
- Bronze is undecided, but where is the puck heading? 🤔
{blended cost, $ per million tokens @ 3:1 in:out}
### Gemini 1.5 Flash: $0.13125
### GPT-4o: $4.38
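For transparency, a minimal sketch of how those blended figures fall out. The per-token input/output list prices here are my assumption of the published rates at the time; only the blended results above come from the post.

```python
def blended_cost(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blend $/M-token prices, weighting input `ratio` times heavier than output."""
    return (ratio * input_price + output_price) / (ratio + 1)

# Assumed list prices ($ per 1M tokens): input, output
print(blended_cost(0.075, 0.30))   # Gemini 1.5 Flash -> 0.13125
print(blended_cost(2.50, 10.00))   # GPT-4o           -> 4.375 (~4.38)
```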
Is it luck, or is time (waiting) an effective LLM $ optimisation strategy?
I mean: versus the time spent quarter on quarter optimising costs with resources and widgets.. if prices just halve anyway..
The trend is your friend, but at what rate do you book discounts on material costs like inference? Costs halving multiple times across a 5-year solution lifecycle looks implied at the very least.
Kurzweil says compute price-performance improves at more than 50% per year from here. Although in the short term price is not always a function of cost or capacity, and margins on new tech are at their peak.
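Back-of-envelope only: a minimal sketch of what "booking" an annual decline rate does to a 5-year inference bill. The starting spend, flat usage and decline rates are illustrative assumptions, not figures from anywhere.

```python
def five_year_inference_bill(year1_spend: float, annual_decline: float, years: int = 5) -> list[float]:
    """Projected spend per year if unit prices fall by `annual_decline` each year
    and usage stays flat (a simplifying assumption)."""
    return [year1_spend * (1 - annual_decline) ** y for y in range(years)]

# Assumed $100k year-1 spend; compare booking no decline vs a Kurzweil-ish 50%/yr decline.
flat = five_year_inference_bill(100_000, 0.0)
halving = five_year_inference_bill(100_000, 0.5)
print(f"Flat pricing, 5yr total:   ${sum(flat):,.0f}")     # $500,000
print(f"50%/yr decline, 5yr total: ${sum(halving):,.0f}")  # $193,750
```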
But.. does anyone even model a 5-year lifecycle or LLM TCO right now? 🤯 One year is the anecdotal survival horizon while use cases find their level 🎯
There's always the next new model to deliver the missing quality and force an upgrade… or does provider differentiation kick in from here, and the efficient price wave finally emerge.. 🌊..