Inference cost Olympics. Is time (waiting) an equally effective… | by Julian B | Aug, 2024



Inference cost prize-giving.

  1. Google takes gold for an 80% Flash price slash 💥.
  2. GPT-4o takes silver, falling 50%.
  3. Bronze is undecided, but where is the puck heading 🤔?

{blended cost, $ per million tokens, at a 3:1 in:out ratio}

### Gemini 1.5 Flash: $0.13125

### GPT-4o: $4.38
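The blended figures above can be reproduced with a small helper. A minimal sketch, assuming the Aug-2024 per-direction list prices shown in the comments (the post only quotes the blended numbers, so the input/output split is an assumption):

```python
# Blended $ per million tokens at a 3:1 input:output token ratio.
def blended_cost(input_price: float, output_price: float,
                 in_ratio: int = 3, out_ratio: int = 1) -> float:
    """Weighted average price per million tokens across input and output."""
    total = in_ratio + out_ratio
    return (in_ratio * input_price + out_ratio * output_price) / total

# Assumed list prices ($/M tokens): Flash $0.075 in / $0.30 out,
# GPT-4o $2.50 in / $10.00 out.
print(round(blended_cost(0.075, 0.30), 5))  # Gemini 1.5 Flash -> 0.13125
print(round(blended_cost(2.50, 10.00), 3))  # GPT-4o -> 4.375 (~$4.38)
```

The 3:1 weighting simply reflects that typical workloads send roughly three input tokens for every output token generated.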

Is it luck, or is time (waiting) an effective LLM cost optimisation strategy?

I mean: compare it with the engineering time spent quarter on quarter optimising costs with resources and widgets… if prices just halve anyway.

The trend is your friend, but at what rate should you book the falling price of a material cost like inference? Costs halving multiple times over a 5-year solution lifecycle is implied, at least.

Kurzweil states that compute-power growth steepens beyond 50% per year from here. Although price is not always a function of cost or capacity in the short term, and margins on new tech peak early.

But… does anyone even predict a 5-year lifecycle or LLM TCO right now 🤯? One year is an anecdotal survival horizon while use cases find a level 🎯

There is always the next new model to deliver the missing quality and drive an upgrade… or does provider differentiation kick in from here, and the efficient price wave emerge? 🌊
