Inference cost prize-giving:
- Google takes gold for slashing Flash prices by 80% 💥.
- GPT-4o takes silver, falling 50%.
- Bronze is undecided, but where is the puck heading? 🤔
{blended cost, $ per million tokens @ 3:1 in:out}
### Gemini 1.5 Flash: $0.13125
### GPT-4o: $4.38
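For transparency, a minimal sketch of how those blended figures fall out. The per-token input/output list prices here are my assumption of the published rates at the time; only the blended results above come from the post.

```python
def blended_cost(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blend $/M-token prices, weighting input `ratio` times heavier than output."""
    return (ratio * input_price + output_price) / (ratio + 1)

# Assumed list prices ($ per 1M tokens): input, output
print(blended_cost(0.075, 0.30))   # Gemini 1.5 Flash -> 0.13125
print(blended_cost(2.50, 10.00))   # GPT-4o           -> 4.375 (~4.38)
```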
Is it luck, or is time (waiting) an effective LLM $ optimisation strategy?
I mean: versus the time spent quarter on quarter optimising costs with resources and widgets.. if prices just halve anyway..
The trend is your friend, but at what rate do you book discounts on material costs like inference? Costs halving multiple times across a 5-year solution lifecycle looks implied at the very least.
Kurzweil says compute price-performance improves at more than 50% per year from here. Although in the short term price is not always a function of cost or capacity, and margins on new tech are at their peak.
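Back-of-envelope only: a minimal sketch of what "booking" an annual decline rate does to a 5-year inference bill. The starting spend, flat usage and decline rates are illustrative assumptions, not figures from anywhere.

```python
def five_year_inference_bill(year1_spend: float, annual_decline: float, years: int = 5) -> list[float]:
    """Projected spend per year if unit prices fall by `annual_decline` each year
    and usage stays flat (a simplifying assumption)."""
    return [year1_spend * (1 - annual_decline) ** y for y in range(years)]

# Assumed $100k year-1 spend; compare booking no decline vs a Kurzweil-ish 50%/yr decline.
flat = five_year_inference_bill(100_000, 0.0)
halving = five_year_inference_bill(100_000, 0.5)
print(f"Flat pricing, 5yr total:   ${sum(flat):,.0f}")     # $500,000
print(f"50%/yr decline, 5yr total: ${sum(halving):,.0f}")  # $193,750
```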
But.. does anyone even model a 5-year lifecycle or LLM TCO right now? 🤯 One year is the anecdotal survival horizon while use cases find their level 🎯
There's always the next new model to deliver the missing quality and force an upgrade… or does provider differentiation kick in from here, and the efficient price wave finally emerge.. 🌊..