Why Does GPT-4o Mini Outperform Claude 3.5 Sonnet on LMSys?


The LMSys Chatbot Arena recently released scores for GPT-4o Mini, sparking discussion among AI researchers. According to the results, GPT-4o Mini outperformed Claude 3.5 Sonnet, which is frequently praised as the most intelligent Large Language Model (LLM) on the market. This ranking prompted a closer look at the factors behind GPT-4o Mini’s strong performance.

To address curiosity about the rankings, LMSys released a random selection of one thousand real user prompts. For these prompts, the answers of GPT-4o Mini were compared with those of Claude 3.5 Sonnet and other LLMs. A recent Reddit post shared significant insights into why GPT-4o Mini frequently outperformed Claude 3.5 Sonnet.

The critical success factors of GPT-4o Mini are as follows:

  1. Refusal Rate: A lower refusal rate is one of the key areas in which GPT-4o Mini shines. In contrast to Claude 3.5 Sonnet, which occasionally declines to respond to specific prompts, GPT-4o Mini answers more consistently. This quality suits users who would rather work with a more cooperative LLM that attempts to answer every question, no matter how difficult or peculiar.
  2. Length of Response: GPT-4o Mini frequently offers longer and more thorough responses than Claude 3.5 Sonnet. Claude 3.5 Sonnet strives for succinct responses, whereas GPT-4o Mini tends to be highly detailed, sometimes excessively so. This thoroughness can be especially appealing when people are looking for in-depth details or explanations of certain topics.
  3. Formatting and Presentation: GPT-4o Mini performs noticeably better than Claude 3.5 Sonnet in the formatting and presentation of replies. GPT-4o Mini uses headers, different font sizes, bolding, and effective whitespace to improve the readability and visual appeal of its replies. Claude 3.5 Sonnet, on the other hand, styles its outputs minimally. As a result of this difference in presentation, GPT-4o Mini’s responses may be more engaging and easier to understand.
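
For readers who want to check these three factors against a set of model responses, the following is a minimal Python sketch. It is not the method used by LMSys or by the Reddit analysis. The function names, the list of refusal phrases, and the markdown patterns are all assumptions chosen for illustration, and the heuristics are deliberately crude.

```python
import re

# Hypothetical refusal phrases (an assumption, not an official list).
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm sorry, but",
    "i am unable to",
    "i won't",
)


def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: a refusal phrase appears near the start of the reply."""
    # Normalize curly apostrophes so "can’t" and "can't" are treated alike.
    opening = response.strip().lower().replace("\u2019", "'")[:200]
    return any(marker in opening for marker in REFUSAL_MARKERS)


def word_count(response: str) -> int:
    """Length of the reply in whitespace-separated tokens."""
    return len(response.split())


def formatting_marks(response: str) -> int:
    """Count markdown headers, bold spans, and list items in the reply."""
    headers = re.findall(r"^#{1,6}[ \t]", response, flags=re.MULTILINE)
    bold = re.findall(r"\*\*[^*\n]+\*\*", response)
    bullets = re.findall(
        r"^[ \t]*(?:[-*]|\d+\.)[ \t]+", response, flags=re.MULTILINE
    )
    return len(headers) + len(bold) + len(bullets)


def summarize(responses: list[str]) -> dict[str, float]:
    """Average the three signals over a non-empty list of replies."""
    if not responses:
        raise ValueError("responses must not be empty")
    n = len(responses)
    return {
        "refusal_rate": sum(looks_like_refusal(r) for r in responses) / n,
        "mean_words": sum(word_count(r) for r in responses) / n,
        "mean_formatting_marks": sum(formatting_marks(r) for r in responses) / n,
    }
```

Running `summarize` separately on each model's replies to the same prompts gives a rough side-by-side view of refusal rate, length, and formatting density. Keyword matching will miss politely worded refusals and can flag false positives, so the numbers are best treated as a starting point for manual review.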

A common belief holds that the average human evaluator lacks the discernment needed to judge the correctness of LLM responses. This belief, however, does not apply to LMSys. The majority of users ask questions that they are able to evaluate fairly, and the winning GPT-4o Mini answers were typically superior in at least one important aspect of the prompt.

LMSys prompts cover a wide range of topics, from challenging tasks like arithmetic, coding, and reasoning problems to more routine requests like entertainment or everyday task support. Both Claude 3.5 Sonnet and GPT-4o Mini can provide accurate responses despite their differing levels of sophistication. In simpler cases, GPT-4o Mini has an advantage because of its superior formatting and its lower refusal rate.

In conclusion, GPT-4o Mini outperforms Claude 3.5 Sonnet on LMSys because of its superior formatting, longer and more thorough responses, and lower refusal rate. These features meet the needs of the typical LMSys user, who prioritizes readability, thorough responses, and a more cooperative LLM. Maintaining the top spots on platforms like LMSys will become harder as the LLM landscape evolves, requiring constant updates and modifications to the models.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
