Top 5 LLMs to Use According to the FACTS Leaderboard


FACTS Grounding is a benchmark introduced by Google DeepMind and Google Research to assess the factual accuracy and grounding of large language models (LLMs). It targets one of the biggest challenges in AI: ensuring factual consistency and reducing hallucinations. In this blog, we will explore the five most accurate and factually reliable LLMs according to its leaderboard.

What Is the FACTS Leaderboard?

The FACTS Leaderboard is a public platform that ranks Large Language Models (LLMs) based on their performance in the FACTS Grounding benchmark, which evaluates the factual accuracy and contextual grounding of long-form responses.

Using an ensemble of LLM judges, the leaderboard calculates a factuality score in two steps: it first disqualifies low-quality or evasive responses that do not adequately address the user's request, then assesses whether the remaining responses are fully supported by the provided context. The judges' verdicts are combined, and results from both the public and private datasets are averaged to ensure fairness and reliability.
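
To make the scoring steps above more concrete, here is a simplified Python sketch of how such an aggregate score could be computed. It is not the benchmark's actual code: the `Verdict` structure, the judge names, and the toy data are hypothetical, and the sketch assumes that a disqualified (ineligible) response simply counts as a failure, that each judge's score carries equal weight, and that the public and private splits are weighted equally.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Verdict:
    """One judge's verdict on one model response (hypothetical structure)."""
    eligible: bool  # does the response actually address the user's request?
    grounded: bool  # is every claim supported by the provided context?


def judge_score(verdicts: list[Verdict]) -> float:
    """Fraction of responses that are both eligible and fully grounded.

    Ineligible responses count as failures (an assumption of this sketch),
    so a model cannot raise its score by giving evasive answers.
    """
    return sum(v.eligible and v.grounded for v in verdicts) / len(verdicts)


def split_score(verdicts_by_judge: dict[str, list[Verdict]]) -> float:
    """Average the per-judge scores with equal weight."""
    return mean(judge_score(v) for v in verdicts_by_judge.values())


def factuality_score(public: dict[str, list[Verdict]],
                     private: dict[str, list[Verdict]]) -> float:
    """Average the public- and private-split scores with equal weight."""
    return mean([split_score(public), split_score(private)])


# Toy data: two hypothetical judges rating two responses per split.
public = {
    "judge_a": [Verdict(True, True), Verdict(True, False)],
    "judge_b": [Verdict(True, True), Verdict(True, True)],
}
private = {
    "judge_a": [Verdict(True, True), Verdict(True, True)],
    "judge_b": [Verdict(False, True), Verdict(True, False)],
}
print(f"Factuality score: {factuality_score(public, private):.1%}")  # 62.5%
```

Treating an ineligible response as a failure is what keeps a model from gaming the metric with short, vague answers that contain nothing the judges could contradict.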

1. Gemini 2.0 Flash

  • Factuality Score: 83.6% (±1.8%)
  • Organization: Google
  • License: Proprietary
  • Knowledge Cutoff: August 2024

Gemini 2.0 Flash takes the top spot on the leaderboard with the highest factuality score, reflecting a strong ability to deliver accurate, well-grounded information. Released by Google, this model improves on its predecessor in factual reasoning and contextual understanding, although its lead over Gemini 1.5 Flash on this benchmark is only 0.7 points.

2. Gemini 1.5 Flash

  • Factuality Score: 82.9% (±1.8%)
  • Organization: Google
  • License: Proprietary
  • Knowledge Cutoff: November 2023

Gemini 1.5 Flash, the previous generation of Google's Flash models, still holds its ground with a factuality score just behind Gemini 2.0 Flash. It is particularly well-suited to applications that need to balance computational efficiency with factual accuracy. Although Gemini 2.0 Flash edges it out, it remains one of the most reliable models available.

3. Claude 3.5 Sonnet

  • Factuality Score: 79.4% (±1.9%)
  • Organization: Anthropic
  • License: Proprietary
  • Knowledge Cutoff: April 2024

Anthropic's Claude 3.5 Sonnet, built with an emphasis on ethical AI, ranks third with a robust factuality score. While it trails Google's Gemini models, its performance is still notable, particularly for tasks requiring nuanced reasoning and natural conversation.

4. GPT-4o

  • Factuality Score: 78.8% (±1.9%)
  • Organization: OpenAI
  • License: Proprietary
  • Knowledge Cutoff: October 2023

OpenAI's GPT-4o is a successor to GPT-4 that balances factual accuracy with computational efficiency. Although it ranks fourth, only 0.6 points behind Claude 3.5 Sonnet, it remains my preferred model for coding, writing, and general questions. To improve its factual accuracy, provide clear and comprehensive context in your prompt.
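
As a rough illustration of that advice, the sketch below assembles a prompt that places the source material next to the question and tells the model to stay within it. The function name `build_grounded_prompt`, the instruction wording, and the `<document>` delimiters are my own assumptions, not a format prescribed by OpenAI; the returned string can be passed to whichever client library you use to call GPT-4o.

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Combine an instruction, a source document, and a question into one prompt.

    The instruction asks the model to rely only on the supplied document,
    which is the kind of grounding the FACTS benchmark rewards.
    """
    instruction = (
        "Answer the question using only the information in the document below. "
        "If the document does not contain the answer, say so instead of guessing."
    )
    # Explicit delimiters make it clear where the context starts and ends.
    return (
        f"{instruction}\n\n"
        f"<document>\n{context.strip()}\n</document>\n\n"
        f"Question: {question.strip()}"
    )


if __name__ == "__main__":
    prompt = build_grounded_prompt(
        context="FACTS Grounding was introduced by Google DeepMind and Google Research.",
        question="Who introduced FACTS Grounding?",
    )
    print(prompt)
```

Explicitly allowing the model to say that the document lacks the answer gives it an alternative to guessing, which is one common source of hallucinations.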

5. Claude 3.5 Haiku

  • Factuality Score: 74.2% (±2.1%)
  • Organization: Anthropic
  • License: Proprietary
  • Knowledge Cutoff: April 2024

Rounding out the top 5 is Claude 3.5 Haiku, another model from Anthropic. While it has the lowest factuality score of the five, it still generates fast and reasonably accurate responses. Its strength lies in handling short-form, creative, and poetic queries, which makes it a good option for more niche tasks.
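
Each score above comes with a ± margin, so small gaps between neighbouring models may fall within measurement noise. The Python sketch below treats each margin as a simple interval around the score (an assumption about how to read the ± values) and checks whether the intervals of adjacent models overlap. Overlap is only a rough signal, not a formal significance test.

```python
# Factuality scores and ± margins (percentage points) as reported in the
# five leaderboard entries of this article, in ranking order.
LEADERBOARD = {
    "Gemini 2.0 Flash": (83.6, 1.8),
    "Gemini 1.5 Flash": (82.9, 1.8),
    "Claude 3.5 Sonnet": (79.4, 1.9),
    "GPT-4o": (78.8, 1.9),
    "Claude 3.5 Haiku": (74.2, 2.1),
}


def interval(model: str) -> tuple[float, float]:
    """Return (low, high) = score - margin, score + margin for a model."""
    score, margin = LEADERBOARD[model]
    return score - margin, score + margin


def intervals_overlap(a: str, b: str) -> bool:
    """True if the two models' score intervals share at least one point."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


# Compare each model with the one ranked directly below it.
ranked = list(LEADERBOARD)  # dicts preserve insertion order (Python 3.7+)
for upper, lower in zip(ranked, ranked[1:]):
    status = "overlap" if intervals_overlap(upper, lower) else "separated"
    print(f"{upper} vs {lower}: {status}")
```

With these numbers, only the gap between GPT-4o and Claude 3.5 Haiku is clearly separated; the intervals of the top four models chain together, although Gemini 2.0 Flash and Claude 3.5 Sonnet do not overlap directly.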

Final Thoughts

The FACTS leaderboard places the Gemini models at the top. This may partly reflect bias: the benchmark was created by Google teams, who have a clear marketing incentive to see their own models lead the ranking. That said, in my opinion, the new generation of Gemini models performs well across all kinds of benchmarks. Ultimately, the right model depends on your specific needs, such as factual accuracy, computational efficiency, speed, or creative flexibility.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
