How to Build a Local Open-Source LLM Chatbot With RAG | by Dr. Leon Eversberg | Mar, 2024

April 1, 2024

Large Language Models (LLMs) are remarkable at compressing knowledge about the world into their billions of parameters.

However, LLMs have two major limitations: their knowledge is only current up to the time of their last training run, and they sometimes make up facts (hallucinate) when asked specific questions.

Using retrieval-augmented generation (RAG), we can give pre-trained LLMs access to very specific information as additional context when answering our questions.

In this article, I will walk through the theory and practice of adding RAG capabilities to Google’s Gemma LLM using the Hugging Face transformers library, LangChain, and the Faiss vector database.

The figure below gives an overview of the RAG pipeline, which we will implement step by step.

Figure: Overview of the RAG pipeline implementation. For document storage: input documents -> text chunks -> encoder model -> vector database. For LLM prompting: user question -> encoder model -> vector database -> top-k relevant chunks -> generator LLM. The LLM then answers the question with the retrieved context. Image by author.
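To make the two paths of the pipeline concrete, here is a minimal, standard-library-only sketch of the data flow. It is not the implementation built with Gemma, LangChain, and Faiss: the bag-of-words `encode` function stands in for a real encoder model, the hypothetical `ToyVectorStore` class stands in for a vector database, and the final prompt is only printed rather than sent to a generator LLM. All names, the prompt wording, and the sample documents are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 12) -> list[str]:
    """Split a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def encode(text: str) -> Counter:
    """Toy encoder (assumption): word counts stand in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # Document storage path: text chunks -> encoder -> store
        for chunk in chunks:
            self.entries.append((chunk, encode(chunk)))

    def search(self, question: str, k: int = 2) -> list[str]:
        # Prompting path: question -> encoder -> top-k most similar chunks
        query = encode(question)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative input documents (made up for this sketch)
documents = [
    "Faiss is a library for efficient similarity search over dense vectors.",
    "Gemma is a family of open LLMs released by Google.",
    "The Eiffel Tower is located in Paris.",
]

store = ToyVectorStore()
for doc in documents:
    store.add(chunk_text(doc))

question = "Which company released Gemma?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # in a real pipeline, this prompt would be passed to the generator LLM
```

In a real setup, the word-count vectors would be replaced by dense embeddings from an encoder model and the linear scan in `search` by a vector index, but the flow of data through the two paths stays the same.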
