Essential metrics and methods to enhance performance across retrieval, generation, and end-to-end pipelines
Introduction
When we think of the most common applications of Generative AI, Retrieval-Augmented Generation (RAG) has without a doubt become one of the most widely discussed topics in this domain. Unlike traditional search engines, which rely on keyword search to find relevant information for a given query, RAG goes a step further and uses the retrieved content to generate a well-rounded answer to the question.
The figure below illustrates RAG: documents of interest are encoded using an embedding model, then indexed and stored in a vector store. When a query is submitted, it is generally embedded in the same manner, followed by two steps: (1) a retrieval step that searches for similar documents, and (2) a generation step that uses the retrieved content to synthesize a response.
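To make the flow concrete, here is a minimal sketch of the same pipeline in plain Python. It is an illustration under simplifying assumptions, not a production setup: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector store" is just a list held in memory, and `generate` is a hypothetical placeholder that only assembles a prompt where a real system would call an LLM. All function names are made up for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Indexing: embed every document once and keep (text, vector) pairs."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(query, index, k=2):
    """Step 1, retrieval: embed the query and return the k most similar documents."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def generate(query, context):
    """Step 2, generation: hypothetical placeholder that builds the prompt
    a real system would send to an LLM."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Answer the question using only this context:\n{joined}\nQuestion: {query}"


documents = [
    "RAG combines retrieval with text generation.",
    "A vector store indexes document embeddings.",
    "Bananas are rich in potassium.",
]
index = build_index(documents)

query = "How does a vector store index embeddings?"
top_docs = retrieve(query, index, k=1)
prompt = generate(query, top_docs)
print(prompt)
```

In a real pipeline, `embed` would be replaced by a learned embedding model and the list by an actual vector store, but the two stages remain separate, which is why retrieval and generation can each be evaluated on their own as well as end to end.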