Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) integrates external knowledge retrieval into the LLM generation process.
Overview
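For a given user query, RAG first retrieves relevant documents, then includes them in the prompt sent to the LLM. Because the model can draw on documents that postdate or fall outside its training data, this mitigates the training-time knowledge cutoff, and grounding answers in the retrieved text tends to reduce hallucinations.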
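The retrieve-then-prompt flow can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `retrieve` and `build_prompt` functions, the word-overlap scoring, the sample corpus, and the prompt wording are all assumptions made for the example. A real system would typically retrieve with a vector DB or search engine and send the resulting prompt to an LLM.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, corpus, k=2):
    """Toy retriever (assumption): rank documents by shared-word count with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in corpus]
    # Python's sort is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents):
    """Place the retrieved documents ahead of the question in a single prompt string."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical sample data, for illustration only.
corpus = [
    "The Eiffel Tower is located in Paris.",
    "Python is a programming language.",
    "Paris is the capital of France.",
]
query = "Where is the Eiffel Tower?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # this string is what would be sent to the LLM
```

The point of the sketch is the ordering: retrieval happens first, and the LLM only ever sees the query together with the retrieved text.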
Key components
- Retriever: Fetches relevant documents from a vector DB / search engine
- Generator: An LLM that produces the final answer given retrieved context and the query
- Embeddings: Map documents and queries into a shared vector space so the retriever can compare them by similarity
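One way the three components might fit together is sketched below. Everything here is a stand-in chosen for illustration: `embed` uses simple word counts instead of a learned embedding model, `Retriever` is an in-memory list instead of a vector DB, and `generate` is a stub where an LLM call would go. The names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a sparse word-count vector, not a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Retriever:
    """In-memory stand-in for a vector DB: stores (document, embedding) pairs."""

    def __init__(self, documents):
        self.index = [(doc, embed(doc)) for doc in documents]

    def search(self, query, k=1):
        """Return the k documents whose embeddings are most similar to the query's."""
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def generate(query, context):
    """Placeholder for the generator: a real system would call an LLM here."""
    return f"(stub answer to {query!r} based on: {' | '.join(context)})"


# Hypothetical sample data, for illustration only.
docs = [
    "RAG retrieves documents before generation.",
    "Vector databases store embeddings.",
    "Bananas are yellow.",
]
retriever = Retriever(docs)
question = "Which databases store embeddings?"
top = retriever.search(question, k=1)
print(generate(question, top))
```

Because queries and documents pass through the same `embed` function, they land in the same vector space, which is what makes the similarity comparison in `search` meaningful.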