
What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is a technique that improves AI responses by retrieving relevant information from an external knowledge base before generating an answer. Instead of relying solely on training data, a RAG system searches a curated dataset, injects the most relevant documents into the prompt context, and generates a response grounded in that retrieved, up-to-date information.

How RAG Works

Query: The user asks a question or provides a task.
Retrieve: The system searches a knowledge base for relevant documents using vector similarity.
Augment: The retrieved documents are added to the AI's prompt context.
Generate: The AI generates a response grounded in the retrieved information.
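
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, `generate` is a placeholder for a model call, and the knowledge base contents and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: term counts (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieve: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def augment(query: str, retrieved: list[str]) -> str:
    """Augment: place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieved)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Generate: placeholder for a language model call (hypothetical)."""
    return f"[model response to a {len(prompt)}-character prompt]"

# Hypothetical knowledge base for illustration only.
KNOWLEDGE_BASE = [
    "Invoices are archived after 90 days.",
    "Password resets require email verification.",
    "The API rate limit is 100 requests per minute.",
]

def answer(query: str) -> str:
    """Query -> Retrieve -> Augment -> Generate."""
    retrieved = retrieve(query, KNOWLEDGE_BASE)
    return generate(augment(query, retrieved))
```

Calling `answer("what is the api rate limit")` retrieves the rate-limit document first, so the prompt sent to the model contains that sentence as context.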

RAG vs Fine-Tuning

Approach	Best For	Tradeoff
RAG	Dynamic knowledge, frequent updates	Requires retrieval infrastructure
Fine-tuning	Behavioral changes, style	Expensive, static knowledge

RAG is generally preferred for knowledge management because the knowledge base can be updated at any time, and new content is available to the next query without retraining the model.
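
A small sketch can make the update behavior concrete. Here a hypothetical `KnowledgeBase` class ranks documents by word overlap (a deliberately crude stand-in for vector similarity); adding a document changes the next search result with no training step involved. The class, its methods, and the example documents are all assumptions for illustration.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class KnowledgeBase:
    """Hypothetical in-memory store; a real system would use a vector index."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        # Updating knowledge is just an insert: no model weights change.
        self.docs.append(text)

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank by number of shared words (stand-in for vector similarity).
        q = tokens(query)
        ranked = sorted(self.docs, key=lambda d: len(q & tokens(d)), reverse=True)
        return ranked[:k]

kb = KnowledgeBase()
kb.add("Standard plans have a 14 day refund window.")
kb.add("Support is available on weekdays.")

before = kb.search("refund window for annual plans")

# New policy published: add it and query again immediately.
kb.add("Annual plans have a 30 day refund window.")
after = kb.search("refund window for annual plans")
```

With fine-tuning, the equivalent change would require preparing training data and running another training job before the model could reflect the new policy.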

RAG in Quoth

Quoth uses RAG principles in its semantic search: when you query quoth_search_index, it converts your query into a vector embedding, searches the HNSW index for similar patterns, and returns the most relevant matches. These patterns can then be injected into AI context for grounded responses.
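
The search step described above can be illustrated with an exact, brute-force nearest-neighbor search. This is not Quoth's implementation: an HNSW index returns approximate nearest neighbors without comparing against every vector, while this sketch scores every entry. The pattern names, the three-dimensional vectors, and the `search_index` function are hypothetical.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search_index(query_vector: list[float],
                 index: dict[str, list[float]],
                 k: int = 2) -> list[tuple[float, str]]:
    """Return the k (score, name) pairs most similar to the query vector."""
    scored = ((cosine_similarity(query_vector, vec), name) for name, vec in index.items())
    return heapq.nlargest(k, scored)

# Hypothetical pre-computed embeddings; real embeddings have hundreds of dimensions.
INDEX = {
    "error-handling": [0.9, 0.1, 0.0],
    "api-pagination": [0.1, 0.9, 0.1],
    "retry-with-backoff": [0.8, 0.2, 0.1],
}

# Assume the query text has already been converted to this vector.
query_vector = [1.0, 0.0, 0.0]
results = search_index(query_vector, INDEX, k=2)
names = [name for _, name in results]
```

Here `names` holds the two patterns whose vectors point most nearly in the same direction as the query; those matches are what would be injected into the model's context.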
