Wednesday, May 6, 2026

RAG Pipeline

Documents
   ↓
Embeddings
   ↓
Vector DB (e.g., FAISS)
   ↓
Query → Embedding
   ↓
Similarity Search
   ↓
Top-k Chunks
   ↓
LLM
   ↓
Answer



A Retrieval-Augmented Generation (RAG) pipeline is divided into two main phases: the ingestion phase, where the data is prepared offline, and the inference phase, where the user's query is answered against that data.
RAG Workflow Architecture
  • Ingestion Phase (Offline)
    • Documents: Raw data sources like PDFs or text files are loaded.
    • Embeddings: The text is broken into smaller pieces (chunks) and converted into numerical vectors using an embedding model.
    • Vector DB (e.g., FAISS): These vectors are stored in a specialized database that allows for fast mathematical searching.
  • Inference Phase (Online)
    • Query → Embedding: When you ask a question, it is converted into a vector using the same embedding model used for the documents.
    • Similarity Search: The system compares the query vector against the vectors in the database to find the closest matches.
    • Top-k Chunks: The k most relevant chunks of text are retrieved from the database.
    • LLM → Answer: These chunks are sent to a Large Language Model (like GPT-4) as context, allowing it to generate an answer based on your specific documents. 
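The two phases above can be sketched end to end in plain Python. This is a minimal illustration, not a production implementation: the bag-of-words `embed` function is a stand-in for a real embedding model (e.g. sentence-transformers), and a NumPy array plays the role of the vector database in place of FAISS.

```python
import numpy as np

# --- Ingestion phase (offline) ---
documents = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "The Eiffel Tower is located in Paris, France.",
    "Embedding models map text into numerical vectors.",
]

# Toy bag-of-words embedding; a real pipeline would call an embedding model here.
vocab = sorted({w.lower().strip(".,") for d in documents for w in d.split()})

def embed(text):
    vec = np.zeros(len(vocab))
    for w in text.lower().split():
        w = w.strip(".,?")
        if w in vocab:
            vec[vocab.index(w)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Stack the document vectors into one matrix: our stand-in "vector DB".
index = np.stack([embed(d) for d in documents])

# --- Inference phase (online) ---
def retrieve(query, k=2):
    q = embed(query)                 # same embedding model as the documents
    scores = index @ q               # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

chunks = retrieve("Which library does similarity search?")
prompt = ("Answer using only this context:\n"
          + "\n".join(chunks)
          + "\nQuestion: Which library does similarity search?")
# `prompt` would now be sent to an LLM (e.g. a GPT-4 chat completion call)
# to generate the final answer grounded in the retrieved chunks.
```

Note that the query must be embedded with the same model (and the same vocabulary) as the documents, otherwise the similarity scores are meaningless.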
Visual Workflow Diagram
Below is a representation of how these components connect:
[Figure: RAG workflow graph]

