Tuesday, April 28, 2026

Similarity Search, MMR, and Similarity Score Threshold: The Popular Semantic Search Algorithms Used for Retrieval in RAG Pipelines

In search and information retrieval, the "MMR" search type stands for Maximal Marginal Relevance.

It is an algorithm used to balance relevance (how well a result matches your query) with diversity (how different the results are from one another). 

Why it is used

Standard search methods (like Similarity Search) often return results that are very similar to each other. For example, if you search for "black jackets," a standard search might give you five nearly identical bomber jackets. 
MMR prevents this "echo chamber" by: 
  • Reducing Redundancy: It penalises documents that are too similar to results already selected for the list.
  • Increasing Coverage: It ensures the results cover different aspects or perspectives of a topic.
  • Optimising Context: In AI systems like RAG (Retrieval-Augmented Generation), MMR is used to feed the AI a diverse set of facts rather than the same information repeated multiple times, leading to more comprehensive answers. 

How it works

MMR builds the result list greedily, one item at a time. At each step it scores every remaining candidate and picks the best one, using a trade-off parameter called Lambda ($\lambda$) to weight two components:
  • Relevance Component: How much the item matches the query.
  • Diversity Component: How much the item differs from items already picked. 
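
The trade-off between these two components is usually written as the formula below. The symbols are a conventional choice rather than something defined in this post: $q$ is the query, $R$ is the set of candidate items, $S$ is the set of items already picked, and $\text{Sim}_1$ and $\text{Sim}_2$ are similarity functions (in practice often the same cosine similarity).

$$
\text{MMR} = \arg\max_{d_i \in R \setminus S} \left[ \lambda \, \text{Sim}_1(d_i, q) - (1 - \lambda) \max_{d_j \in S} \text{Sim}_2(d_i, d_j) \right]
$$

With $\lambda = 1$ the penalty term vanishes and only relevance to the query counts. With $\lambda = 0$ the relevance term vanishes and only the penalty for resembling already-picked items remains.
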
Lambda ($\lambda$) Value | Effect
1.0 | Pure Relevance: Results are identical to a standard similarity search.
0.5 | Balanced: A mix of highly relevant and diverse results (often the default).
0.0 | Pure Diversity: Prioritises different results even if they are less relevant.
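
To make the greedy selection concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not the code any particular library runs: the function names `cosine` and `mmr`, the toy two-dimensional vectors, and the use of cosine similarity for both components are all choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, docs, k=3, lambda_mult=0.5):
    """Greedily pick up to k document indices, trading relevance for diversity."""
    relevance = [cosine(query, d) for d in docs]
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalty: similarity to the closest document already selected.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (hypothetical): docs 0 and 1 are near-duplicates, doc 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.8]]

print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1] (pure relevance)
print(mmr(query, docs, k=2, lambda_mult=0.5))  # [0, 2] (duplicate is skipped)
```

With the balanced setting, the near-duplicate document 1 loses to the less relevant but more distinct document 2, which is the behaviour described in the list above.
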
This search type is widely supported in modern vector databases and frameworks such as LangChain, OpenSearch, and Qdrant. In LangChain, for example, it is selected when creating a retriever from a vector store:


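# Assumes `vectorstore` is an existing LangChain vector store (for example FAISS or Chroma).
retriever = vectorstore.as_retriever(
    search_type="mmr",  # use Maximal Marginal Relevance instead of plain similarity
    # k: number of documents to return; lambda_mult: 1.0 = relevance only, 0.0 = diversity only
    search_kwargs={"k": 6, "lambda_mult": 0.5},
)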


LangChain's standard VectorStoreRetriever supports three main search algorithms (or "search types") natively. Beyond these, LangChain provides a massive ecosystem of over 40 specialized retrievers that implement more complex algorithms like keyword matching, hybrid search, and multi-step reasoning. 

1. Built-in search_type Options 

When you create a retriever from a vector store using .as_retriever(), you can specify these three primary types: [4]
  • similarity (Default): A standard Top-K search. It uses the vector store's distance metric (typically cosine similarity or Euclidean distance) to find the $K$ documents most similar to your query vector.
  • mmr (Maximal Marginal Relevance): As discussed, this balances relevance and diversity to avoid redundant results.
  • similarity_score_threshold: This retrieves documents like a standard similarity search but filters out any that fall below a specific relevance score (set via score_threshold). 
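
The difference between a plain Top-K search and a score-threshold search can be sketched in a few lines of pure Python. This is a conceptual illustration only, not LangChain's implementation: the function names `similarity_search` and `filter_by_threshold`, the toy vectors, and the choice of cosine similarity as the score are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query, docs, k=4):
    """Return (index, score) pairs for the k documents closest to the query."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def filter_by_threshold(query, docs, k=4, score_threshold=0.8):
    """Run the same Top-K search, then drop results scoring below the threshold."""
    return [
        (i, score)
        for i, score in similarity_search(query, docs, k)
        if score >= score_threshold
    ]


# Toy data (hypothetical): the third document is only loosely related to the query.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.8]]

top_k = [i for i, _ in similarity_search(query, docs, k=3)]
above = [i for i, _ in filter_by_threshold(query, docs, k=3, score_threshold=0.8)]
print(top_k)  # [0, 1, 2]
print(above)  # [0, 1]
```

Top-K always returns `k` results when enough documents exist, even weak ones. The threshold variant may return fewer results, or none at all, which is often preferable to passing irrelevant context to the LLM.
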

2. Advanced Retrieval Algorithms

For more complex needs, LangChain offers specialized retrievers that go beyond simple vector matching: [8]
  • BM25 Retriever: A "sparse" retriever based on keyword frequency (TF-IDF style), which is often better at matching exact terms than at capturing abstract meaning.
  • Ensemble Retriever: Combines multiple algorithms (e.g., BM25 + Semantic Search) and reranks the results using a method like Reciprocal Rank Fusion.
  • Self-Query Retriever: Uses an LLM to transform a natural language query into a structured query (e.g., adding metadata filters like "documents from 2023 only") before searching.
  • Multi-Query Retriever: Uses an LLM to generate multiple versions of your question from different perspectives, retrieves results for all of them, and then merges the best findings.
  • Parent Document Retriever: Searches small text chunks (which are easier for algorithms to match) but returns the larger "parent" document to give the LLM more context.
  • Time-Weighted Retriever: Combines semantic similarity with a "time decay" factor, prioritizing recently accessed documents (in LangChain the decay is measured from the last access, not from creation), which suits news or logs.
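
As an illustration of how an ensemble can merge a keyword ranking with a semantic ranking, here is a minimal sketch of Reciprocal Rank Fusion. It is not LangChain's own code: the function name `reciprocal_rank_fusion`, the document ids, and the smoothing constant `c=60` (a commonly used value) are assumptions for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, c=60):
    """Fuse several ranked lists of document ids into a single ranked list.

    Each document earns weight / (c + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a keyword (BM25-style) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because the fusion works on ranks rather than raw scores, it does not matter that keyword scores and vector similarities live on different scales.
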

Summary of Differences

Algorithm Type | Goal | Use Case
Top-K (Similarity) | Pure Relevance | Standard Q&A where you want the closest matches.
MMR | Relevance + Diversity | Avoiding repetitive information in the context.
Hybrid (Ensemble) | Meaning + Keywords | Finding specific technical terms while keeping semantic context.
Self-Query | Structured Search | When you have a lot of metadata (dates, authors, tags).



