Unlocking the Power of MongoDB Atlas Vector Search with Pre-Filters and Post-Filters

Rajesh Vinayagam
5 min read · Aug 26, 2024


In today’s data-driven world, finding the right information quickly is crucial. MongoDB Atlas Vector Search is a game-changing feature that empowers you to perform advanced similarity searches based on vector embeddings. Whether you’re working with natural language processing, image recognition, or recommendation systems, this powerful tool can help you uncover the most relevant data. But what if you could make your searches even more precise and efficient? Enter pre-filters and post-filters — your secret weapons for refining search results and delivering exactly what your users need.

In this article, we’ll dive into how you can leverage pre-filters and post-filters in MongoDB Atlas Vector Search to supercharge your search capabilities, ensuring that the results are not only accurate but also highly relevant to your specific needs. Let’s explore how these tools can transform the way you interact with your data.

What is a Vector Search in MongoDB Atlas?

A vector search in MongoDB Atlas allows you to find documents that are most similar to a given query based on vector representations (embeddings) of text, images, or other data. This is done using a special index called a vector index. To perform a vector search, you need to have a collection with a vector index and documents containing vector data.
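To make "most similar" concrete, here is a minimal brute-force sketch of ranking by cosine similarity. The three-dimensional vectors are made-up toy values, not real embeddings, and Atlas itself relies on a vector index rather than scanning every document like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values chosen for illustration only).
query_vec = [1.0, 0.0, 1.0]
docs = {
    "chair": [1.0, 0.0, 1.0],
    "desk": [1.0, 1.0, 0.0],
    "lamp": [0.0, 1.0, 0.0],
}

# Rank document names from most to least similar to the query vector.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query_vec, docs[name]),
    reverse=True,
)
print(ranked)
```

A higher cosine value means the two vectors point in a more similar direction, which is the sense in which the search returns the "closest" documents.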

Setting Up a Vector Index

Before we dive into the examples of pre-filters and post-filters, let’s start with setting up a vector index in MongoDB Atlas. Suppose we have a collection called products where each document represents a product description with its respective vector embedding.

Sample Document in the products Collection:

{
  "_id": "1",
  "name": "Ergonomic Chair",
  "description": "A comfortable ergonomic chair with lumbar support.",
  "embedding": [0.013, -0.045, 0.075, ...], // Vector representation of the description
  "tags": {
    "category": "Furniture",
    "sub_category": "Office Chairs",
    "material": "Mesh",
    "brand": "ComfortCo"
  },
  "price": 199.99,
  "last_modified": "2024-08-21T06:48:05"
}

Creating a Vector Index:

To create a vector index, you define an Atlas Vector Search index (index type vectorSearch) on the embedding field. Queries against that index use the $vectorSearch aggregation stage to perform similarity searches. Any field you want to use in a pre-filter must also be declared in the index with type filter. Here's how you can define a vector index with filters:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536, // Assuming a 1536-dimensional vector embedding
      "similarity": "cosine" // Using cosine similarity for measuring similarity
    },
    {
      "type": "filter",
      "path": "tags.category"
    },
    {
      "type": "filter",
      "path": "tags.sub_category"
    },
    {
      "type": "filter",
      "path": "tags.material"
    },
    {
      "type": "filter",
      "path": "tags.brand"
    }
  ]
}
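If you prefer to create the index from code rather than the Atlas UI, the definition can be built as a Python dictionary. The sketch below assumes PyMongo 4.7 or later and a collection on an Atlas cluster. The index name vector_index and both helper function names are placeholders chosen for this example.

```python
def build_vector_index_definition(num_dimensions=1536, filter_paths=()):
    """Return the index definition as a plain dict (mirrors the JSON above)."""
    fields = [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": num_dimensions,
            "similarity": "cosine",
        }
    ]
    # One "filter" entry per field that pre-filters should be able to use.
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


def create_vector_index(collection, definition, name="vector_index"):
    """Create the index on a pymongo Collection (needs an Atlas cluster).

    Assumption: pymongo >= 4.7, where SearchIndexModel accepts `type`.
    """
    from pymongo.operations import SearchIndexModel  # imported only when called

    model = SearchIndexModel(definition=definition, name=name, type="vectorSearch")
    return collection.create_search_index(model=model)


definition = build_vector_index_definition(
    filter_paths=[
        "tags.category",
        "tags.sub_category",
        "tags.material",
        "tags.brand",
    ]
)
print(definition)
```

Calling create_vector_index(db["products"], definition) would submit the index. Building the index happens asynchronously on the Atlas side, so it may not be queryable immediately.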

Understanding Pre-Filters and Post-Filters

The filtering process in MongoDB Atlas Vector Search can be performed either before or after the vector search itself, and each approach has its own challenges that may impact query performance:

  • Pre-Filtering: In this approach, metadata filtering is done before the vector search. This reduces the search space, potentially speeding up the vector search by limiting the number of documents that need to be compared. The trade-off is that documents that don't match the pre-filter criteria are excluded entirely, even if they are semantically close to the query, and very complex filter expressions may add computational overhead of their own. The fields used in a pre-filter must be indexed as filter fields in the vector index.
  • Post-Filtering: In this approach, metadata filtering is applied after the vector search is performed. This ensures that all documents are considered in the initial search, which can be useful when you're unsure about the filtering criteria or want to start from a broader set of results. The trade-off is that the search spends effort on results that are later discarded, and because filtering happens after the top results have been selected, you may end up with fewer results than you asked for.
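The difference between the two approaches is easiest to see on a toy example. The sketch below uses a hypothetical in-memory catalog in which a precomputed score stands in for vector similarity. It is not how Atlas executes a query, only an illustration of the order of operations.

```python
def top_k(docs, k):
    """Return the k highest-scoring docs (stand-in for the vector search)."""
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:k]


def is_furniture(doc):
    return doc["category"] == "Furniture"


# Hypothetical catalog; "score" stands in for similarity to the query.
catalog = [
    {"name": "Mesh Chair", "category": "Furniture", "score": 0.95},
    {"name": "Chair Mat", "category": "Accessories", "score": 0.90},
    {"name": "Seat Cushion", "category": "Accessories", "score": 0.85},
    {"name": "Leather Chair", "category": "Furniture", "score": 0.80},
    {"name": "Standing Desk", "category": "Furniture", "score": 0.60},
]

# Pre-filter: restrict the candidates first, then take the top k.
pre = top_k([d for d in catalog if is_furniture(d)], k=3)

# Post-filter: take the top k first, then drop non-matching results.
post = [d for d in top_k(catalog, k=3) if is_furniture(d)]

print([d["name"] for d in pre])   # three furniture items
print([d["name"] for d in post])  # only one item survives the filter
```

With k=3, the pre-filtered search fills all three slots with furniture, while the post-filtered search loses two of its three slots to accessories that are removed afterwards.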

Using Pre-Filters in MongoDB Atlas Vector Search

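Pre-filters are used to filter documents before the vector search is executed. This means you can narrow down the search space based on specific criteria, such as category, brand, or price range, before calculating the similarity. Note that price is not one of the filter fields in the index definition shown earlier, so filtering on a price range would require adding it to the index first.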

Example of Using Pre-Filters:

Suppose you want to find products similar to the query “comfortable office chair” but only within the “Office Chairs” sub-category.

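# Assumes `vectorstore` is a LangChain MongoDBAtlasVectorSearch instance
# created with text_key="description", so all other fields land in metadata.
pre_filter = {
    "tags.sub_category": "Office Chairs"
}

query = "comfortable office chair"
results = vectorstore.similarity_search(query=query, k=5, pre_filter=pre_filter)

# Display results (each result is a Document with page_content and metadata)
for result in results:
    print("Product Name: " + result.metadata["name"])
    print("Description: " + result.page_content)
    print("Metadata: ", result.metadata)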

In this example, only documents where tags.sub_category is "Office Chairs" will be considered for the vector search, thus refining the results to be more relevant to the user's needs.
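For readers who want to see what a pre-filter looks like at the database level, here is a sketch of an equivalent aggregation pipeline built around the $vectorSearch stage. The index name vector_index, the numCandidates value, and the helper function name are assumptions made for this example. The query vector would normally come from the same embedding model used to populate the embedding field.

```python
def build_prefiltered_pipeline(query_vector, sub_category, limit=5, num_candidates=100):
    """Build a $vectorSearch pipeline that pre-filters on tags.sub_category."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",  # hypothetical index name
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # must be >= limit
                "limit": limit,
                # Pre-filter: the field must be indexed with type "filter".
                "filter": {"tags.sub_category": {"$eq": sub_category}},
            }
        },
        {
            "$project": {
                "name": 1,
                "description": 1,
                "price": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder vector; a real query would use the embedding of the query text.
placeholder_vector = [0.0] * 1536
pipeline = build_prefiltered_pipeline(placeholder_vector, "Office Chairs")

# With a pymongo collection, the pipeline would be run like this:
# for doc in collection.aggregate(pipeline):
#     print(doc["name"], doc["score"])
```

The filter lives inside the $vectorSearch stage itself, which is what makes it a pre-filter: non-matching documents never become candidates.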

Using Post-Filters in MongoDB Atlas Vector Search

Post-filters allow you to further filter or sort the results after the vector search is performed. This can be useful when you want to apply additional filtering criteria based on the search results or sort them based on certain fields like price or last modification date.

Example of Using Post-Filters:

Suppose you want to perform a vector search on all products but only return results that are in the “Furniture” category and sort them by price in ascending order.

# Assumes `vectorstore` is a LangChain MongoDBAtlasVectorSearch instance
# created with text_key="description", so all other fields land in metadata.
post_filter_pipeline = [
    {
        "$match": {
            "tags.category": "Furniture"
        }
    },
    {
        "$sort": {
            "price": 1  # Sort by price in ascending order
        }
    }
]

query = "comfortable office chair"
results = vectorstore.similarity_search(query=query, k=5, post_filter_pipeline=post_filter_pipeline)

# Display results (each result is a Document with page_content and metadata)
for result in results:
    print("Product Name: " + result.metadata["name"])
    print("Description: " + result.page_content)
    print("Price: ", result.metadata["price"])
    print("Metadata: ", result.metadata)

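Here, the vector search is first executed on all documents and returns the top 5 matches. Those 5 results are then filtered to include only documents in the “Furniture” category and sorted by price, so the final list can contain fewer than 5 items and is ordered by price rather than by similarity.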
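One common way to reduce the chance of an under-filled result list is to over-fetch: request more results from the vector search than you need, then trim the list at the end of the post-filter pipeline. The sketch below shows one possible shape for this. The helper name and the choice of k=50 are illustrative assumptions, not recommended values.

```python
def build_post_filter_pipeline(category, final_limit):
    """Post-filter stages: keep one category, sort by price, trim the list."""
    return [
        {"$match": {"tags.category": category}},
        {"$sort": {"price": 1}},   # ascending price
        {"$limit": final_limit},   # final number of results to return
    ]


post_filter_pipeline = build_post_filter_pipeline("Furniture", final_limit=5)
print(post_filter_pipeline)

# Hypothetical usage with the vector store from the example above:
# results = vectorstore.similarity_search(
#     query="comfortable office chair",
#     k=50,  # over-fetch so enough results survive the $match stage
#     post_filter_pipeline=post_filter_pipeline,
# )
```

The cost is that the vector search returns more documents than are ultimately shown, and the five cheapest of those 50 are not necessarily the five most similar.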

Combining Pre-Filters and Post-Filters

You can also combine both pre-filters and post-filters to achieve more refined search results. For example, you can filter documents to a specific sub-category before the vector search and then sort the results by price after the search.

# Assumes `vectorstore` is a LangChain MongoDBAtlasVectorSearch instance
# created with text_key="description", so all other fields land in metadata.
pre_filter = {
    "tags.sub_category": "Office Chairs"
}

post_filter_pipeline = [
    {
        "$sort": {
            "price": 1  # Sort by price in ascending order
        }
    }
]

query = "comfortable office chair"
results = vectorstore.similarity_search(query=query, k=5, pre_filter=pre_filter, post_filter_pipeline=post_filter_pipeline)

# Display results (each result is a Document with page_content and metadata)
for result in results:
    print("Product Name: " + result.metadata["name"])
    print("Description: " + result.page_content)
    print("Price: ", result.metadata["price"])
    print("Metadata: ", result.metadata)

Conclusion

Using pre-filters and post-filters in MongoDB Atlas Vector Search lets you refine your search results by applying conditions before and after the vector search. Pre-filters improve relevance and can also improve performance by reducing the number of documents compared during the vector search. Post-filters reshape the results that come back, for example by filtering or sorting them, without changing how many documents the search itself has to process.

By leveraging these filters, you can build more efficient and effective search capabilities in your applications, catering to specific user needs and improving the overall user experience. With MongoDB Atlas, mastering vector searches has never been more accessible or impactful.
