How do I use prefilters with LangChain and MongoDB Atlas?
Here are my example models:
class Question(Document):
    content = StringField(required=True)

class ResponseFragment(Document):
    question = ReferenceField(Question, required=True)
    text = StringField(required=True)
Note: ResponseFragment references Question.
I have ResponseFragment.text
set up with vector search.
Before running the vector search, though, I want to prefilter on the ResponseFragment.question
field to reduce the performance impact.
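For context, my understanding is that a prefilter in Atlas Vector Search is applied inside the `$vectorSearch` aggregation stage itself, through its `filter` option. The sketch below builds such a stage as a plain Python dict so the shape is visible. It is only an illustration, not LangChain's actual internals: the helper name `build_vector_search_stage`, the 10x `numCandidates` multiplier, and the placeholder query vector and question id are all assumptions.

```python
def build_vector_search_stage(query_vector, pre_filter, k=100,
                              index_name="default", embedding_key="embedding"):
    """Hypothetical helper: return a $vectorSearch stage with an optional prefilter."""
    stage = {
        "index": index_name,
        "path": embedding_key,        # field that holds the stored embeddings
        "queryVector": query_vector,  # embedding of the query text
        "numCandidates": k * 10,      # assumption: oversample candidates by 10x
        "limit": k,
    }
    if pre_filter:
        # The filter narrows the candidate set before the nearest-neighbour search.
        stage["filter"] = pre_filter
    return {"$vectorSearch": stage}


stage = build_vector_search_stage(
    query_vector=[0.1, 0.2, 0.3],                       # placeholder vector
    pre_filter={"question": {"$eq": "<question id>"}},  # placeholder id
    k=5,
)
print(stage["$vectorSearch"]["filter"])
```

Every field named in the filter has to be declared in the vector search index, which seems to be what the error further down is complaining about.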
Here is my function:
def get_relevant_docs(self, query, store="theme", k=100, pre_filter_dict={}, as_dataframe=True, index_name="default", threshold=0.91):
    # Get the MongoDB collection
    collection = self.mongo_client[store]

    # Filter the documents based on the 'question' field
    question_id = str(ObjectId(self.question.id))  # Convert ObjectId to string
    pre_filter_dict = {"question": question_id}

    # Create the MongoDBAtlasVectorSearch instance
    vectorstore = MongoDBAtlasVectorSearch(collection, self.embeddings, text_key="text", embedding_key="embedding", index_name=index_name)

    # Perform similarity search, passing the prefilter
    docs = vectorstore.similarity_search_with_score(query, k=k, pre_filter=pre_filter_dict)
Running this without the prefilter works fine (though it runs the vector search on all documents instead of only the filtered ones).
Running this WITH the question filter, I get this error:
Error: PlanExecutor error during aggregation :: caused by :: Path 'question' needs to be indexed as token, full error: {'ok': 0.0, 'errmsg': "PlanExecutor error during aggregation :: caused by :: Path 'question' needs to be indexed as token", 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1714675372, 2), 'signature': {'hash': b'\x08{\xaaU\xd9\xbb\xa1\xbf\xa8\x01p\xa4\xc8b\xbb\xef\xa9\xcb\xee\x05', 'keyId': 7327355885062193166}}, 'operationTime': Timestamp(1714675372, 2)}
I'm assuming the issue is that I need to add an index to Mongo Atlas, but where and how do I do that?
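One possible direction, as a hedged sketch rather than a confirmed fix: Atlas Vector Search indexes are defined separately from regular MongoDB indexes, and as far as I can tell any field used in a prefilter has to be declared in that same definition with type `filter`. The snippet below only builds and prints such a definition as JSON, which could then be pasted into the JSON editor for the search index in the Atlas UI. It assumes an index of the `vectorSearch` type. The helper name `vector_index_definition` is hypothetical, and `num_dimensions=1536` and `similarity="cosine"` are assumptions that must match the embedding model actually in use.

```python
import json


def vector_index_definition(embedding_path="embedding", filter_paths=("question",),
                            num_dimensions=1536, similarity="cosine"):
    """Hypothetical helper: build an Atlas Vector Search index definition."""
    fields = [{
        "type": "vector",
        "path": embedding_path,
        "numDimensions": num_dimensions,  # assumption: depends on the embedding model
        "similarity": similarity,         # assumption: cosine similarity
    }]
    # Each field used in a prefilter is declared as a "filter" field.
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```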
I've tried doing this from my Python code as...
db.collection.createIndex({ question: "text" })
...with no luck.
Any help would be MUCH appreciated.