Description:
I'm working on an application that uses Kafka as messaging middleware, Elasticsearch for advanced search, and I'm currently evaluating MongoDB as the primary database. At the moment the application uses only Elasticsearch for data storage and indexing, but I'm running into data consistency issues, since atomic updates are hard to handle with Elasticsearch.
Current Data Consistency Issue:
The application currently uses only Elasticsearch for data storage, and I'm experiencing data consistency issues across the various components of the application. The lack of a single, consistent primary datastore leads to data discrepancies and a lack of synchronization between operations.
Example MongoDB Collection Structure:
```
experiences {
  id,
  title,
  markets (list),
  translations (list),
  // other properties...
}
```
Objective:
When a document in the experiences collection in MongoDB is modified, I want to automatically update the corresponding document in the Elasticsearch index to reflect the change. I'm evaluating whether introducing MongoDB as the primary database, with this synchronization in place, can solve the consistency problem.
Questions:
How can I properly configure a change stream in MongoDB to monitor changes in the experiences collection and synchronize them with Elasticsearch? I have found this documentation: https://www.mongodb.com/docs/manual/changeStreams/
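For concreteness, here is a minimal sketch of what I imagine such a watcher could look like (Python; `pymongo` and the `elasticsearch` 8.x client are assumed, and the database name, URIs, and index name are placeholders, not my real setup). The pure `change_to_es_action` helper maps a change stream event to an Elasticsearch operation, and `watch_experiences` wires it to a change stream opened with `full_document="updateLookup"` so update events carry the whole document:

```python
def change_to_es_action(change):
    """Map a MongoDB change stream event to an Elasticsearch operation.

    Returns ("index", doc_id, document) for insert/update/replace events,
    ("delete", doc_id, None) for deletes, and None for anything else.
    """
    op = change["operationType"]
    doc_id = str(change["documentKey"]["_id"])
    if op in ("insert", "update", "replace"):
        # With full_document="updateLookup", update events include fullDocument.
        doc = dict(change["fullDocument"])
        doc.pop("_id", None)  # the Mongo _id becomes the ES document id
        return ("index", doc_id, doc)
    if op == "delete":
        return ("delete", doc_id, None)
    return None  # e.g. invalidate, drop


def watch_experiences(mongo_uri, es_url, index="experiences"):
    # Third-party imports are deferred so the pure helper above works
    # even without pymongo/elasticsearch installed.
    from pymongo import MongoClient          # assumption: pymongo driver
    from elasticsearch import Elasticsearch  # assumption: elasticsearch-py 8.x

    coll = MongoClient(mongo_uri)["mydb"]["experiences"]  # "mydb" is a placeholder
    es = Elasticsearch(es_url)

    with coll.watch(full_document="updateLookup") as stream:
        for change in stream:
            # change["_id"] is the resume token: persist it somewhere durable
            # so the watcher can resume after a crash instead of losing events.
            action = change_to_es_action(change)
            if action is None:
                continue
            kind, doc_id, doc = action
            if kind == "index":
                es.index(index=index, id=doc_id, document=doc)
            else:
                es.options(ignore_status=404).delete(index=index, id=doc_id)
```

Is persisting the resume token like this the right approach, or is there a more robust pattern?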
What are the best practices for implementing this real-time synchronization between MongoDB and Elasticsearch, given that Kafka is also used for messaging?
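One pattern I'm considering is routing change events through Kafka so the Elasticsearch indexing is decoupled and replayable, keying each message by document id so all events for one document land in the same partition and stay ordered. A sketch of that producer side (Python; `kafka-python` and `pymongo` are assumptions, as are the topic and database names):

```python
import json


def to_kafka_record(change):
    """Build a (key, value) pair for a Kafka message from a change event.

    Keying by document id keeps all events for one document in the same
    partition, so a consumer sees them in order.
    """
    doc_id = str(change["documentKey"]["_id"])
    value = {
        "op": change["operationType"],
        "id": doc_id,
        "doc": change.get("fullDocument"),
    }
    # default=str handles non-JSON types such as ObjectId or datetime.
    return doc_id.encode("utf-8"), json.dumps(value, default=str).encode("utf-8")


def publish_changes(mongo_uri, bootstrap_servers, topic="experiences-changes"):
    # Deferred imports: pymongo and kafka-python are assumptions, not stdlib.
    from pymongo import MongoClient
    from kafka import KafkaProducer  # assumption: kafka-python client

    coll = MongoClient(mongo_uri)["mydb"]["experiences"]  # "mydb" is a placeholder
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers, acks="all")

    with coll.watch(full_document="updateLookup") as stream:
        for change in stream:
            key, value = to_kafka_record(change)
            producer.send(topic, key=key, value=value)
            producer.flush()  # trades throughput for per-event durability
```

Would it be preferable to hand-roll this, or to use an off-the-shelf connector such as the official MongoDB Kafka Connector or Debezium's MongoDB connector instead?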
When using Kafka, MongoDB, and Elasticsearch together, what considerations are needed to ensure data consistency across the different components of the application?
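In particular, I assume the Elasticsearch writer must tolerate duplicated and out-of-order events (Kafka redeliveries, change stream replays after a resume). My current idea is to give every document a monotonically increasing version field bumped on each MongoDB update, so stale or replayed events can be discarded; Elasticsearch appears to support this natively via `version` / `version_type="external_gte"` on the index call. A self-contained simulation of that rule:

```python
def should_apply(stored_version, incoming_version):
    """External-versioning rule: apply only events that are not older than
    what is already stored (mirrors Elasticsearch version_type="external_gte")."""
    return stored_version is None or incoming_version >= stored_version


def apply_events(events):
    """Replay (doc_id, version, doc) events, possibly duplicated or out of
    order, into an in-memory 'index'; stale events are dropped."""
    index = {}  # doc_id -> (version, doc)
    for doc_id, version, doc in events:
        stored = index.get(doc_id)
        if should_apply(stored[0] if stored else None, version):
            index[doc_id] = (version, doc)
    return index


# Duplicates and reordering should converge on the latest version.
events = [
    ("e1", 1, {"title": "draft"}),
    ("e1", 3, {"title": "final"}),
    ("e1", 2, {"title": "stale update"}),  # arrives late: ignored
    ("e1", 3, {"title": "final"}),         # duplicate: harmless
]
```

With the real client this would presumably become `es.index(index=..., id=doc_id, document=doc, version=version, version_type="external_gte")`. Is this a sound approach, or is there a better-established pattern?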
I've experimented with using MongoDB Atlas, but I've encountered some limitations regarding its search capabilities, particularly with highlights.
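For context, the highlighting I need is the kind self-managed Elasticsearch offers as a first-class query option. A sketch of the kind of search body I have in mind (field, tags, and index name are illustrative, not my real mapping):

```python
def highlight_query(text, field="title"):
    """Build an Elasticsearch search body that asks for highlighted
    fragments on the matched field (field and tags are illustrative)."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {field: {}},
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
        },
    }
```

This would be sent with something like `es.search(index="experiences", body=highlight_query("rome tour"))`, and I haven't found an equivalent level of control in Atlas.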