I am working with a large dataset (several thousand documents) and I am trying to construct an Atlas Lucene search index for a particular field in these documents. To give an idea of my data, here's a simplified version of my documents:
{ name: 'XYZ', lastUpdated: &lt;date 1&gt;, fundamentalData: { description: &lt;stuff&gt;, lastUpdatedFA: &lt;date 2&gt;, ...a lot more data }, performanceData: [a lot of nested objects], otherPerformanceData: [more nested objects], ...more descriptive data }
The issue arises when I try to build a simple search index on the fundamentalData.description field. The index build consistently fails with:
'Your index could not be built: Unexpected error: DocValuesField "$type:date/lastUpdated" appears more than once in this document (only one value is allowed per field)'
This error suggests that the lastUpdated field is duplicated within a single document. However, I've checked with Python that this isn't the case (see the code at the end).
As a side note, there is a field fundamentalData.lastUpdatedFA that is structurally similar to lastUpdated, but my understanding is that this shouldn't matter as long as the names aren't identical. I even ran an updateMany renaming that field to something completely different. No luck.
Interestingly, when I build an index the conventional way with db.collection.createIndex({ "fundamentalData.description": "text" }) (note the quotes around the dotted field path), everything works as expected. I'm aware that Atlas Search indexing differs significantly from the legacy createIndex method, but I'm not sure how that difference matters in my case.
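For what it's worth, my understanding is that the default Atlas Search index definition uses a dynamic mapping ({"mappings": {"dynamic": true}}), which indexes every field in the document, including lastUpdated, whereas createIndex only touches the one field you name. If that's what drags the date field into the index, a static mapping along these lines (a sketch, not something I've confirmed fixes it) should keep everything except fundamentalData.description out of the index:

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "fundamentalData": {
        "type": "document",
        "fields": {
          "description": { "type": "string" }
        }
      }
    }
  }
}
```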
I would appreciate any insights or suggestions. Thanks!
Logan
import time

def find_duplicate_lastUpdated(collection):
    duplicate_lastUpdated_docs = []
    for doc in collection.find():
        lastUpdated_count = str(doc).count("'lastUpdated'")
        if lastUpdated_count > 1:
            duplicate_lastUpdated_docs.append(doc['name'])
        time.sleep(0.01)  # sleep for 10 milliseconds
    return duplicate_lastUpdated_docs

collection = db["assetdatas"]
duplicates = find_duplicate_lastUpdated(collection)
len(duplicates)
# for name in duplicates:
#     print(name)
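One caveat I realize about this check: PyMongo decodes each BSON document into a Python dict, and a dict cannot hold the same key twice, so str(doc) would never show a duplicate even if the stored BSON actually contains one. A pure-Python sketch of what a raw-bytes check could look like instead (the count_datetime_key helper and the hand-built document are illustrative, not from my actual data):

```python
import struct

def count_datetime_key(raw: bytes, key: str) -> int:
    """Count BSON datetime elements (type byte 0x09) with the given key.
    Naive byte scan: it could false-positive if the same byte pattern
    occurs inside a string value, but it works as a spot check."""
    return raw.count(b"\x09" + key.encode() + b"\x00")

# Hand-build a BSON document that contains "lastUpdated" TWICE --
# something mongod can store but a decoded Python dict cannot represent.
elem = b"\x09lastUpdated\x00" + struct.pack("<q", 0)  # datetime, epoch 0
body = elem + elem
doc = struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

print(count_datetime_key(doc, "lastUpdated"))  # 2, though a dict would show 1 key
```

Against a live collection, the same scan could be run on documents fetched as bson.raw_bson.RawBSONDocument (via codec_options) by checking each doc.raw; if some document really does contain the key twice at the BSON level, that would explain the Atlas error even though the dict-based check reports nothing.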