I’m extremely new to Elasticsearch, and I can’t find an answer that shows how to get Python to detect whether the documents I have in an S3 bucket have already been uploaded to Elasticsearch. My goal is to check each document in the bucket: if its data is already in Elasticsearch, skip it and move on to the next one, until it finds a document whose data has not been uploaded yet. Can someone help me, please?
I think the easiest way would be to use DynamoDB (DDB) to store that kind of information. Each file that you upload to ES gets a record in DDB, so you can always verify whether a file has been uploaded to ES by checking for the presence or absence of its record in DDB.
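Here is a rough sketch of what that could look like in Python. It is only an illustration, not a finished implementation: the partition key name `s3_key`, the `etag` attribute, and the helper names are all made up for this example, and `index_document` stands in for whatever code you already use to send a file to ES. The `table` argument is assumed to be a boto3 DynamoDB `Table` resource, and `s3_client` a boto3 S3 client.

```python
def already_uploaded(table, s3_key):
    """Return True if the tracking table has a record for this S3 key."""
    # get_item returns a dict that contains "Item" only when the record exists.
    response = table.get_item(Key={"s3_key": s3_key})
    return "Item" in response


def mark_uploaded(table, s3_key, etag):
    """Record that this S3 object has been sent to Elasticsearch."""
    # "s3_key" is a hypothetical partition key name; use your table's own.
    table.put_item(Item={"s3_key": s3_key, "etag": etag})


def iter_bucket_objects(s3_client, bucket):
    """Yield every object listed in the bucket, one page at a time."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the page when the bucket is empty.
        yield from page.get("Contents", [])


def sync_new_objects(s3_objects, table, index_document):
    """Upload only the objects that have no record in the tracking table.

    index_document is a placeholder callable that takes an S3 key and
    uploads that file's data to Elasticsearch.
    """
    uploaded = []
    for obj in s3_objects:
        key = obj["Key"]
        if already_uploaded(table, key):
            continue  # already in ES, skip to the next one
        index_document(key)
        # Write the record only after the upload succeeded.
        mark_uploaded(table, key, obj.get("ETag", ""))
        uploaded.append(key)
    return uploaded
```

You would call it with something like `sync_new_objects(iter_bucket_objects(s3_client, "my-bucket"), table, index_document)`, where the bucket name is a placeholder. Writing the DDB record after the upload means a crash in between causes a file to be uploaded again on the next run rather than skipped forever, so it helps if your indexing step is safe to repeat.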