What does “update the document without modifying the source” mean in Elasticsearch?

In the update by query documentation, I don’t understand this part:

If no query is specified, performs an update on every document in the index without modifying the source

Are the document in the index and the source two distinct entities? Does that mean that if the document is updated and the source is unchanged, the two are out of sync? If so, how does Elasticsearch handle the out-of-sync changes?

Any clarification or reference is much appreciated.

Answer

Update-by-query has two main uses:

A. You can modify the source documents in place (adding one field, modifying another field, etc.) without having to reindex them from your source-of-truth repository. That’s what the script part is for in the request below; the script changes the _source of each matching document:

POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
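As a further illustration of case A, here is a minimal sketch of a script that adds a field rather than incrementing one. The field name tags and the value passed in params are hypothetical and not part of the original answer; the lines starting with # are Kibana Console comments and should be dropped if you send the request another way.

```console
# Hypothetical example: "tags" and its value are assumptions for illustration
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.tags = params.tags",
    "lang": "painless",
    "params": {
      "tags": ["reviewed"]
    }
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
```

Passing the value through params instead of hard-coding it in the script text lets Elasticsearch reuse the compiled script across calls.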

B. You can reindex every document against the current mapping without touching its source. If you perform a mapping change (e.g. adding a keyword sub-field to a text field), documents that are already in the index do not pick up the change automatically. Instead of reindexing all the data from your source of truth, you can simply call _update_by_query without any script. Each document is then reindexed from its own stored _source, which stays exactly as it was, so the underlying indexed data is rebuilt under the new mapping (i.e. the new keyword sub-field is indexed). This is what the quoted sentence means: _source is the original JSON that Elasticsearch stores for each document, and the searchable index structures are derived from it. Nothing ends up out of sync, because the update only regenerates the derived data from the same source.

POST my-index-000001/_update_by_query
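The whole sequence for case B might look like the sketch below. It assumes the index already has a text field named message and adds a keyword sub-field named raw; both names and the search value are hypothetical. The conflicts=proceed and refresh parameters are optional additions, not part of the original answer.

```console
# 1. Add a keyword sub-field to an existing text field (field names are assumptions)
PUT my-index-000001/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}

# 2. Reindex every document from its own _source so the new sub-field gets populated
POST my-index-000001/_update_by_query?conflicts=proceed&refresh

# 3. Existing documents can now be matched through the new sub-field
GET my-index-000001/_search
{
  "query": {
    "term": {
      "message.raw": "some exact value"
    }
  }
}
```

Without step 2, the term query in step 3 would only match documents indexed after the mapping change, even though every document's _source already contains the message value.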
