How do I use the Elasticsearch Wikipedia river?

This might be a silly question, but I cannot find any information about this. I found the GitHub page, but after I ran the command from it, nothing happened.

After installing the plugin, which went fine, I used this command to create the river:

curl -XPUT localhost:9200/_river/my_river/_meta -d '
{
    "type" : "wikipedia",
    "wikipedia" : {
        "url" : "http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
    },
    "index" : {
        "name" : "my_index",
        "type" : "my_type",
        "bulk_size" : 100
    }
}
'

I got this back.

{"ok":true,"_index":"_river","_type":"my_river","_id":"_meta","_version":3}

That also looked fine, but nothing else happened. I expected the plugin to download the dump, index the documents, and make them searchable. It seems I have to run some other command, but I cannot find it anywhere. I’m new to Elasticsearch, by the way. Please help.

Edit: Log

[2013-03-26 20:17:14,864][INFO ][gateway                  ] [Captain Universe] recovered [0] indices into cluster_state
[2013-03-26 20:18:48,860][INFO ][cluster.metadata         ] [Captain Universe] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2013-03-26 20:18:49,266][INFO ][cluster.metadata         ] [Captain Universe] [_river] update_mapping [wikipedia] (dynamic)
[2013-03-26 20:18:49,293][INFO ][river.wikipedia          ] [Captain Universe] [wikipedia][wikipedia] creating wikipedia stream river for [http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2]
[2013-03-26 20:18:49,294][INFO ][river.wikipedia          ] [Captain Universe] [wikipedia][wikipedia] starting twitter stream
[2013-03-26 20:18:49,329][INFO ][cluster.metadata         ] [Captain Universe] [wikipedia] creating index, cause [api], shards [5]/[1], mappings []
[2013-03-26 20:18:49,632][INFO ][cluster.metadata         ] [Captain Universe] [_river] update_mapping [wikipedia] (dynamic)
[2013-03-26 20:18:54,871][INFO ][cluster.metadata         ] [Captain Universe] [wikipedia] update_mapping [page] (dynamic)

Answer

Judging from "_version":3, you have updated the _meta document several times. Unfortunately, some rivers don’t pick up changes to the _meta document unless you recreate the type.

Try deleting the river (the my_river type in the _river index) with

curl -XDELETE localhost:9200/_river/my_river

and then creating the _meta document again. This recreates the my_river type, which triggers creation of the river with the new parameters. This problem occurs when the river was initially created with the wrong parameters: the corrected parameters you send afterwards have no effect.
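The delete-and-recreate sequence could be sketched as a small script along these lines. It is only a sketch under some assumptions: Elasticsearch is listening on localhost:9200, the river and index names are the ones from the question, and the helper names river_meta and recreate_river as well as the ES_HOST variable are made up for illustration. The final document count check is an extra step that is not part of the answer above.

```shell
#!/bin/sh
# Assumption: Elasticsearch runs on localhost:9200; override ES_HOST otherwise.
ES_HOST="${ES_HOST:-localhost:9200}"

# Hypothetical helper: prints the river definition from the question.
river_meta() {
cat <<'EOF'
{
    "type" : "wikipedia",
    "wikipedia" : {
        "url" : "http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
    },
    "index" : {
        "name" : "my_index",
        "type" : "my_type",
        "bulk_size" : 100
    }
}
EOF
}

# Hypothetical helper: drops the river type, then registers it again.
recreate_river() {
    # 1. Delete the whole my_river type so the old definition is discarded.
    curl -XDELETE "$ES_HOST/_river/my_river"
    echo
    # 2. Create the _meta document again; "-d @-" reads the body from stdin.
    river_meta | curl -XPUT "$ES_HOST/_river/my_river/_meta" -d @-
    echo
    # 3. Optional check (assumes the target index is my_index):
    #    the count should grow while the river is indexing.
    curl -XGET "$ES_HOST/my_index/_count"
    echo
}
```

After sourcing the script, running recreate_river performs the three requests in order. Because the type is recreated from scratch, the response to the PUT should report a fresh _version rather than a higher one, and repeating the count request a few times shows whether documents are arriving.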
