Process data after reducing in Hadoop

I have a text file:

A 1
A 4
A 2
B 8
B 1

I want to calculate the average of max(A) = 4 and max(B) = 8. In the mapper, I emit each record to the reducer keyed by its letter, and in the reducer, I find the maximum value for each key. But how can I calculate the average of those maxima after the reduce step?


If the mapper only emits the maximum value for each key, the reducer will not be able to recover the average from its input. There is simply not enough information.

Either compute the average during the reducer's processing and emit it alongside the maximum (separated by some delimiter for ease of parsing), or chain a second MapReduce job that reads the per-key maxima and computes their average.
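To make the two-stage approach concrete, here is a minimal plain-Java sketch of the logic (not actual Hadoop `Mapper`/`Reducer` classes): the first stage mirrors your max-per-key reducer, and the second stage plays the role of a follow-up job with a single reducer that averages the per-key maxima. The class and method names (`MaxThenAverage`, `maxPerKey`, `averageOfMaxes`) are hypothetical, chosen for illustration:

```java
import java.util.*;

public class MaxThenAverage {

    // Stage 1 (your existing reducer): keep the maximum value seen for each key.
    static Map<String, Integer> maxPerKey(List<String[]> records) {
        Map<String, Integer> maxes = new HashMap<>();
        for (String[] rec : records) {
            String key = rec[0];
            int value = Integer.parseInt(rec[1]);
            maxes.merge(key, value, Math::max);  // retain the larger value
        }
        return maxes;
    }

    // Stage 2 (the follow-up job): a single reducer receives every per-key
    // maximum under one common key and averages them.
    static double averageOfMaxes(Map<String, Integer> maxes) {
        double sum = 0;
        for (int v : maxes.values()) {
            sum += v;
        }
        return sum / maxes.size();
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[]{"A", "1"},
                new String[]{"A", "4"},
                new String[]{"A", "2"},
                new String[]{"B", "8"},
                new String[]{"B", "1"});
        Map<String, Integer> maxes = maxPerKey(records);
        System.out.println(maxes);                 // per-key maxima: A=4, B=8
        System.out.println(averageOfMaxes(maxes)); // average of the maxima
    }
}
```

In real Hadoop, stage 2 would be a job whose mapper emits every stage-1 output record under a single constant key, so that one reducer call sees all the maxima and can divide their sum by their count.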
