I’ve run into a problem I can’t solve. I need to count the words in a text file and print the N most frequent ones in decreasing order of occurrence, each followed by its count. Words with the same count must be sorted alphabetically.

For example, if I have 6 occurrences of the word “a”, 5 of “b” and 5 of “c”, and N is 2, I’ll print:

a 6
b 5

If I have 10 occurrences of “la”, 5 of “hi”, 5 of “zzz” and 5 of “arr”, and N is 3, I’ll print:

la 10
arr 5
hi 5

(“zzz” is omitted intentionally: among the words tied at 5 it comes last alphabetically, so it falls outside the top 3.)

The problem is that my script below prints only one word for each distinct number of occurrences:

tr [:space:] '\n' <$1| uniq -c|sort -rnuk1,1|awk '{print $2,$1}'|head -n
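A minimal sketch that reproduces both symptoms of this pipeline on small, made-up input: uniq -c only merges identical lines that are adjacent, and sort -u with -k1,1 compares only the count field, so lines that share a count are treated as duplicates.

```shell
#!/bin/sh
# 1. uniq -c only merges adjacent identical lines. Unsorted, the two
#    "a" lines are not adjacent, so "a" is reported twice with count 1.
printf 'a\nb\na\n' | uniq -c

# Sorting first makes identical lines adjacent, so "a" is counted once (2).
printf 'a\nb\na\n' | sort | uniq -c

# 2. With -u and the key -k1,1, sort compares only the count. "b" and "c"
#    both have count 5, so sort keeps just one of them: 2 lines, not 3.
printf '      6 a\n      5 b\n      5 c\n' | sort -rnuk1,1
```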

As an extra feature, I’d like the script to count word occurrences in only the first M lines of the file.


Your use of tr is clever, but you need to sort before calling uniq: uniq -c only merges identical lines that are adjacent, so on unsorted input the same word gets counted in several separate pieces. So we have:

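tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $2, $1}' | head -n 10

Keep the tr step, since without it sort and uniq count whole lines rather than words, and replace 10 with your N. The final sort does need -k and -n: -k1,1nr orders by count, numerically and in decreasing order, and -k2,2 breaks ties alphabetically. A plain sort -r would list tied words in reverse alphabetical order (zzz before hi before arr). Finally, the -u in your script is why words went missing: combined with -k1,1, it makes sort treat all lines with the same count as duplicates and keep only one of them.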
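Putting this together with the first-M-lines feature from the question, a sketch might look like the following. The function name top_words and its argument order (FILE N [M]) are hypothetical, chosen for illustration. It assumes byte-order sorting via LC_ALL=C, so “alphabetical” means uppercase words sort before lowercase ones and non-ASCII text is compared byte by byte.

```shell
#!/bin/sh
# Hypothetical helper: top_words FILE N [M]
# Prints the N most frequent words in FILE as "WORD COUNT", highest
# count first, ties broken alphabetically (byte order, via LC_ALL=C).
# If M is given, only the first M lines of FILE are counted.
top_words() {
    file=$1
    n=$2
    m=$3

    if [ -n "$m" ]; then
        head -n "$m" "$file"          # restrict input to the first M lines
    else
        cat "$file"
    fi |
        tr -s '[:space:]' '\n' |      # one word per line, whitespace runs squeezed
        grep -v '^$' |                # drop the empty line left by leading whitespace
        LC_ALL=C sort |               # make identical words adjacent
        LC_ALL=C uniq -c |            # "  COUNT WORD" for each distinct word
        LC_ALL=C sort -k1,1nr -k2,2 | # count descending, then word ascending
        awk '{print $2, $1}' |        # reorder to "WORD COUNT"
        head -n "$n"                  # keep only the top N
}
```

For example, top_words file.txt 3 prints the three most frequent words in file.txt, and top_words file.txt 3 100 does the same for only its first 100 lines. Since plain sh has no local variables, file, n and m stay set after the call.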

