Sort words in a file by frequency

I’ve got a problem I’m not able to overcome. I need to count the occurrences of each word in a text file and print the first n words in decreasing order of occurrences, each followed by its count. Words with the same number of occurrences must be sorted alphabetically.

For example, if I have 6 occurrences of the word “a”, 5 of the word “b”, 5 of the word “c”, and n is given as 2, I’ll print:

a 6
b 5

If I have 10 occurrences of “la”, 5 of “hi”, 5 of “zzz”, and 5 of “arr”, and n is given as 3, I’ll print:

la 10
arr 5
hi 5

(“zzz” is omitted intentionally: it has the same count as “arr” and “hi” but comes last alphabetically.)

The problem is that my script (below) prints only one word for each number of occurrences.

tr '[:space:]' '\n' < "$1" | uniq -c | sort -rnuk1,1 | awk '{print $2, $1}' | head -n

As an extra feature, I’d like the script to count word occurrences in only the first m lines of the file.

Answer

Your use of tr is clever, but you need to sort before calling uniq, because uniq -c only merges identical lines that are adjacent. So we have:
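A quick way to see the adjacency rule, using printf to fake a tiny one-word-per-line input (the words are made up for illustration):

```shell
# uniq only merges identical lines that are next to each other, so the two
# "a" lines are reported as two separate runs of 1 (three output lines).
printf 'a\nb\na\n' | uniq -c

# Sorting first makes identical words adjacent, so "a" is reported once
# with a count of 2 (two output lines).
printf 'a\nb\na\n' | sort | uniq -c
```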

tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $2, $1}' | head -n 10
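A small sketch comparing three ways of doing the second sort on hand-written input. The counts function is a hypothetical stand-in for uniq -c output (GNU uniq right-aligns the count in a 7-character field, which printf '%7d' imitates), and LC_ALL=C is set only so the comparison is byte-wise and does not depend on the locale:

```shell
# Hypothetical stand-in for `uniq -c` output; three words share the count 5.
counts() {
  printf '%7d %s\n' 10 la 5 zzz 5 hi 5 arr
}

# 1. The question's sort: with -u and the key limited to field 1 (the count),
#    lines with equal counts are treated as duplicates, so only one of the
#    three count-5 words survives.
counts | LC_ALL=C sort -rnuk1,1

# 2. Plain reverse sort of whole lines: the counts happen to come out right
#    because they are padded to the same width, but tied words are reversed
#    (zzz, hi, arr).
counts | LC_ALL=C sort -r

# 3. Count descending (numeric), then word ascending: la, arr, hi, zzz.
counts | LC_ALL=C sort -k1,1nr -k2,2
```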

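Note that the second sort does need keys: -k1,1nr sorts numerically by the count in descending order, and -k2,2 then breaks ties alphabetically by word. A plain sort -r compares whole lines, so tied words come out in reverse alphabetical order. Your original sort -rnuk1,1 had a different problem: -u keeps only the first line of each group whose keys compare equal, and since the key was just the count, every word sharing a count with an earlier one was dropped. That is why you saw only one word per count.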
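A minimal sketch putting the pieces together, including the extra feature from the question (counting only the first m lines). The function name top_words and its argument order (FILE N [M]) are assumptions for illustration, not part of the original answer. Words are split on whitespace only, so punctuation stays attached to words.

```shell
# top_words FILE N [M]   (hypothetical helper)
# Prints the N most frequent words of FILE as "word count", most frequent
# first, with ties broken alphabetically. If M is given, only the first
# M lines of FILE are read.
top_words() {
  file=$1
  n=$2
  m=${3:-}

  if [ -n "$m" ]; then
    head -n "$m" "$file"
  else
    cat "$file"
  fi |
    tr -s '[:space:]' '\n' |   # one word per line, squeezing runs of blanks
    grep -v '^$' |             # drop the empty line left by leading blanks
    sort |                     # make identical words adjacent for uniq
    uniq -c |                  # "count word"
    sort -k1,1nr -k2,2 |       # count descending, then word ascending
    awk '{ print $2, $1 }' |   # "word count"
    head -n "$n"
}
```

For example, top_words words.txt 3 prints the three most frequent words, and top_words words.txt 3 100 does the same for the first 100 lines only.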
