Summing by common strings in different files

I have a file file1 with the number of times each user shows up, something like this:

4 userC
2 userA
1 userB

and I have another file file2 with users and other info like:

userC, degree2
userA, degree1
userB, degree2

and I want output that shows the total number of times users show up, for each degree:

5 degree2
2 degree1

Answer

Pure awk:

$ awk -F'[, ]' 'NR==FNR{n[$2]=$1;next}{m[$3]+=n[$1]}
    END{for(i in m){print i " " m[i]}}' file1 file2
degree1 2
degree2 5

Or you can put it into a script like this:

#!/usr/bin/awk -f
BEGIN {
    # Split fields on either a comma or a space
    FS="[, ]"
}
{
    if (NR == FNR) {
        # First file: remember the count for each user
        n[$2] = $1;
        next;
    } else {
        # Following file(s): add the user's count to its degree
        # ($2 is empty here because ", " produces two separators)
        m[$3] += n[$1];
    }
}
END {
    # Print each degree with its accumulated count
    for (i in m) {
        print i " " m[i];
    }
}
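
If you save this as, say, sum_by_degree.awk (the file name is just an example), you can make it executable and run it with the two files as arguments:

$ chmod +x sum_by_degree.awk
$ ./sum_by_degree.awk file1 file2
degree1 2
degree2 5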

First, set the field separator to match both comma and space (that is the BEGIN block in the script, or the -F command line option in the one-liner).
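
This is also why the degree lands in $3 rather than $2: in a line like "userA, degree1" the comma and the space each count as a separator, leaving an empty second field. A quick check (the echo line is only an illustration):

$ echo "userA, degree1" | awk -F'[, ]' '{print NF; print $1; print $3}'
3
userA
degree1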

Then, while parsing the first file (the NR == FNR idiom), put the count for each user into an array indexed by user name. While parsing the following file(s), add each user's count into a second array indexed by that user's degree.
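
The idiom works because NR counts records across all input files while FNR restarts at 1 for each file, so NR == FNR only holds while the first file is being read. A quick illustration with the two sample files:

$ awk '{print FILENAME, NR, FNR}' file1 file2
file1 1 1
file1 2 2
file1 3 3
file2 4 1
file2 5 2
file2 6 3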

Finally (the END block), scan the degree array and print the key/value pairs.
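
Note that the order of a for (i in m) loop is unspecified in awk. If you want the output exactly as in the question (count first, largest count first), one option, sketched here as a variation on the one-liner above, is to swap the print order and pipe through sort:

$ awk -F'[, ]' 'NR==FNR{n[$2]=$1;next}{m[$3]+=n[$1]}
    END{for(i in m){print m[i] " " i}}' file1 file2 | sort -rn
5 degree2
2 degree1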
