How to match multiple columns between two files?

I have two files that I am trying to match multiple columns worth of information and output to a new file. I have found several posts that have gotten me to about a 80% solution, and I am looking for help with the final piece of the puzzle. I am currently using a system with Solaris Unix system, and mainly do most of my information manipulation using a ksh shell. The solution that I have gotten is using an awk command. The tables are currently in 4 columns separated with spaces. File 1 is an example of the master file list that I keep, and what I am comparing file 2 with.

File 1 looks something like this (The master file list):

FOO1 BAR1 100 SX2000  
FOO1 BAR1 101 SX2001  
FOO1 BAR1 102 SX2002  
FOO1 BAR1 103 SX2003  
FOO1 BAR1 104 SX2004  
FOO1 BAR1 105 SX2005  
FOO1 BAR1 106 SX2006  
FOO1 BAR1 107 SX2007  
FOO1 BAR1 108 SX2008  
FOO1 BAR1 109 SX2009  
FOO1 BAR1 110 SX2010  
FOO1 BAR1 111 SX2011  
FOO1 BAR1 112 SX2012  
FOO1 BAR1 113 SX2013  
FOO1 BAR1 114 SX2014  
FOO1 BAR1 115 SX2015  
FOO1 BAR1 116 SX2016  
FOO1 BAR1 117 SX2017  
FOO1 BAR1 118 SX2018  
FOO1 BAR1 119 SX2019  
FOO1 BAR1 120 SX2020  

File 2 looks like this (the table I am filling in):

FOO1 BAR1 100 SX  
FOO1 BAR1 101 SX  
FOO1 BAR1 102 SX  
FOO1 BAR1 103 SX  
FOO1 BAR1 104 SX  
FOO1 BAR1 105 NV  
FOO1 BAR1 106 SX  
FOO1 BAR1 107 SX  
FOO1 BAR1 108 SX  
FOO1 BAR1 109 SX  
FOO1 BAR1 113 SX  
FOO1 BAR1 114 SX  
FOO1 BAR1 115 NV  
FOO1 BAR1 116 SX  
FOO1 BAR1 117 SX  
FOO1 BAR1 118 SX  
FOO1 BAR1 119 SX  

What I would like is a result like this:

FOO1 BAR1 100 SX2000  
FOO1 BAR1 101 SX2001  
FOO1 BAR1 102 SX2002  
FOO1 BAR1 103 SX2003  
FOO1 BAR1 104 SX2004  
FOO1 BAR1 105 NV  
FOO1 BAR1 106 SX2006  
FOO1 BAR1 107 SX2007  
FOO1 BAR1 108 SX2008  
FOO1 BAR1 109 SX2009  
FOO1 BAR1 113 SX2013  
FOO1 BAR1 114 SX2014  
FOO1 BAR1 115 NV  
FOO1 BAR1 116 SX2016  
FOO1 BAR1 117 SX2017  
FOO1 BAR1 118 SX2018  
FOO1 BAR1 119 SX2019  

As you can see, I would like to compare file 2 with file 1, and print out the value in column 4 if it matches all the way across. The code that I have come up with so far allows me to compare the first 3 columns of both files, and then copy across column 4.

Here is an example of the script that I have so far:

#! /bin/ksh

file1=master_table
file2=test_table
file3=combined_table

awk -F' ' '(FNR==NR) {a[$1,$2,$3]=$4; next} (($1,$2,$3) in a) {print $1,$2,$3,a[$1,$2,$3]}' $file1 $file2 > $file3

I am looking for a way to still make the comparisons with the first 3 columns, and also check if the first 2 characters of column 4 match before it copies the information over. Otherwise, I would like to have the original column 4 value copied over from the second file. While I am currently using awk, that does not have to be the final solution. I would just like to keep it within a shell script because it makes it easier for me to do other manipulations with the files.

Thanks in advance for the help!!

Answer

$ cat tst.awk
{ key = $1 FS $2 FS $3 FS substr($4,1,2) }
NR==FNR { a[key]=$4; next }
key in a { $4=a[key] }
{ print }

.

$ awk -f tst.awk file1 file2
FOO1 BAR1 100 SX2000
FOO1 BAR1 101 SX2001
FOO1 BAR1 102 SX2002
FOO1 BAR1 103 SX2003
FOO1 BAR1 104 SX2004
FOO1 BAR1 105 NV
FOO1 BAR1 106 SX2006
FOO1 BAR1 107 SX2007
FOO1 BAR1 108 SX2008
FOO1 BAR1 109 SX2009
FOO1 BAR1 113 SX2013
FOO1 BAR1 114 SX2014
FOO1 BAR1 115 NV
FOO1 BAR1 116 SX2016
FOO1 BAR1 117 SX2017
FOO1 BAR1 118 SX2018
FOO1 BAR1 119 SX2019

Leave a Reply

Your email address will not be published. Required fields are marked *