The question is published on by Tutorial Guruji team.
I have two files that I am trying to match multiple columns worth of information and output to a new file. I have found several posts that have gotten me to about a 80% solution, and I am looking for help with the final piece of the puzzle. I am currently using a system with Solaris Unix system, and mainly do most of my information manipulation using a ksh shell. The solution that I have gotten is using an awk command. The tables are currently in 4 columns separated with spaces. File 1 is an example of the master file list that I keep, and what I am comparing file 2 with.
File 1 looks something like this (The master file list):
FOO1 BAR1 100 SX2000 FOO1 BAR1 101 SX2001 FOO1 BAR1 102 SX2002 FOO1 BAR1 103 SX2003 FOO1 BAR1 104 SX2004 FOO1 BAR1 105 SX2005 FOO1 BAR1 106 SX2006 FOO1 BAR1 107 SX2007 FOO1 BAR1 108 SX2008 FOO1 BAR1 109 SX2009 FOO1 BAR1 110 SX2010 FOO1 BAR1 111 SX2011 FOO1 BAR1 112 SX2012 FOO1 BAR1 113 SX2013 FOO1 BAR1 114 SX2014 FOO1 BAR1 115 SX2015 FOO1 BAR1 116 SX2016 FOO1 BAR1 117 SX2017 FOO1 BAR1 118 SX2018 FOO1 BAR1 119 SX2019 FOO1 BAR1 120 SX2020
File 2 looks like this (the table I am filling in):
FOO1 BAR1 100 SX FOO1 BAR1 101 SX FOO1 BAR1 102 SX FOO1 BAR1 103 SX FOO1 BAR1 104 SX FOO1 BAR1 105 NV FOO1 BAR1 106 SX FOO1 BAR1 107 SX FOO1 BAR1 108 SX FOO1 BAR1 109 SX FOO1 BAR1 113 SX FOO1 BAR1 114 SX FOO1 BAR1 115 NV FOO1 BAR1 116 SX FOO1 BAR1 117 SX FOO1 BAR1 118 SX FOO1 BAR1 119 SX
What I would like is a result like this:
FOO1 BAR1 100 SX2000 FOO1 BAR1 101 SX2001 FOO1 BAR1 102 SX2002 FOO1 BAR1 103 SX2003 FOO1 BAR1 104 SX2004 FOO1 BAR1 105 NV FOO1 BAR1 106 SX2006 FOO1 BAR1 107 SX2007 FOO1 BAR1 108 SX2008 FOO1 BAR1 109 SX2009 FOO1 BAR1 113 SX2013 FOO1 BAR1 114 SX2014 FOO1 BAR1 115 NV FOO1 BAR1 116 SX2016 FOO1 BAR1 117 SX2017 FOO1 BAR1 118 SX2018 FOO1 BAR1 119 SX2019
As you can see, I would like to compare file 2 with file 1, and print out the value in column 4 if it matches all the way across. The code that I have come up with so far allows me to compare the first 3 columns of both files, and then copy across column 4.
Here is an example of the script that I have so far:
#! /bin/ksh file1=master_table file2=test_table file3=combined_table awk -F' ' '(FNR==NR) {a[$1,$2,$3]=$4; next} (($1,$2,$3) in a) {print $1,$2,$3,a[$1,$2,$3]}' $file1 $file2 > $file3
I am looking for a way to still make the comparisons with the first 3 columns, and also check if the first 2 characters of column 4 match before it copies the information over. Otherwise, I would like to have the original column 4 value copied over from the second file. While I am currently using awk, that does not have to be the final solution. I would just like to keep it within a shell script because it makes it easier for me to do other manipulations with the files.
Thanks in advance for the help!!
Answer
$ cat tst.awk { key = $1 FS $2 FS $3 FS substr($4,1,2) } NR==FNR { a[key]=$4; next } key in a { $4=a[key] } { print }
.
$ awk -f tst.awk file1 file2 FOO1 BAR1 100 SX2000 FOO1 BAR1 101 SX2001 FOO1 BAR1 102 SX2002 FOO1 BAR1 103 SX2003 FOO1 BAR1 104 SX2004 FOO1 BAR1 105 NV FOO1 BAR1 106 SX2006 FOO1 BAR1 107 SX2007 FOO1 BAR1 108 SX2008 FOO1 BAR1 109 SX2009 FOO1 BAR1 113 SX2013 FOO1 BAR1 114 SX2014 FOO1 BAR1 115 NV FOO1 BAR1 116 SX2016 FOO1 BAR1 117 SX2017 FOO1 BAR1 118 SX2018 FOO1 BAR1 119 SX2019