split file into multiple pieces

Problem: given a file samplein, it can be split up into multiple pieces as follows:

$ cat samplein
START
Unix
Linux
START
Solaris
Aix
SCO

$ awk '/START/{x="F"++i;}{print > x}' samplein
$ ls F*
F1  F2

$ cat F1
START
Unix
Linux

$ cat F2
START
Solaris
Aix
SCO

The above was recipe 5 from this page.
However, in my case the pattern (START here) does not occur on the first line.

If we prepend a line to samplein, the same recipe no longer works:

$ echo -e "firstline\n$(cat samplein)" > samplein
$ cat samplein
$ awk '/START/{x="F"++i;}{print > x}' samplein
awk: cmd. line:1: (FILENAME=samplein FNR=1) fatal: expression for `>' redirection has null string value

Please also explain in the answer how this awk command works in the first place. Until now I had only used awk in the form {BEGIN}{loop over all lines}{END}, and this recipe looks slightly different from that.

Answer

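Set x="F0" in a BEGIN block so the target file is always defined, even if the first line doesn't contain the pattern. Without it, x is still empty when the first line is printed, which is what triggers the "null string value" error: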

awk 'BEGIN { x="F0" ; } /START/{x="F"++i;}{print > x}' samplein
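
To check the fix end to end, here is a small sketch that rebuilds the modified samplein from the question and runs the command on it. The scratch directory from mktemp is an assumption, added only to keep the F* files out of the way.

```shell
# Work in a scratch directory so the F* files don't clutter anything.
cd "$(mktemp -d)"

# Recreate the input from the question, with one line before the first START.
printf '%s\n' firstline START Unix Linux START Solaris Aix SCO > samplein

# Lines before the first START go to F0; every START switches to the next file.
awk 'BEGIN { x="F0" } /START/ { x="F"++i } { print > x }' samplein

# List the pieces that were written.
ls F*
```

The line before the first START ends up alone in F0, while F1 and F2 hold the same content as in the original example.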

The above breaks down to this pseudocode:

### -> BEGIN { x="F0" ; }
i=0 # implicit
x="F0" # explicit
loop through file

### -> /START/{x="F"++i;}
if ( line contains "START" ) output file is F(next i value) ;

### -> {print > x}
print line to output file

endloop
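
The same loop can also be written so that lines before the first START are dropped instead of collected in F0. This is a variant and not part of the original recipe: it uses the bare variable x as a pattern, which is false while x is still unset, and it calls close() on each finished piece, which may help when an input produces a very large number of output files.

```shell
# Scratch directory and sample input (same data as in the question).
cd "$(mktemp -d)"
printf '%s\n' firstline START Unix Linux START Solaris Aix SCO > samplein

# On each START: close the previous piece (if any), then pick the next name.
# The rule "x { ... }" only fires once x has been set, so "firstline" is skipped.
awk '/START/ { if (x) close(x); x="F"++i } x { print > x }' samplein

# Only F1 and F2 are created here; there is no F0.
ls F*
```

Whether to keep or discard the leading lines depends on the data, so treat this as one option next to the F0 approach.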

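Keep in mind that an awk program is a list of pattern { action } rules, and both parts are optional: BEGIN and END are special patterns that run before the first and after the last line, a rule with no pattern (such as {print > x}) runs for every line, and a pattern with no action simply prints the matching lines.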
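
A short sketch of those optional parts, using a made-up file demo.txt (the file name and its contents are only for illustration):

```shell
# Scratch directory with a small input file.
cd "$(mktemp -d)"
printf '%s\n' START Unix Linux START Solaris > demo.txt

# Pattern without an action: the default action prints each matching line.
awk '/START/' demo.txt

# Action without a pattern: runs for every line; END runs once after the last line.
awk '{ n++ } END { print n " lines" }' demo.txt
```

The first command prints the two START lines, and the second prints "5 lines".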
