I have a collection of gzipped files that I want to combine into a single file. They each have identical format. I want to keep the header information from only the first file and skip it in the subsequent files.
As a simple example, I have four identical files with the following content:
    $ gzcat file1.gz
    # header
    1
    2
I want to end up with
    # header
    1
    2
    1
    2
    1
    2
    1
    2
In reality, I can have a varying number of files, so I would like to be able to do this programmatically. Here is the non-programmatic solution I have so far:
cat <(gzcat file1.gz) <(tail -q -n +2 <(gzcat file2.gz) <(gzcat file3.gz) <(gzcat file4.gz))
This command works, but it is “hard coded” to handle four files, and I need to generalize it for any number of files.
I am using bash as the shell, if that helps. My preference is for performance (in reality the files can be millions of lines long), so I am OK with a less-than-elegant solution if it is speedy.
If the command that you show in your question basically works (for a hard-coded number of files), then
    first=1
    for f in file*.gz
    do
        if [ "$first" ]
        then
            gzcat "$f"
            first=
        else
            gzcat "$f" | tail -n +2
        fi
    done > collection_single_file
should work for you.
I hope the logic is fairly clear.
Look at all the files (change the wildcard as appropriate for your file names).
If it’s the first one in the list, gzcat it, so you get the entire file (including the header).
Otherwise, gzcat it and pipe the output through tail to strip the header.
Once you’ve handled the first file, no other file will be the first, so the first variable is cleared.
The one difference is that this runs tail N−1 times, instead of just once (as your command does).
Aside from that, my answer should perform the same as your command.
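If you would rather pass the file list as arguments than edit a wildcard inside the loop, the same logic can be wrapped in a function. This is only a sketch: the name merge_gz and the temporary sample files are made up for illustration, and it uses gzip -dc in place of gzcat on the assumption that gzcat may not be installed everywhere (the two should behave the same for .gz files).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative, not from the answer above):
# concatenate the gzipped files given as arguments, keeping the header
# line of the first file only.
merge_gz() {
    local first=1 f
    for f in "$@"; do
        if [ "$first" ]; then
            gzip -dc "$f"                # first file: whole content, header included
            first=
        else
            gzip -dc "$f" | tail -n +2   # later files: skip line 1 (the header)
        fi
    done
}

# Demo on throwaway sample data shaped like the example in the question.
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf '# header\n1\n2\n' | gzip > "$tmpdir/file$i.gz"
done

merge_gz "$tmpdir"/file*.gz > "$tmpdir/combined.txt"
cat "$tmpdir/combined.txt"
```

With real data the call would look like merge_gz file*.gz > collection_single_file. Because the shell expands the wildcard before the function runs, this handles any number of files, in the same sorted order the loop above would see them.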