I have multiple tab-separated files with the .cluster extension. I want to classify these files on the basis of their first-column content using the following criteria (the digits shown are the actual content inside the files):
- class_1: Only 3 present on successive lines
- class_2: Only 2
- class_3: Only 3
I want to write their file names in .txt files with the respective class name.
How do I do it with shell scripting?
    for filename in *.cluster
    do
        class=$(cut -d$'\t' -f1 "$filename")              # Part 1
        if [ "$(wc -l < "$filename")" -eq 2 ]             # Part 2, start
        then
            class=1
        fi                                                # Part 2, end
        printf '%s\n' "$filename" >> class_"$class".txt   # Part 3
    done
This has three parts:
- By default, it classifies the file based on the first field of the only line: the class variable is set to whatever is in the file, up to the first tab character on each line. This will be either 2 or 3 for class 2 & 3, since those files have only one line.
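- If the file has two lines ($(wc -l < "$filename") -eq 2), it must be class 1, so the class variable is forcibly set to 1, replacing its value from step 1. The if ... fi block deals with this. Reading the file through a redirection makes wc print only the line count; given a filename argument it prints the name as well, and the numeric test fails.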
- Finally, the filename is appended to the appropriate class file: printf '%s\n' "$filename" >> class_"$class".txt
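The three parts can be tried out on throwaway files. What follows is a minimal sketch, not a definitive version: the sample file names and their contents are made up, and the contents of the two-line file are chosen arbitrarily. Tab is the default delimiter for cut, so -d is left out here, which also lets the sketch run in a plain POSIX sh.

```shell
#!/bin/sh
# Scratch directory with made-up sample files (names and contents are illustrative).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '2\tfoo\n'     > a.cluster   # one line, first field 2
printf '3\tbar\n'     > b.cluster   # one line, first field 3
printf '2\tx\n3\ty\n' > c.cluster   # two lines (contents chosen arbitrarily)

for filename in *.cluster
do
    # Part 1: first field of every line (tab is cut's default delimiter)
    class=$(cut -f1 "$filename")
    # Part 2: a two-line file is class 1; reading from stdin keeps
    # the filename out of wc's output, so the result is a bare number
    if [ "$(wc -l < "$filename")" -eq 2 ]
    then
        class=1
    fi
    # Part 3: append the name to the matching list
    printf '%s\n' "$filename" >> class_"$class".txt
done

# Show each list, preceded by a header naming the file
head class_*.txt
```

With these samples, c.cluster should land in class_1.txt, a.cluster in class_2.txt and b.cluster in class_3.txt.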
At the end you will have three files, class_N.txt for each N in 1, 2, 3, with one filename per line. If any file has contents other than what you outlined in the question, such as a different first field or a different number of lines, extra class files will be created.
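If stray class files are unwanted, one option is to check the value before it is used in a file name. This is only a sketch of that idea, not part of the script above: the unclassified.txt name and the sample files are hypothetical. A file with three or more lines ends up with a multi-line value in class, which also falls through to the catch-all branch.

```shell
#!/bin/sh
# Scratch directory with made-up sample files (names and contents are illustrative).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '2\tfoo\n' > ok.cluster    # expected shape: one line, first field 2
printf '7\tbar\n' > odd.cluster   # unexpected first field

for filename in *.cluster
do
    class=$(cut -f1 "$filename")
    if [ "$(wc -l < "$filename")" -eq 2 ]
    then
        class=1
    fi
    case $class in
        1|2|3) printf '%s\n' "$filename" >> class_"$class".txt ;;
        *)     printf '%s\n' "$filename" >> unclassified.txt ;;   # hypothetical catch-all list
    esac
done
```

Here odd.cluster is recorded in unclassified.txt and no class_7.txt is created.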
In the perverse case where a filename itself contains a newline character, this will fall apart (and give you an opportunity to reconsider your filename choices), but otherwise it should be fine.
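If such names really have to be supported, one possible workaround is to end each entry with a NUL byte instead of a newline, since NUL cannot occur in a filename. The sketch below assumes that trade-off is acceptable: the lists are then no longer plain one-name-per-line text files and need NUL-aware tools to read them back. The filename used is made up for the demonstration.

```shell
#!/bin/sh
# Scratch directory; the awkward filename below is purely illustrative.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# A literal newline, used to build a filename that contains one
nl='
'
printf '3\tbar\n' > "odd${nl}name.cluster"

for filename in *.cluster
do
    class=$(cut -f1 "$filename")
    if [ "$(wc -l < "$filename")" -eq 2 ]
    then
        class=1
    fi
    # NUL-terminate each entry instead of ending it with '\n'
    printf '%s\0' "$filename" >> class_"$class".txt
done

# Count entries by counting NUL bytes rather than lines
entries=$(tr -cd '\000' < class_3.txt | wc -c)
echo "entries in class_3.txt: $entries"
```

Counting NUL bytes reports one entry for the single file, whereas a newline-terminated list would show its name split across two lines.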