Distinguishing files on the basis of first column’s content


I have multiple tab-separated files with the .cluster extension. I want to classify these files by the content of their first column, using the following criteria (2 and 3 are the actual digits inside the files):

  • class_1: Only 2 AND 3 present on successive lines
  • class_2: Only 2 present
  • class_3: Only 3 present

I want to write their file names in .txt files with the respective class name. How do I do it with shell scripting?


for filename in *.cluster
do
    class=$(cut -d$'\t' -f1 "$filename")              # Part 1
    if [ "$(wc -l < "$filename")" -eq 2 ]             # Part 2, start
    then
        class=1
    fi                                                # Part 2, end
    printf '%s\n' "$filename" >> class_"$class".txt   # Part 3
done

This has three parts:

  1. By default, it classifies the file based on the first field of the only line: the class variable is set to whatever is in the file, up to the first tab character on each line. This will be either 2 or 3 for class 2 & 3, since those files have only one line.

    cut chops files up by delimiters, $'\t' is a Bash way of writing a tab character, and -f1 asks cut to output only the first delimited field.

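  2. If the file has two lines ($(wc -l < "$filename") -eq 2), it must be class 1, so the class variable is forcibly set to 1, replacing its value from step 1. The if ... fi block deals with this. The redirection matters: wc -l < "$filename" prints only the line count, whereas wc -l "$filename" also prints the file name, which would break the numeric comparison.
  3. Finally, the filename is appended to the appropriate class file: printf '%s\n' "$filename" >> class_"$class".txt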
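
To see the three parts working together, here is a small self-contained run. It is only a sketch: the scratch directory and the sample file names (both.cluster, twos.cluster, threes.cluster) are made up for the demonstration, and the files follow the one-line or two-line layout the answer assumes.

```shell
#!/usr/bin/env bash
# Demo: build three sample files in a scratch directory, then classify them.
# The file names below are invented for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '2\tfoo\n3\tbar\n' > both.cluster     # two lines: 2, then 3
printf '2\tfoo\n'         > twos.cluster     # one line starting with 2
printf '3\tbar\n'         > threes.cluster   # one line starting with 3

for filename in *.cluster
do
    class=$(cut -d$'\t' -f1 "$filename")          # Part 1: first field
    if [ "$(wc -l < "$filename")" -eq 2 ]         # Part 2: two lines means class 1
    then
        class=1
    fi
    printf '%s\n' "$filename" >> class_"$class".txt   # Part 3: record the name
done

# Show each class file together with its contents.
head class_*.txt
```

Because the output files are opened with >>, running the loop a second time in the same directory appends every name again; removing the old class_*.txt files first avoids duplicates.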

At the end you will have three files, class_N.txt for each N in 1, 2, 3, with one filename per line. If any file has contents other than what you outlined in the question, such as a different first field or a different number of lines, you will get extra class files created.
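
If the files may not be exactly one or two lines long, one possible alternative is to classify on what the first column actually contains. The sketch below is not the method described above: classify_file is a hypothetical helper, and it assumes that "2 and 3 present" means both values occur somewhere in the first column, without checking that they sit on successive lines. Anything else, including an empty file, is labeled other.

```shell
#!/usr/bin/env bash
# classify_file is a hypothetical helper, not part of the original answer.
# It prints 1, 2, 3, or "other" based on the first tab-separated field
# of every line in the file given as its argument.
classify_file() {
    awk -F'\t' '
        $1 == "2" { twos++; next }
        $1 == "3" { threes++; next }
                  { other++ }
        END {
            if (other || NR == 0)    print "other"   # unexpected value or empty file
            else if (twos && threes) print 1         # both 2 and 3 seen
            else if (twos)           print 2         # only 2 seen
            else                     print 3         # only 3 seen
        }
    ' "$1"
}

for filename in *.cluster
do
    [ -e "$filename" ] || continue    # skip the literal pattern when nothing matches
    printf '%s\n' "$filename" >> class_"$(classify_file "$filename")".txt
done
```

With this variant, unexpected files all end up in class_other.txt instead of producing one extra class file per odd value.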

In the perverse case where a filename itself contains a newline character, this will fall apart (and give you an opportunity to reconsider your filename choices), but otherwise it should be fine.
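
If such filenames cannot be ruled out, one workaround is to terminate each recorded name with a NUL byte instead of a newline, since NUL cannot appear in a filename. This is a sketch under assumptions of my own: the .list suffix and the sample file name are invented, and the resulting lists are no longer plain one-name-per-line text, so they have to be read back with a NUL-aware loop like the one shown.

```shell
#!/usr/bin/env bash
# Sketch: store file names NUL-terminated so embedded newlines survive.
# The .list suffix and the sample file name are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

odd_name=$'odd\nname.cluster'          # a name containing a newline
printf '2\tfoo\n' > "$odd_name"

for filename in *.cluster
do
    class=$(cut -d$'\t' -f1 "$filename")
    if [ "$(wc -l < "$filename")" -eq 2 ]
    then
        class=1
    fi
    printf '%s\0' "$filename" >> class_"$class".list   # NUL instead of newline
done

# Read the names back one NUL-terminated record at a time.
names=()
while IFS= read -r -d '' name
do
    names+=("$name")
done < class_2.list
printf 'read back %d name(s)\n' "${#names[@]}"
```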
