Extract every nth character from a string

I am trying to figure out a solution for this question. My approach to this problem so far is as below.

  • Append all the characters together to make it a long string.
  • After the above step, remove all the white spaces or tab spaces so that we will just have one big string.

I was able to accomplish the above steps with the command below.

column -s '\t' inputfile | tr -d '[:space:]'

So for an input file like this,

1   0   0   0   0   0

0   1   1   1   0   0

After applying the above command I have the values as,

100000011100

Now in this big string I am trying to apply an approach as below.

Extract every 6th character (as the OP of that question wants) and append it to an array element, continuing until the end of the string.

So basically, with the above step, I am trying to create the array elements as,

10 (1st and 7th characters), 01 (2nd and 8th characters), 01 (3rd and 9th characters), 01 (4th and 10th characters), 00 (5th and 11th characters), 00 (6th and 12th characters).

So my question is, how could I extract every nth character so that I could add them to an array to proceed further? (n=6, in this case).

Answer

Two Lines

Here is a pure-bash solution that produces a bash array:

s="100000011100"
array=($(
    for ((i=0; i<${#s}-6; i++))
    do
        echo "${s:$i:1}${s:$((i+6)):1}"
    done
    ))
echo "${array[@]}"

This produces the same output as shown in the question:

10 01 01 01 00 00

The key element here is the use of bash’s substring expansion. Bash allows the extraction of substrings from a variable, say parameter, via ${parameter:offset:length}. Offsets are zero-based. In our case, the offsets are i and i+6, where i is the loop variable, and the length is always 1.
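As a small illustration of substring expansion on its own, the sketch below pulls single characters out of the sample string from the question. The variable names first, seventh, and tail are only illustrative and do not appear in the answer's code.

```shell
s="100000011100"

# ${s:offset:length} with a zero-based offset
first=${s:0:1}      # character at offset 0 (the 1st character)
seventh=${s:6:1}    # character at offset 6 (the 7th character)
tail=${s:6}         # omitting the length takes everything from offset 6 on

echo "$first$seventh"   # prints 10
echo "$tail"            # prints 011100
```

Concatenating the characters at offsets i and i+6 is exactly what the loop above does for each i.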

General Solution For Any Number of Lines

Suppose, for example, that our original string has 18 characters and we want to extract the i-th, the i+6-th, and the i+12-th characters for i from 0 to 5. Then:

s="100000011100234567"
array=($(
    for ((i=0; i<6; i++))
    do
        new=${s:$i:1}
        for ((j=i+6; j<${#s}; j=j+6))
        do
            new="$new${s:$j:1}"
        done
        echo "$new"
    done
    ))

echo "${array[@]}"

This produces the output:

102 013 014 015 006 007

This same code extends to an arbitrary number of 6-character lines. For example, if s has four lines (24 characters):

s="100000011100234567abcdef"

Then, the output becomes:

102a 013b 014c 015d 006e 007f
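If the stride should not be hard-coded to 6, the same nested loop could be wrapped in a function that takes n as an argument. This is only a sketch: every_nth and the global array columns are hypothetical names introduced here, and the array is filled with += rather than through command substitution, which sidesteps word splitting.

```shell
# every_nth STRING N
# Hypothetical helper: fills the global array "columns" so that element i
# holds the characters at offsets i, i+N, i+2N, ... of STRING.
every_nth() {
    local s=$1 n=$2 i j new
    columns=()
    for ((i = 0; i < n && i < ${#s}; i++)); do
        new=""
        for ((j = i; j < ${#s}; j += n)); do
            new+=${s:j:1}      # the offset is evaluated arithmetically, so plain j works
        done
        columns+=("$new")
    done
}

every_nth "100000011100234567abcdef" 6
echo "${columns[@]}"   # prints 102a 013b 014c 015d 006e 007f
```

Calling every_nth with a different second argument changes the stride without touching the loop body.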
