Using a bash array in an awk and also quoted variable: conflicting syntax issue

The question was published by the Tutorial Guruji team.

I have a script the aim of which is:

  • For a list of files, obtain a specific number for each file (this is sequencing data, to be specific) and store these into array1
  • Using array1, find the smallest number in array1
  • Divide that smallest number by each number in array1 to make array2.

My script looks as follows:

#!/usr/bin/bash



USAGE() { echo "Usage: bash $0 [-b <in-bam-files-dir>] [-o <out-dir>] [-c <chromlen>]" 1>&2; exit 1; }

if (($# == 0))
then
    USAGE
fi



while getopts ":b:o:c:h" opt
do
    case $opt in
        b ) BAMFILES=$OPTARG
        ;;
        o ) OUTDIR=$OPTARG
        ;;
        c ) CHROMLEN=$OPTARG
        ;;
        h ) USAGE
        ;;
        \? ) echo "Invalid option: -$OPTARG exiting" >&2
        exit
        ;;
        : ) echo "Option -$OPTARG requires an argument" >&2
        exit
        ;;
    esac
done



if [ ! -d ${OUTDIR} ]
then
    mkdir ${OUTDIR}
fi

if [ ! -d ${OUTDIR}/temp ]
then
    mkdir ${OUTDIR}/temp
fi

if [ -d ${BAMFILES} ]
then
    echo -e "\nProcessing BAM files from following directory: ${BAMFILES} \n "
fi



module purge
module load samtools
module load bedtools
module load ucsctools
echo -e "Modules are loaded\n"



FIRSTBAM=$(ls $BAMFILES/*bam | head -1)
MIN=$(samtools view -c -F 260 ${FIRSTBAM} )
echo -e "Minimum number of reads is currently set to $MIN from $FIRSTBAM (first bam in directory)\n"



declare -A BAMREADS
echo "BAMREADS array is initialized"

for i in $(ls $BAMFILES/*bam)
do
    echo "Counting reads in $i "
    BAMREADS[$i]=$(samtools view -c -F 260 $i)
done



for i in ${BAMREADS[@]}
do
    if [[ $i -lt $MIN ]]
    then
        MIN=$i
    fi
done

echo -e "Minimum number of reads that will be used for scaling is $MIN \n"



declare -A BAMFRACS
echo -e "BAMFRACS array is initialized"

for i in ${!BAMREADS[@]}
do
    BAMFRACS[$i]=$(awk -v var1=${MIN} -v var2=${BAMREADS[$i]} 'BEGIN { x= var1 / var2; printf "%.8f", x }')
done



for i in $(ls $BAMFILES/*bam)
do

    SAMPLE=`basename $i`
    SAMPLE=${SAMPLE%.bam}
    echo $SAMPLE

    if [[ ${BAMREADS[$i]} -eq $MIN ]]
    then

        echo "Sample $i does not need scaling"

        command="cp $i ${OUTDIR}/temp/${SAMPLE}.scaled.bam;
        genomeCoverageBed -bg -split -ibam ${OUTDIR}/temp/${SAMPLE}.scaled.bam > ${OUTDIR}/temp/${SAMPLE}.bedgraph;
        sed -e 's/^/chr/g;s/MT/M/g' ${OUTDIR}/temp/${SAMPLE}.bedgraph > ${OUTDIR}/temp/${SAMPLE}.modified.bedgraph;
        sort -k1,1 -k2,2n ${OUTDIR}/temp/${SAMPLE}.modified.bedgraph > ${OUTDIR}/temp/${SAMPLE}.sorted.bedgraph;
        bedGraphToBigWig ${OUTDIR}/temp/${SAMPLE}.sorted.bedgraph $CHROMLEN ${OUTDIR}/${SAMPLE}.bw"
        #rm ${OUTDIR}/temp/${SAMPLE}.*

    else

        command="samtools view -s ${BAMFRACS[$i]} -b $i > ${OUTDIR}/temp/${SAMPLE}.scaled.bam;
        genomeCoverageBed -bg -split -ibam ${OUTDIR}/temp/${SAMPLE}.scaled.bam > ${OUTDIR}/temp/${SAMPLE}.bedgraph;
        sed -e 's/^/chr/g;s/MT/M/g' ${OUTDIR}/temp/${SAMPLE}.bedgraph > ${OUTDIR}/temp/${SAMPLE}.modified.bedgraph;
        sort -k1,1 -k2,2n ${OUTDIR}/temp/${SAMPLE}.modified.bedgraph > ${OUTDIR}/temp/${SAMPLE}.sorted.bedgraph;
        bedGraphToBigWig ${OUTDIR}/temp/${SAMPLE}.sorted.bedgraph $CHROMLEN ${OUTDIR}/${SAMPLE}.bw"
        #rm ${OUTDIR}/temp/${SAMPLE}.*

    fi

    echo $command | qsub -V -cwd -o $OUTDIR -e $OUTDIR -l tmem=10G -l h_vmem=10G -l h_rt=3600 -N bigwig_${SAMPLE}

 done

 echo "Task completed: conversion jobs submitted to cluster"

I have 2 questions:

  • From what I understand, bash is not very good at doing arithmetic maths: i.e. doing any kind of operation (addition, division etc) involving float numbers. However, given the fact that var1 and var2 are always integers in my script (see $MIN and all array1 values), do we agree that this is not a problem? I.e. my operation results in float numbers, but it uses integers, so it’s not a problem right?

  • It’s not very clear in StackExchange because there is no syntax highlighting here, but I noticed that the var2=${BAMREADS[$i]} part of my script isn’t quite right. I use nano and in my terminal, instead of having all of the ${BAMREADS[$i]} in red, like the other variables (like ${MIN}), only the ${BAMREADS[$i part of the script is appearing in red, i.e. the ending ]} are not red. The script seems to be behaving as I expect and everything seems to be working.. So I don’t quite understand why it’s not all in red.

This is what my script looks like in nano (notice how the ]} in ${BAMREADS[$i]} in the awk command, and later on in the second $command, is not in red as it should be):

[Image: screenshot of the script in nano]

However, if you paste this code into https://www.shellcheck.net/, you don’t get any problem in terms of highlighting in this part of the script. So how come nano and shellcheck are not telling me the same thing? I have used this script and it seems to work for me but I am concerned by this highlighting issue..

Thanks

Answer

Syntax highlighting is one problem

Each editor has its own advantages and disadvantages in this respect.

See my question on SoftwareRecs and its respective answers, most importantly this for both CLI and GUI and this for GUI.

Notably, Visual Studio Code has, IMHO, the best syntax highlighting among GUI editors.

Among CLI editors, refer to the answer about gVim, which does the same syntax highlighting job on the command line.

As a former heavy nano user, I can tell you that nano can’t distinguish variables inside quotes.


Missing double quotes are a bigger problem

What should trouble you most is that you did not use double quotes. I suppose you’re not used to them, which is a bad habit. Please refer to StackOverflow for more information or simply use Google. Or see below.
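
As a minimal sketch of what quoting the awk loop from the question could look like (the file names and read counts here are made up for illustration, and one key deliberately contains a space): it also shows that bash's own arithmetic is integer-only, while awk returns a float even though both inputs are integers.

```bash
#!/usr/bin/env bash
# Made-up read counts; the key "b c.bam" contains a space on purpose.
declare -A BAMREADS=( ["a.bam"]=200 ["b c.bam"]=100 )
MIN=100

# Bash arithmetic is integer-only: 100 / 200 truncates to 0.
int_result=$(( MIN / 200 ))
echo "bash integer division: $int_result"

# awk does the division in floating point; every expansion is double-quoted.
declare -A BAMFRACS
for i in "${!BAMREADS[@]}"; do
    BAMFRACS["$i"]=$(awk -v var1="$MIN" -v var2="${BAMREADS[$i]}" \
        'BEGIN { printf "%.8f", var1 / var2 }')
done

# Print each fraction (associative arrays have no guaranteed key order).
for i in "${!BAMFRACS[@]}"; do
    printf '%s -> %s\n' "$i" "${BAMFRACS[$i]}"
done
```

Without the quotes around "${!BAMREADS[@]}", the key with a space would be split into two loop items, and the lookup in BAMREADS would come back empty.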


Double quote to prevent globbing and word splitting

For shell script writers, the nano editor is hardly usable, because it does not recognize variables inside strings (quotes). Double quotes are essential in shell scripts: they prevent so-called globbing and word splitting. Read the ShellCheck Wiki article SC2086 for more information about this topic.
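
To make word splitting visible, here is a small sketch that loops over a temporary directory holding a single file with a space in its name. The directory and file name are hypothetical, and it assumes the path returned by mktemp contains no whitespace. The first loop mirrors the `for i in $(ls ...)` pattern from the question; the second uses a quoted glob instead.

```bash
#!/usr/bin/env bash
# Hypothetical directory holding one BAM-like file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/my sample.bam"

# Unquoted command substitution: the output of ls is split on whitespace,
# so the single file name becomes two loop items.
split_count=0
for i in $(ls "$dir"/*bam); do
    split_count=$((split_count + 1))
done

# Glob with a quoted directory: each matching file is exactly one word.
glob_count=0
for i in "$dir"/*bam; do
    glob_count=$((glob_count + 1))
done

echo "ls loop iterations: $split_count, glob loop iterations: $glob_count"
rm -r "$dir"
```

Under that assumption the ls loop runs twice for one file, while the glob loop runs once, which is why the glob form is usually preferred for iterating over files.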


Always pipe your scripts to ShellCheck
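
A rough sketch of how that might look from the command line, assuming the shellcheck binary is installed locally (the throwaway script below is a made-up example, not the script from the question). If shellcheck is not available, the sketch falls back to pointing at the website mentioned in the question.

```bash
#!/usr/bin/env bash
# Write a tiny throwaway script (hypothetical example) with an unquoted variable.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/usr/bin/env bash
OUTDIR=$1
mkdir ${OUTDIR}/temp
EOF

# Run ShellCheck only if it is installed; the unquoted ${OUTDIR} is the kind
# of expansion that SC2086 is about.
if command -v shellcheck >/dev/null 2>&1; then
    shellcheck "$demo" || true   # a non-zero exit status only means findings were reported
else
    echo "shellcheck is not installed; paste the script into https://www.shellcheck.net/ instead"
fi
```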
