bash script: capturing tcp traffic on a remote server sometimes works, sometimes fails. No errors

Background

I am running BusyBox on the remote server.

I have a bash script that does two things:
1. Via ssh, starts a subprocess that monitors TCP traffic with the tcpdump command and saves the results to a file, either on the remote machine or on the local machine. I have tried both.
2. Starts a second subprocess to generate TCP traffic.

Code Snippet:

#html_tcpdumpfile="$(ssh remotemachine.mydomain.net "mktemp")"
html_tcpdumpfile=$(mktemp)

test_steps=(
    #"{ ssh remotemachine.mydomain.net "timeout -t 20 tcpdump -nvi eth0 port 5060 > "$html_tcpdumpfile" " ; }" 
    "{ ssh remotemachine.mydomain.net "timeout -t 20 tcpdump -i eth0 port 5060 "> $html_tcpdumpfile; }"   
    "{ ssh remotemachine.mydomain.net "timeout -t 15 cat /tmp/htmlemail.txt | /etc/postfix/process_email.py "; }"
 )
pids=()
for index in ${!test_steps[@]}; do       
      (echo "${test_steps[$index]}" | bash) &
      pids[${index}]=$!
      echo "${pids[$index]} is the pid"
done

#shouldn't really need this because of the timers but... just in case...
for pid in ${pids[*]}; 
do   
  wait $pid; 
done;
# ============ ANALYZE TEST RESULTS
echo "========== html_tcpdumpfile CONTENTS ============="
cat $html_tcpdumpfile
echo "========== html_tcpdumpfile CONTENTS ============="

Problem

Sometimes, the tcpdump command doesn’t capture anything, and at other times it does. There are no error messages when it fails to capture.

What I’ve tried so far

  1. As you can see, I’ve tried to change the location of the dump file between the remote machine and the local one. That doesn’t seem to make a difference.

  2. I’ve proven that TCP traffic is ALWAYS generated each time I run the script, because I have another ssh session open and I can see the traffic being generated. It’s just that my script intermittently fails to capture it.

  3. I’ve tried to increase the timeout value on the tcp session to something huge to make sure I give it enough time. But I don’t think that’s the problem.

Any suggestions would be appreciated. Thanks.

EDIT 1

I tried to introduce a sleep in between launching each subprocess:

pids=()
for index in ${!test_steps[@]}; do       
      (echo "${test_steps[$index]}" | bash) &
      sleep 5
      pids[${index}]=$!
      echo "${pids[$index]} is the pid"
done

But that doesn’t make a difference either.

EDIT 2

I changed the tcpdump command to look like this:

test_steps=(     
    "{ ssh remotemachine.mydomain.net "timeout -t 30 tcpdump -nlc 100 -i eth0 port 5060 "> $rtf_tcpdumpfile; }" 
    "{ ssh remotemachine.mydomain.net "timeout -t 20 tail -f /var/log/messages " > $syslog; }"    
    "{ ssh remotemachine.mydomain.net "timeout -t 15 cat /tmp/htmlemail.txt | /etc/postfix/process_email.py "; }"
 )

The tcpdump still fails to capture intermittently, but what’s interesting is that the syslog is always captured successfully. (The Python script writes to the syslog when it’s invoked, so I can see and prove that the script is working.)

Answer

First off, if you are dealing with an appliance or IoT device with limited storage, I would handle the output on the calling side, i.e. put the > redirection after the ssh command, as in

ssh "command" > output.txt

As for tcpdump, I would not make a habit of killing it, since that risks losing whatever is still in its output buffers. That may be why you sometimes get no output.

I would instead place a limit on the number of packets captured. I would also avoid resolving DNS names (-n). For example, to capture 100 packets:

tcpdump -nc 100 -i eth0 port 5060

When you store the capture file on the local system, you should run cat only locally, not both remotely and locally.

Likewise, when you run both tcpdump and cat remotely, you launch them at the same time, so neither the remote nor the local cat will have anything to show.

Following the suggestion of @MarkPlotnick, I also added -l to tcpdump to make its output line buffered. That may remove the need for the -c option, but I would use both.

So I would change the script to:

#!/bin/bash
html_tcpdumpfile=$(mktemp)

ssh remotemachine.mydomain.net "timeout -t 20 tcpdump -nlc 100 -i eth0 port 5060" > "$html_tcpdumpfile"

cat "$html_tcpdumpfile"

rm "$html_tcpdumpfile"

Or we might not even need to explicitly create a temporary file:

#!/bin/bash

ssh remotemachine.mydomain.net "timeout -t 20 tcpdump -nlc 100 -i eth0 port 5060" | less
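
The scripts above only capture, whereas the question also has to generate traffic while the capture is running. A minimal sketch of that ordering follows. It assumes a hypothetical helper named capture_while (not part of the original script) and an arbitrary one-second head start for the capture. The idea is to start the capture in the background, run the traffic generator, and read the output file only after the capture has exited.

```shell
#!/bin/sh
# Hypothetical helper: run CAPTURE_CMD in the background with its stdout
# redirected to OUTFILE, run TRIGGER_CMD while it is active, then block
# until the capture has exited so OUTFILE is complete before it is read.
capture_while() {
    capture_cmd=$1
    trigger_cmd=$2
    outfile=$3

    sh -c "$capture_cmd" > "$outfile" &
    capture_pid=$!

    sleep 1    # assumed head start so the capture is listening first
    sh -c "$trigger_cmd"

    # Returns the exit status of the capture command.
    wait "$capture_pid"
}

# Example call with the commands from the question (not run here):
# capture_while \
#   'ssh remotemachine.mydomain.net "timeout -t 20 tcpdump -nlc 100 -i eth0 port 5060"' \
#   'ssh remotemachine.mydomain.net "cat /tmp/htmlemail.txt | /etc/postfix/process_email.py"' \
#   "$html_tcpdumpfile"
```

The one-second delay is a guess. If the first packets are still missed, a longer head start may be needed.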

Lastly, I would advise deleting all the temp files created, especially on the remote side.
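
One way to make that cleanup automatic on the local side is an EXIT trap. The sketch below assumes a hypothetical wrapper named with_tempfile (a name introduced here for illustration). It creates a scratch file, passes its path as the last argument to the given command, and deletes the file when the subshell exits, whether or not the command succeeded.

```shell
#!/bin/sh
# Hypothetical wrapper: create a scratch file, pass its path as the last
# argument to the given command, and remove it when the subshell exits.
with_tempfile() {
    (
        tmpfile=$(mktemp) || exit 1
        # The EXIT trap runs both on normal exit and when the command fails.
        trap 'rm -f "$tmpfile"' EXIT
        "$@" "$tmpfile"
    )
}
```

For a file created on the remote machine, the same idea would have to run inside the remote command, for example by ending that command with an rm -f of the remote path.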

PS: the OP mentioned in the comments that the remote system is BusyBox, so its timeout options differ from those in the coreutils package. I also edited the question to mention BusyBox.
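
Because the duration syntax differs (the BusyBox build used here takes -t SECS, while coreutils timeout takes the duration as its first positional argument), a script that has to work with both could probe for the accepted form first. A minimal sketch, assuming a hypothetical helper named run_with_timeout that is executed on whichever machine actually runs timeout:

```shell
#!/bin/sh
# Hypothetical helper: run a command under `timeout`, choosing the
# duration syntax that the local `timeout` implementation accepts.
run_with_timeout() {
    secs=$1
    shift
    # Probe: the BusyBox form accepts `-t SECS`; coreutils rejects -t.
    if timeout -t 1 true 2>/dev/null; then
        timeout -t "$secs" "$@"
    else
        timeout "$secs" "$@"
    fi
}
```

The probe costs one extra process start per call. For the setup in the question it would have to be part of the command string sent over ssh, since timeout runs on the remote machine.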
