Getting a file with wget when the filename may change slightly

I have a program that takes data from five government sources and merges them into one large database for my company. I use wget to retrieve the files. However, I have discovered that one of the sources changes the file's name every time it is updated.

For example, the last time I got the file it was called myfile150727.flatfile. Today, when I ran my program, wget failed with exit status 8 (no such file). When I logged in to the FTP server manually, I found that the file is now called myfile150914.flatfile. So the filename evidently changes with the date of the last update.

Can I modify my script to take this fact into account and still automatically download the file?

Answer

Yes, but the details depend on how the file’s name changes. If it is always today’s date, just tell your script to get that:

filename=myfile"$(date +%y%m%d)".flatfile
wget ftp://example.com/"$filename"
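
As a minimal sketch, the date-based name can be wrapped in a small helper so that it can be checked for a given date without touching the network. The function names expected_name and fetch_today are illustrative only, and ftp://example.com is the same placeholder host as above:

```shell
#!/usr/bin/env bash
# expected_name [STAMP]: print the file name for a YYMMDD stamp.
# With no argument, today's date is used.
expected_name() {
    local stamp="${1:-$(date +%y%m%d)}"
    printf 'myfile%s.flatfile\n' "$stamp"
}

# fetch_today: download today's file from the placeholder host.
# It is only defined here, not called.
fetch_today() {
    wget "ftp://example.com/$(expected_name)"
}
```

Calling expected_name 150914 prints the name for that date, which makes it easy to test the script against a known file before relying on today's date.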

Or, if it is not updated daily and there is only one file called myfileWHATEVER.flatfile, get that with a glob (quoted, so that wget rather than the shell expands it):

wget "ftp://example.com/myfile*.flatfile"

If you can have many files with similar names, you could download all of them and then keep only the newest:

wget -N "ftp://example.com/myfile*.flatfile"
## Find the newest file
for file in myfile*.flatfile; do
    [[ "$file" -nt "$newest" ]] && newest="$file";
done
## Delete the rest
for file in myfile*.flatfile; do
    [[ "$file" != "$newest" ]] && rm "$file"
done
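
The two loops above can also be folded into one function. This is only a sketch, assuming the downloaded files all sit in a single directory; keep_newest is a hypothetical name. It also covers the case where the glob matches nothing, and it prints the file it kept:

```shell
#!/usr/bin/env bash
# keep_newest [DIR]: keep only the most recently modified myfile*.flatfile
# in DIR (default: current directory) and print the name of the kept file.
keep_newest() {
    local dir="${1:-.}" newest="" file
    for file in "$dir"/myfile*.flatfile; do
        [ -e "$file" ] || continue            # glob matched nothing
        if [ -z "$newest" ] || [ "$file" -nt "$newest" ]; then
            newest="$file"
        fi
    done
    [ -n "$newest" ] || return 1              # nothing to keep
    for file in "$dir"/myfile*.flatfile; do
        [ "$file" = "$newest" ] || rm -- "$file"
    done
    printf '%s\n' "$newest"
}
```

A script could then run the wget -N line first and call keep_newest . afterwards, using the printed name as the input for the merge step.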

Alternatively, you can extract the date from the file name instead:

wget -N "ftp://example.com/myfile*.flatfile"
for file in myfile*.flatfile; do
    fdate=$(basename "${file//myfile}" .flatfile)
    [[ "$fdate" -gt $(basename "${nfile//myfile}" .flatfile) ]] && nfile="$file"
done
for file in myfile*.flatfile; do
    [[ "$file" = "$nfile" ]] || rm "$file"
done

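Note that both approaches keep exactly one file. If several files tie (for example, two files with the same modification time in the first approach), the first match in glob order is the one that is kept.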
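
One caveat with the numeric comparison: bash treats a number with a leading zero as octal, so a stamp such as 090101 makes -gt fail with a "value too great for base" error. A sketch of an alternative, assuming the stamps are always six digits (YYMMDD) within one century so that name order equals date order, relies on the shell expanding globs in sorted order. newest_by_name is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# newest_by_name [DIR]: print the myfile*.flatfile whose name carries the
# latest stamp. Glob matches are expanded in sorted order, so the last
# existing match is the one with the highest YYMMDD value.
newest_by_name() {
    local dir="${1:-.}" file newest=""
    for file in "$dir"/myfile*.flatfile; do
        [ -e "$file" ] && newest="$file"
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Because no arithmetic is involved, stamps with leading zeros are handled like any other name.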
