Sorting not seeming to work [closed]

I am currently writing a webcrawl bot. It generates a list of URLs, and I need it to remove duplicates, and sort the lines alphabetically. My code looks like this:

#! /bin/bash
URL="google.com"
while [ 1 ]; do
  wget --output-document=dl.html $URL
  links=($(grep -Po '(?<=href=")[^"]*' dl.html))
  printf "%s\n" ${links[@]} >> results.db

  sort results.db | uniq -u

  URL=$(shuf -n 1 results.db)
  echo $URL
done

Specifically, the line:

sort results.db | uniq -u

Answer

POSIX says of uniq -u:

Suppress the writing of lines that are repeated in the input.

which means that every copy of a repeated line, including the first occurrence, is filtered out, so only lines that appear exactly once are printed. What you probably meant was (also specified by POSIX):

sort -u results.db

For sort -u, POSIX says

Unique: suppress all but one in each set of lines having equal keys. If used with the -c option, check that there are no lines with duplicate keys, in addition to checking that the input file is sorted.
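
To see the difference concretely, here is a small sketch using a throwaway file. The file and its contents are made up for illustration and are not taken from the question:

```shell
# Hypothetical sample data: "a.com" appears twice, the others once.
demo=$(mktemp)
printf '%s\n' b.com a.com c.com a.com > "$demo"

# uniq -u drops every copy of a repeated line, so a.com disappears entirely.
only_unrepeated=$(sort "$demo" | uniq -u)

# sort -u keeps exactly one copy of each line.
deduplicated=$(sort -u "$demo")

printf 'uniq -u:\n%s\n' "$only_unrepeated"
printf 'sort -u:\n%s\n' "$deduplicated"
rm -f "$demo"
```

The first command prints only b.com and c.com, while the second prints a.com, b.com and c.com.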

In either case, the following line

URL=$(shuf -n 1 results.db)

probably assumes that the purpose of sort/uniq is to update results.db (it won’t: both commands write to standard output and leave the file untouched). You would have to modify the script a little more for that:

sort -u results.db >results.db2 && mv results.db2 results.db

or (as suggested by @drewbenn), combine it with the previous line. However, since that appends to the file (combining the commands as shown in his answer won’t eliminate the duplicates between the latest printf and the file’s contents), a separate command sort/mv looks closer to the original script.
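
As a possible alternative to the temporary file, POSIX sort also has an -o option, and the output file it names may be one of the input files. A minimal sketch, with a scratch file standing in for results.db and invented contents:

```shell
# Scratch stand-in for results.db (hypothetical contents).
db=$(mktemp)
printf '%s\n' b.com a.com b.com > "$db"

# Newly crawled links are appended first, as in the original script...
printf '%s\n' c.com a.com >> "$db"

# ...then the whole file is sorted and de-duplicated in place.
# POSIX allows the -o file to be the same as an input file.
sort -u -o "$db" "$db"

db_contents=$(cat "$db")
printf '%s\n' "$db_contents"
rm -f "$db"
```

Because the append happens before the sort, duplicates between the new links and the existing contents are removed as well.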

If you want to ensure that $URL is not empty (that’s actually another question), use the [ test, e.g.,

  [ -n "$URL" ] && wget --output-document=dl.html $URL

though simply exiting from the loop would be simpler:

[ -z "$URL" ] && break
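
A rough sketch of where that check could sit in the loop. It uses an empty scratch file in place of results.db and leaves the wget call commented out, so it only illustrates the control flow (the iterations counter is added purely for the demonstration):

```shell
db=$(mktemp)    # empty scratch file standing in for results.db
iterations=0
while true; do
  URL=$(shuf -n 1 "$db")
  # Nothing to crawl: leave the loop instead of calling wget with no URL.
  [ -z "$URL" ] && break
  iterations=$((iterations + 1))
  # wget --output-document=dl.html "$URL"   # omitted in this sketch
done
rm -f "$db"
echo "iterations: $iterations"
```

With an empty file, shuf prints nothing, so the loop exits on the first pass.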