How to handle tasks with varying completion times

I’m watching a video(a) on YouTube about asyncio and, at one point, code like the following is presented for efficiently handling multiple HTTP requests:

# Need an event loop for doing this.

loop = asyncio.get_event_loop()

# Task creation section.

tasks = []
for n in range(1, 50):
    tasks.append(loop.create_task(get_html(f"https://example.com/things?id={n}")))

# Task processing section.

for task in tasks:
    html = await task
    thing = get_thing_from_html(html)
    print(f"Thing found: {thing}", flush=True)

I realise that this is efficient in the sense that everything runs concurrently, but what concerns me is a case like:

  • the first task taking a full minute; but
  • all the others finishing in under three seconds.

Because the task processing section awaits completion of the tasks in the order in which they entered the list, it appears to me that none will be reported as complete until the first one completes.

At that point, the others that finished long ago will also be reported. Is my understanding correct?

If so, what is the normal way to handle that scenario, so that you’re getting completion notification for each task the instant that task finishes?


(a) From Michael Kennedy of “Talk Python To Me” podcast fame. The video is Demystifying Python’s Async and Await Keywords if you’re interested. I have no affiliation with the site other than enjoying the podcast, so heartily recommend it.

Answer

If you just need to do something after each task, you can create another async function that does it, and run those concurrently:

import asyncio

async def wrapped_get_html(url):
    # Fetch one page and report on it as soon as that page arrives.
    html = await get_html(url)
    thing = get_thing_from_html(html)
    print(f"Thing found: {thing}")

async def main():
    # shorthand for creating tasks and awaiting them all
    await asyncio.gather(
        *(wrapped_get_html(f"https://example.com/things?id={n}")
          for n in range(50)))

asyncio.run(main())
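
To see the effect without making real HTTP requests, here is a minimal sketch of the same wrapper pattern. It assumes a hypothetical `fake_get_html` that sleeps in place of `get_html`, and the names `fetch_and_report` and `run_wrapped` and the delays are made up for illustration. The first job is deliberately the slowest, yet it should be reported last rather than holding up the others.

```python
import asyncio

# Hypothetical stand-in for get_html: sleeps instead of doing a real request.
async def fake_get_html(n, delay):
    await asyncio.sleep(delay)
    return f"thing-{n}"

async def fetch_and_report(n, delay, finished):
    # Each wrapper reports its own result the moment its own fetch completes.
    html = await fake_get_html(n, delay)
    finished.append(n)  # record the order in which jobs were reported
    print(f"Thing found: {html}", flush=True)

async def run_wrapped():
    finished = []
    delays = [0.3, 0.1, 0.2]  # job 0 is the slow one (illustrative values)
    await asyncio.gather(
        *(fetch_and_report(n, d, finished) for n, d in enumerate(delays)))
    return finished

if __name__ == "__main__":
    print(asyncio.run(run_wrapped()))
```

With these delays, the jobs are reported in the order 1, 2, 0, even though job 0 was submitted first.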

If for some reason you need your main loop to be notified as each task finishes, you can do that with asyncio.as_completed:

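async def main():
    # Each awaitable yielded here resolves to the result of whichever
    # request finishes next, not the next one in list order.
    for next_done in asyncio.as_completed([
            get_html(f"https://example.com/things?id={n}")
            for n in range(50)]):
        html = await next_done
        thing = get_thing_from_html(html)
        print(f"Thing found: {thing}")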

asyncio.run(main())
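
As a rough comparison, the sketch below runs the same simulated jobs twice: once awaiting the tasks in list order, as in the question, and once with `asyncio.as_completed`. The helper `fake_get_html`, the two `report_*` functions and the delays are assumptions made for illustration, not part of the original code. It suggests the understanding in the question is right: awaiting in list order reports nothing until the slow first task is done.

```python
import asyncio

# Hypothetical stand-in for get_html: sleeps instead of doing a real request.
async def fake_get_html(n, delay):
    await asyncio.sleep(delay)
    return f"thing-{n}"

DELAYS = [0.3, 0.1, 0.2]  # job 0 is the slow one (illustrative values)

async def report_in_list_order():
    # Mirrors the question: every result waits behind the first task.
    tasks = [asyncio.create_task(fake_get_html(n, d))
             for n, d in enumerate(DELAYS)]
    reported = []
    for task in tasks:
        reported.append(await task)
    return reported

async def report_as_completed():
    # Results are handed to the loop in the order the jobs finish.
    coros = [fake_get_html(n, d) for n, d in enumerate(DELAYS)]
    reported = []
    for next_done in asyncio.as_completed(coros):
        reported.append(await next_done)
    return reported

if __name__ == "__main__":
    print(asyncio.run(report_in_list_order()))
    print(asyncio.run(report_as_completed()))
```

The first function reports thing-0, thing-1, thing-2 (all only after the slow job finishes), while the second reports thing-1, thing-2, thing-0.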