Checking that all ranks are ready without using mpi4py gather and scatter

I am trying to communicate between processes so that every process is notified when all other processes are ready. The code snippet below does that. Is there a more elegant way to do this?

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def get_all_ready_status(ready_batch):
    all_ready = all(ready_batch)
    return [all_ready for _ in ready_batch]

# ready_agent is this rank's readiness flag (set elsewhere in my code)
ready_batch = comm.gather(ready_agent, root=0)
all_ready_batch = None
if rank == 0:
    all_ready_batch = get_all_ready_status(ready_batch)
all_ready_flag = comm.scatter(all_ready_batch, root=0)

Answer

If all the processes need to know which other processes are ready, then you can use the comm.Allgather routine:

from mpi4py import MPI
import numpy


comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Each process contributes one boolean flag; after Allgather every
# process holds the flags from all processes.
sendBuffer = numpy.ones(1, dtype=bool)
recvBuffer = numpy.zeros(size, dtype=bool)

print("Before Allgather => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))
comm.Allgather([sendBuffer, MPI.BOOL], [recvBuffer, MPI.BOOL])
print("After Allgather  => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))

Output:

Before Allgather => Process 0 | sendBuffer [ True] | recvBuffer [False False]
Before Allgather => Process 1 | sendBuffer [ True] | recvBuffer [False False]
After Allgather  => Process 0 | sendBuffer [ True] | recvBuffer [ True  True]
After Allgather  => Process 1 | sendBuffer [ True] | recvBuffer [ True  True]
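
Mapped back to the original snippet, the lowercase comm.allgather (the pickle-based variant that works on arbitrary Python objects) collapses the gather/scatter round trip into a single call. A minimal sketch, assuming ready_agent is the plain Python bool from the question:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

ready_agent = True  # stand-in for this rank's actual readiness flag

# Every rank receives the list of readiness flags from all ranks
ready_batch = comm.allgather(ready_agent)
all_ready_flag = all(ready_batch)
print("Process %s | ready_batch %s | all_ready_flag %s" % (rank, ready_batch, all_ready_flag))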

As pointed out in the comments by @Gilles Gouaillardet:

if all processes only have to know if all processes are ready, then MPI_Allreduce() is an even better fit.

The idea is that, in theory, Allreduce should be faster than Allgather because the former can use a tree communication pattern and because it needs to allocate and communicate less memory. More information can be found here.

In your case, you would use MPI.LAND (i.e., logical AND) as the Allreduce operator.

An example:

from mpi4py import MPI
import numpy


comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# First run: only the even-ranked processes are ready (True); the
# odd-ranked processes are not (False).
sendBuffer = numpy.ones(1, dtype=bool) if rank % 2 == 0 else numpy.zeros(1, dtype=bool)
recvBuffer = numpy.zeros(1, dtype=bool)

print("Before Allreduce => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))
comm.Allreduce([sendBuffer, MPI.BOOL], [recvBuffer, MPI.BOOL], MPI.LAND)
print("After Allreduce  => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))

comm.Barrier()
if rank == 0:
    print("Second RUN")
comm.Barrier()

# Second run: every process is ready.
sendBuffer = numpy.ones(1, dtype=bool)
recvBuffer = numpy.zeros(1, dtype=bool)

print("Before Allreduce => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))
comm.Allreduce([sendBuffer, MPI.BOOL], [recvBuffer, MPI.BOOL], MPI.LAND)
print("After Allreduce  => Process %s | sendBuffer %s | recvBuffer %s" % (rank, sendBuffer, recvBuffer))

Output:

Before Allreduce => Process 1 | sendBuffer [False] | recvBuffer [False]
Before Allreduce => Process 0 | sendBuffer [ True] | recvBuffer [False]
After Allreduce  => Process 1 | sendBuffer [False] | recvBuffer [False]
After Allreduce  => Process 0 | sendBuffer [ True] | recvBuffer [False]
Second RUN
Before Allreduce => Process 0 | sendBuffer [ True] | recvBuffer [False]
Before Allreduce => Process 1 | sendBuffer [ True] | recvBuffer [False]
After Allreduce  => Process 0 | sendBuffer [ True] | recvBuffer [ True]
After Allreduce  => Process 1 | sendBuffer [ True] | recvBuffer [ True]

In the first part of the output (i.e., before "Second RUN"), the result is False because the odd-ranked processes were not ready (i.e., False) while the even-ranked processes were ready (i.e., True). Hence, True & False => False. In the second part, the result is True because all processes were ready.
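
Applied to the original use case, the whole gather/scatter block can therefore be replaced by a single reduction. A minimal sketch using the lowercase, pickle-based comm.allreduce (again assuming ready_agent is a plain Python bool):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

ready_agent = True  # stand-in for this rank's actual readiness flag

# Logical AND over all ranks; every rank receives the combined result
all_ready_flag = comm.allreduce(ready_agent, op=MPI.LAND)
print("Process %s | all_ready_flag %s" % (rank, all_ready_flag))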