Python – Generate random list of 2-dimensional float values

I want to generate a very long list of random two-dimensional coordinates (floats) between 0.0 and 1.0.

Is there a faster way to do this than the following code? On my computer it takes about 4.1 sec for 10**7 coordinates:

coordinates = (np.random.randint(0, 10, (10**7, 2)) / 10.).tolist()

For this number I can wait, but what should I do when the number of coordinates is 10**9?

Timing for 10**7:

import numpy as np
import timeit


def createCoordinates(num_coord):
    return (np.random.randint(0, 10, (num_coord, 2))/10.).tolist()


def checkElapsedTime(num_runs):
    t_elapsed = np.empty(num_runs, dtype=float)  # the np.float alias was removed in NumPy 1.24
    for i in range(num_runs):
        t_start = timeit.default_timer()
        coordinates_2d = createCoordinates(10**7)
        t_elapsed[i] = timeit.default_timer() - t_start
        print('run: %2d, time_elapsed = %4.3f sec' % (i, t_elapsed[i]))

    print('(mean \u00B1 standard deviation): elapsed time = %4.3f sec \u00B1 %5.4f sec' %
          (np.mean(t_elapsed), np.std(t_elapsed)))

checkElapsedTime(10)

run:  0, time_elapsed = 4.017 sec
run:  1, time_elapsed = 4.195 sec
run:  2, time_elapsed = 3.392 sec
run:  3, time_elapsed = 3.944 sec
run:  4, time_elapsed = 3.912 sec
run:  5, time_elapsed = 3.900 sec
run:  6, time_elapsed = 3.874 sec
run:  7, time_elapsed = 4.801 sec
run:  8, time_elapsed = 3.560 sec
run:  9, time_elapsed = 3.356 sec
(mean ± standard deviation): elapsed time = 3.895 sec ± 0.3979 sec

Answer

The major inefficiency in your code is the call to tolist. On my machine, removing it reduces the runtime for an array of shape (10**7, 2) from 2.3 sec to 386 ms, a 6x improvement. To understand why, consider what tolist does: it takes a contiguous block of memory and allocates a separate Python float object for every element, plus a list for every row. This takes time, and probably at least triples your memory consumption.
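As a rough illustration of that conversion cost, the sketch below compares what the array stores with what tolist produces. The small size n is chosen only for the demonstration, and the per-object sizes printed at the end depend on the Python build, so treat them as indicative rather than exact.

```python
import sys
import numpy as np

n = 1000  # small size, for illustration only
arr = np.random.randint(0, 10, (n, 2)) / 10.0

# The array keeps its values in one contiguous buffer: 8 bytes per float64.
print(arr.dtype, arr.nbytes)

# tolist() builds n inner lists holding 2 * n separate Python float objects.
as_list = arr.tolist()
print(type(as_list), type(as_list[0]), type(as_list[0][0]))

# Approximate per-object overhead on the list side (varies by Python build).
print(sys.getsizeof(as_list[0][0]), sys.getsizeof(as_list[0]))
```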

Aside from being appendable, lists offer nothing here that numpy arrays don't: both support the same sequence interface. In fact, lists are more limited in how you can index them. A nested list can only be indexed one level at a time, and slicing it makes a copy, while numpy lets you index across both dimensions and construct slices cheaply without copying data.

def createCoordinates(num_coord):
    return np.random.randint(0, 10, (num_coord, 2)) / 10.0
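To illustrate the indexing point made earlier, here is a minimal sketch of how slices of the returned array behave compared with the nested list. The variable names are only for the example.

```python
import numpy as np

arr = np.random.randint(0, 10, (5, 2)) / 10.0

# Basic slicing returns a view onto the same buffer, not a copy.
xs = arr[:, 0]   # all x-coordinates
head = arr[:3]   # first three points
print(np.shares_memory(xs, arr), np.shares_memory(head, arr))

# Writing through the view changes the original array.
xs[0] = 0.5
print(arr[0, 0])

# A nested list has no column slice: it needs a comprehension,
# and slicing the outer list builds a new list object.
as_list = arr.tolist()
xs_list = [point[0] for point in as_list]
head_list = as_list[:3]
print(head_list is as_list)
```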

Instead of dividing integers by 10, you can generate the numbers you want directly using np.random.choice, but this is a bit slower than the division (411ms on my machine):

z = np.linspace(0.0, 1.0, 10, endpoint=False)  # Pre-compute this outside the function
def createCoordinates(num_coord):
    return np.random.choice(z, (num_coord, 2))

You can speed things up considerably by using the newer Generator API. In this case, Generator.integers and Generator.choice produce identical timings on my machine, about 224 ms for an array of shape (10**7, 2):

rng = np.random.default_rng()

def createCoordinates(num_coord):
    return rng.integers(0, 10, (num_coord, 2)) / 10.0  # low=0 so that 0.0 can occur, as with randint(0, 10)

Or, using Generator.choice:

rng = np.random.default_rng()
z = np.linspace(0, 1.0, 10, endpoint=False)

def createCoordinates(num_coord):
    return rng.choice(z, (num_coord, 2))

All things considered, you can achieve a speedup of approximately 10x by using the API correctly.
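For the 10**9 case from the question, a single float64 array of shape (10**9, 2) would need about 16 GB, so it may not fit in memory at all. One possible approach, sketched below, is to generate the coordinates in chunks and process each chunk before requesting the next. The function name iter_coordinates and the default chunk_size are hypothetical choices for this example, not something measured above.

```python
import numpy as np

rng = np.random.default_rng()

def iter_coordinates(num_coord, chunk_size=10**6):
    """Yield arrays of shape (k, 2), k <= chunk_size, totalling num_coord rows."""
    remaining = num_coord
    while remaining > 0:
        k = min(chunk_size, remaining)
        yield rng.integers(0, 10, (k, 2)) / 10.0
        remaining -= k

# Usage: only one chunk is held in memory at a time.
total = 0
for chunk in iter_coordinates(2_500_000):
    total += chunk.shape[0]  # replace with the real per-chunk processing
print(total)
```

Whether this helps depends on whether the downstream code can work on one chunk at a time. If all coordinates are needed at once, the memory requirement stays the same.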