I want to generate a very long list of random two dimensional coordinates (floats) between
Do you know a faster code than this (on my computer it takes about
4.1 sec for
coordinates = (np.random.randint(0, 10, (10**7, 2)) / 10.).tolist()
For this number I can wait, but what should I do when the number of coordinates is
    import numpy as np
    import timeit

    def createCoordinates(num_coord):
        return (np.random.randint(0, 10, (num_coord, 2)) / 10.).tolist()

    def checkElapsedTime(num_runs):
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        t_elapsed = np.empty(num_runs, dtype=float)
        for i in range(num_runs):
            t_start = timeit.default_timer()
            coordinates_2d = createCoordinates(10**7)
            t_elapsed[i] = timeit.default_timer() - t_start
            print('run: %2d, time_elapsed = %4.3f sec' % (i, t_elapsed[i]))
        print('(mean \u00B1 standard deviation): elapsed time = %4.3f sec \u00B1 %5.4f sec'
              % (np.mean(t_elapsed), np.std(t_elapsed)))

    checkElapsedTime(10)

Output:

    run:  0, time_elapsed = 4.017 sec
    run:  1, time_elapsed = 4.195 sec
    run:  2, time_elapsed = 3.392 sec
    run:  3, time_elapsed = 3.944 sec
    run:  4, time_elapsed = 3.912 sec
    run:  5, time_elapsed = 3.900 sec
    run:  6, time_elapsed = 3.874 sec
    run:  7, time_elapsed = 4.801 sec
    run:  8, time_elapsed = 3.560 sec
    run:  9, time_elapsed = 3.356 sec
    (mean ± standard deviation): elapsed time = 3.895 sec ± 0.3979 sec
The major inefficiency in your code is calling
tolist. On my machine, removing this call reduces runtime from 2.3 sec on an array of shape
(10**7, 2) to 386ms. This is a 6x improvement. To understand why, imagine what happens when you call
tolist: you are taking a contiguous block of memory, and individually allocating an entire
float object for each element. This takes time, and probably at least triples your memory consumption.
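To make that cost concrete, here is a rough sketch that compares the bytes held by an array with the bytes held by the nested list built from it. The helper name `list_footprint` and the smaller array size are my own choices for illustration, and the `sys.getsizeof` figures are specific to CPython, so treat the numbers as indicative only.

```python
import sys
import numpy as np

def list_footprint(nested):
    """Approximate bytes used by a list of lists of floats (CPython only)."""
    total = sys.getsizeof(nested)                     # outer list and its pointers
    for row in nested:
        total += sys.getsizeof(row)                   # each inner list
        total += sum(sys.getsizeof(x) for x in row)   # each float object
    return total

# Smaller than 10**7 so the sketch runs quickly
arr = np.random.randint(0, 10, (10**5, 2)) / 10.0
as_list = arr.tolist()

print("array bytes:", arr.nbytes)
print("list bytes: ", list_footprint(as_list))
```

The array stores 8 bytes per value in one block, while the list version pays for a separate float object per value plus a small list object per row.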
Aside from being appendable, lists offer nothing that numpy arrays lack: both expose the same sequence interface. In fact, lists are more limited in how you can index them. A list gives you one element at a time, or a slice that is a brand new list, while numpy lets you do that as well as construct slices cheaply as views, without copying data. So unless you really need to append, just return the array:
    def createCoordinates(num_coord):
        return np.random.randint(0, 10, (num_coord, 2)) / 10.0
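As a small illustration of the indexing point, the sketch below contrasts a numpy slice (a view that shares memory with the array) with a list slice (a new list). The variable names and the tiny array are just for demonstration.

```python
import numpy as np

arr = np.arange(10, dtype=float).reshape(5, 2)
as_list = arr.tolist()

# A numpy slice is a view: no data is copied
col = arr[:, 0]
print(np.shares_memory(arr, col))

# Writing through the view modifies the original array
col[0] = 99.0

# A list slice is a new list object (a shallow copy)
head = as_list[:2]
head[0] = [-1.0, -1.0]      # does not touch as_list

# Lists have no column indexing; you need a loop or comprehension
first_col = [row[0] for row in as_list]
```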
Instead of dividing integers by 10, you can generate the numbers you want directly using
np.random.choice, but this is a bit slower than the division (411ms on my machine):
    z = np.linspace(0.0, 1.0, 10, endpoint=False)  # Pre-compute this outside the function

    def createCoordinates(num_coord):
        return np.random.choice(z, (num_coord, 2))
You can really speed things up by using the new generator API. In this case, using
Generator.choice produces identical timings on my machine, ~224ms for an array of shape
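    # Variant 1: Generator.integers followed by a division
    rng = np.random.default_rng()

    def createCoordinates(num_coord):
        # Lower bound 0 (not 1) so that 0.0 can occur, matching randint(0, 10)
        return rng.integers(0, 10, (num_coord, 2)) / 10.0

    # Variant 2: Generator.choice on pre-computed values
    rng = np.random.default_rng()
    z = np.linspace(0, 1.0, 10, endpoint=False)

    def createCoordinates(num_coord):
        return rng.choice(z, (num_coord, 2))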
All things considered, you can achieve a speedup of approximately 10x by using the API correctly.
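If you want to reproduce the comparison on your own machine, something along these lines should work. It is only a sketch: the `candidates` dictionary and its labels are names I made up for this example, the array is smaller than `10**7` to keep it quick, and the `rng.integers` entry assumes a lower bound of 0 so that it covers the same values as `randint(0, 10)`. Absolute timings will differ from mine.

```python
import timeit
import numpy as np

rng = np.random.default_rng()
z = np.linspace(0.0, 1.0, 10, endpoint=False)

# Hypothetical labels mapped to the approaches discussed above
candidates = {
    "randint / 10": lambda n: np.random.randint(0, 10, (n, 2)) / 10.0,
    "random.choice": lambda n: np.random.choice(z, (n, 2)),
    "rng.integers / 10": lambda n: rng.integers(0, 10, (n, 2)) / 10.0,
    "rng.choice": lambda n: rng.choice(z, (n, 2)),
}

n = 10**6
for name, make in candidates.items():
    # Best of three single runs, reported in milliseconds
    best = min(timeit.repeat(lambda: make(n), number=1, repeat=3))
    print(f"{name:18s} {best * 1e3:8.1f} ms")
```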