Here’s a quick example of what I’m trying to do and the error I’m getting:
    for symbol in itertools.product(list_a, repeat=8):
        list_b.append(symbol)
Afterwards, I also exclude combinations from that list like so:
    for combination in list_b:
        valid_b = True
        for symbols in range(len(list_exclude)):
            if list_exclude[symbols] in combination:
                valid_b = False
            else:
                pass
        if valid_b:
            new_list.append(combination)
I’ve heard that chunking the process somehow might help, but I’m not sure how that could be done here.
I’m using multiprocessing for this as well.
When I run it, I get “MemoryError”.
How would you go about it?
Don’t pre-compute anything, especially not the first full list:
    import itertools

    def symbols(lst, exclude):
        # Build each candidate lazily and skip it if it contains an excluded item.
        for symbol in map(''.join, itertools.product(lst, repeat=8)):
            if any(map(symbol.__contains__, exclude)):
                continue
            yield symbol
Now use the generator wherever you need the elements, so they are evaluated lazily. Keep in mind that since it’s pre-filtering the data, even list(symbols(list_a, list_exclude)) will be much cheaper than what you originally wrote.
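As a rough sketch of what lazy use can look like, the snippet below takes only the first few results with itertools.islice. The contents of list_a and list_exclude are made up for illustration, since the question doesn't show the real ones.

```python
import itertools


def symbols(lst, exclude):
    """Lazily yield 8-character strings over lst that contain no excluded item."""
    for symbol in map(''.join, itertools.product(lst, repeat=8)):
        if any(map(symbol.__contains__, exclude)):
            continue
        yield symbol


# Hypothetical inputs, chosen only for illustration.
list_a = ['a', 'b', 'c']
list_exclude = ['aa', 'cb']

# Pull only the first five valid results; nothing past them is ever generated.
first_five = list(itertools.islice(symbols(list_a, list_exclude), 5))
print(first_five)
# ['abababab', 'abababac', 'abababba', 'abababbb', 'abababbc']
```

The same pattern works with a plain for loop and a break: the generator only does as much work as you ask of it.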
Here is a breakdown of what happens:
itertools.product behaves like a generator. That means that it produces each output on demand without retaining a reference to any previous items. Each element it returns is a tuple containing some combination of the input elements.
Since you want to compare strings, you need to convert the tuples using ''.join. Mapping it onto each of the tuples that itertools.product produces converts those elements into strings. For example:
    >>> ''.join(('$', '$', '&', '&', '♀', '@', '%', '$'))
    '$$&&♀@%$'
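To make the first two points concrete, here is a small sketch with a made-up two-symbol alphabet and repeat=3 instead of 8, so the output stays readable. It shows that itertools.product hands out one tuple at a time, and that map applies ''.join only as items are requested.

```python
import itertools

# Hypothetical two-symbol alphabet, kept tiny so the output is easy to read.
alphabet = ['$', '&']

combos = itertools.product(alphabet, repeat=3)  # no combinations produced yet
first = next(combos)                            # one tuple, produced on demand
print(first)           # ('$', '$', '$')
print(''.join(first))  # $$$

# map applies ''.join to each remaining tuple as it is requested.
rest = list(map(''.join, combos))
print(rest)
# ['$$&', '$&$', '$&&', '&$$', '&$&', '&&$', '&&&']
```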
Filtering each symbol thus created can be done by checking whether any of the items in exclude are contained in it. You can do this with something like
    [ex in symbol for ex in exclude]
The operation ... in symbol is implemented via the magic method symbol.__contains__. You can therefore map that method to every element of exclude.
Since the first element of exclude that is contained in symbol invalidates it, you don’t need to check the remainder. This is called short-circuiting, and is implemented in the any function. Notice that because map is lazy (in Python 3 it returns an iterator rather than a list), the remaining elements will actually not be computed once a match is found. This is different from using a list comprehension, which pre-computes all the elements.
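If you want to see the short-circuiting for yourself, a sketch along these lines should do it. The contains helper and the checked list are hypothetical additions that only log which items were actually tested. They stand in for symbol.__contains__ and are not part of the solution.

```python
checked = []  # records which exclusion items were actually tested


def contains(symbol, item):
    """Hypothetical helper: same test as `item in symbol`, but logs each call."""
    checked.append(item)
    return item in symbol


symbol = '$$&&@%'
exclude = ['$$', '&&', '@%']  # all three occur in symbol

# Lazy: any() stops pulling from map at the first True, so only '$$' is tested.
lazy_result = any(map(lambda ex: contains(symbol, ex), exclude))
lazy_checked = list(checked)

checked.clear()

# Eager: the list comprehension tests every item before any() sees a value.
eager_result = any([contains(symbol, ex) for ex in exclude])
eager_checked = list(checked)

print(lazy_checked)   # ['$$']
print(eager_checked)  # ['$$', '&&', '@%']
```

Both versions give the same answer. The lazy one just does less work, and the saving grows with the length of exclude.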
Putting a yield into your function turns it into a generator function. That means that when you call symbols(...), it returns a generator object that you can iterate over. This object does not pre-compute anything until you call next on it. So if you write the data to a file (for example), only the current result will be in memory at once. It may take a long time to write out a large number of results, but your memory usage should not spike at all from it.
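Writing to a file could look roughly like the sketch below. The inputs and the temporary output path are placeholders for illustration. Swap in your own list_a, list_exclude and file name.

```python
import itertools
import os
import tempfile


def symbols(lst, exclude):
    """Lazily yield 8-character strings over lst that contain no excluded item."""
    for symbol in map(''.join, itertools.product(lst, repeat=8)):
        if any(map(symbol.__contains__, exclude)):
            continue
        yield symbol


# Hypothetical inputs and output location, for illustration only.
list_a = ['a', 'b']
list_exclude = ['aa']
out_path = os.path.join(tempfile.mkdtemp(), 'symbols.txt')

count = 0
with open(out_path, 'w', encoding='utf-8') as f:
    # Each string is written and then dropped, so only one is held at a time.
    for symbol in symbols(list_a, list_exclude):
        f.write(symbol + '\n')
        count += 1

print(count, 'symbols written to', out_path)
```

The for loop calls next on the generator for you, so there is never a full list in memory, only the string currently being written.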