How to find a list of values matched by a regex between specific words

Example:

We have two arrays of words, one for AFTER and one for BEFORE:

after = ["start","beginning "]
before = ["end","finish"]

And we have a string: "xxxx yyyy yyy xxx start value 123.5 yxyxyxyxy beginning valueTwo 156.56 yxyxyxy xyxyxy end yxyxyxy valueThree 6678.56 yxyxyxyxy xyxyx finish"

I have the regex (?<=\b value )(\d+[.,]\d+), which finds the value 123.5 etc., but I’m not sure of the correct way to restrict the search to the text after and before specific words, in a specific order.
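A quick check of what that pattern matches on the sample string may help here. This is only a sketch: it assumes the pattern is meant with its backslashes written out (\b, \d), and the looser second pattern is an illustrative assumption that also accepts suffixed names such as valueTwo.

```python
import re

text = ("xxxx yyyy yyy xxx start value 123.5 yxyxyxyxy beginning valueTwo 156.56 "
        "yxyxyxy xyxyxy end yxyxyxy valueThree 6678.56 yxyxyxyxy xyxyx finish")

# Assumed form of the question's pattern: a lookbehind for " value " that
# starts at a word boundary, followed by a decimal number.
lookbehind = re.compile(r"(?<=\b value )(\d+[.,]\d+)")
print(lookbehind.findall(text))  # ['123.5']

# Looser variant (an assumption): "value" plus any word characters, a space,
# then the number, so valueTwo and valueThree are picked up as well.
loose = re.compile(r"value\w* (\d+[.,]\d+)")
print(loose.findall(text))  # ['123.5', '156.56', '6678.56']
```

The lookbehind form only sees the plain word "value", which is why it returns a single number on this string.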

E.g.

  • if after = "start" and before = "end", it should return 123.5 and 156.56

  • if after = "start" and before = ["end","finish"], it should return the values in order: first 123.5 and 156.56, then 6678.56

  • if after = "start" and before = ["ABC","finish"], it should return 123.5, 156.56 and 6678.56, i.e. it should search for “ABC” OR “finish”

  • if after = "start" and before = ["end",""], it should return 123.5 and 156.56, then 6678.56: it should first find the values before "end", and since one of the options is "", the regex should then search without any end point (using the full text).
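One possible reading of the four cases above, written as a small executable sketch. The function name values_between, the rule that a missing end word is skipped, and the rule that a value already returned is not returned again are all assumptions made for illustration; they are not stated in the question.

```python
import re

# Assumed value pattern: "value" with an optional suffix, a space, a decimal number.
VALUE_RE = re.compile(r"value\w* (\d+[.,]\d+)")

def values_between(text, after, before_words):
    """Collect values found between `after` and each end word, in order."""
    found = []
    for before in before_words:
        # An empty end word is read as "no end point": search to the end of the text.
        end = re.escape(before) if before else "$"
        match = re.search(re.escape(after) + r"(.*?)" + end, text)
        if match is None:
            continue  # this end word does not occur after `after`; try the next one
        for value in VALUE_RE.findall(match.group(1)):
            if value not in found:  # assumption: report each value only once
                found.append(value)
    return found

text = ("xxxx yyyy yyy xxx start value 123.5 yxyxyxyxy beginning valueTwo 156.56 "
        "yxyxyxy xyxyxy end yxyxyxy valueThree 6678.56 yxyxyxyxy xyxyx finish")

print(values_between(text, "start", ["end"]))            # ['123.5', '156.56']
print(values_between(text, "start", ["end", "finish"]))  # ['123.5', '156.56', '6678.56']
print(values_between(text, "start", ["ABC", "finish"]))  # ['123.5', '156.56', '6678.56']
print(values_between(text, "start", ["end", ""]))        # ['123.5', '156.56', '6678.56']
```

Deduplicating by value would hide a number that legitimately occurs twice, so tracking match positions instead may be the safer choice for real data.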

Answer

I ended up with a much simpler solution:

import re

def long_zip(filler, *lists):
    # Left-pad the shorter lists with `filler` so all have equal length, then zip them.
    max_len = max(map(len, lists))
    return zip(*([filler] * (max_len - len(l)) + l for l in lists))

def search(source, start_array, end_array):
    output_list = []
    for start_word, end_word in long_zip("", start_array, end_array):
        start_regex = re.escape(start_word)
        # An empty end word means "search up to the end of the text".
        end_regex = re.escape(end_word) if end_word != "" else "$"
        for source_part in re.findall(start_regex + ".*?" + end_regex, source):
            # \w* allows suffixed names such as valueTwo; \d+[.,]\d+ is the number.
            output_list += re.findall(r"value\w* (\d+[.,]\d+)", source_part)
    return output_list
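A usage sketch on the sample string from the question. The two functions are repeated only so the block runs on its own, and the patterns assume the backslashes (\w, \d) are written out. Because long_zip pads on the left, a shorter start list gets an empty first entry, so the same value can be collected more than once; the dict.fromkeys step is an added assumption for dropping repeats while keeping order.

```python
import re

def long_zip(filler, *lists):
    max_len = max(map(len, lists))
    return zip(*([filler] * (max_len - len(l)) + l for l in lists))

def search(source, start_array, end_array):
    output_list = []
    for start_word, end_word in long_zip("", start_array, end_array):
        start_regex = re.escape(start_word)
        end_regex = re.escape(end_word) if end_word != "" else "$"
        for source_part in re.findall(start_regex + ".*?" + end_regex, source):
            output_list += re.findall(r"value\w* (\d+[.,]\d+)", source_part)
    return output_list

source = ("xxxx yyyy yyy xxx start value 123.5 yxyxyxyxy beginning valueTwo 156.56 "
          "yxyxyxy xyxyxy end yxyxyxy valueThree 6678.56 yxyxyxyxy xyxyx finish")

# One start word, one end word: the pair is ("start", "end").
print(search(source, ["start"], ["end"]))
# ['123.5', '156.56']

# One start word, two end words: the pairs are ("", "end") and ("start", "finish").
result = search(source, ["start"], ["end", "finish"])
print(result)
# ['123.5', '156.56', '123.5', '156.56', '6678.56']

# Optional: drop repeated values while preserving first-seen order.
unique = list(dict.fromkeys(result))
print(unique)
# ['123.5', '156.56', '6678.56']
```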