Regex extract group inside optional group

I have strings of the form “identifier STEP=10” where the “STEP=10” part is optional. The goal is to match lines both with and without the STEP part, and to extract the numerical value of STEP when it is present. Matching both cases is easy enough:

import re
pattern = ".*(STEP=[0-9]+)?"
re.match(pattern, "identifier STEP=10")
re.match(pattern, "identifier")

This detects both cases without a problem, but I fail to extract the numerical value in one go. I tried the following:

import re
pattern = ".*(STEP=([0-9]+))?"
group0 = re.search(pattern, "identifier STEP=10").groups()
group1 = re.search(pattern, "identifier").groups()

And while it still detects the lines, I only get

group0 = (None, None)
group1 = (None, None)

while I hoped to get something like

group0 = (None, "10")
group1 = (None, None)

Is regex not suited to doing this in one go, or am I simply using it wrong? I am curious whether there is a single regex call that returns what I want, without a second pass after I have matched the line.

Answer

A possible solution will look like

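import re
# Lazy .*? plus the $ anchor: the optional group is tried before more of the line is consumed
pattern = r"^.*?(?:STEP=([0-9]+))?$"
group0 = re.search(pattern, "identifier STEP=10").groups()
group1 = re.search(pattern, "identifier").groups()
print(*group0)  # 10
print(*group1)  # None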

See the Python demo.

The ^.*?(?:STEP=([0-9]+))?$ regex matches

  • ^ – start of string
  • .*? – zero or more chars other than line break chars as few as possible (i.e. the regex engine skips this pattern first and tries the subsequent patterns, and only comes back to use this when the subsequent patterns fail to match)
  • (?:STEP=([0-9]+))? – an optional non-capturing group: STEP= and then Group 1 capturing one or more ASCII digits
  • $ – end of string.
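
As a rough illustration of why both the lazy quantifier and the $ anchor matter, the sketch below wraps the pattern in a small helper and compares it with a variant that lacks $. The names step_value, PATTERN and NO_ANCHOR are made up for this example and are not part of the original answer.

```python
import re

PATTERN = re.compile(r"^.*?(?:STEP=([0-9]+))?$")
# Hypothetical variant for comparison: same pattern without the trailing $
NO_ANCHOR = re.compile(r"^.*?(?:STEP=([0-9]+))?")

def step_value(line):
    """Return the STEP value as an int, or None if the line has no STEP part."""
    m = PATTERN.search(line)
    if m is None or m.group(1) is None:
        return None
    return int(m.group(1))

print(step_value("identifier STEP=10"))  # 10
print(step_value("identifier"))          # None

# Without $, the lazy .*? takes zero characters, the optional group is
# skipped at position 0, and the match succeeds immediately as an empty string.
print(NO_ANCHOR.search("identifier STEP=10").span())  # (0, 0)
```

The $ is what forces the engine to keep expanding .*? one character at a time until either the STEP part matches or the end of the string is reached.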

The .*(STEP=[0-9]+)? regex matches like this:

  • .* – grabs the whole line, from start to end
  • (STEP=[0-9]+)? – the group is quantified with ? (meaning zero or one occurrence of the quantified pattern), so the regex engine, with its index now at the end of the line, matches the group zero times there. The overall match is returned with Group 1 not participating, which is why Python reports it as None.
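
The behaviour described in this list can be checked directly by inspecting the match object. This is only a small sketch using the two-group pattern from the question; the variable name m is arbitrary.

```python
import re

m = re.search(r".*(STEP=([0-9]+))?", "identifier STEP=10")

# The greedy .* has consumed the entire line, including the STEP part.
print(m.group(0))  # identifier STEP=10

# Neither group took part in the match, so both are reported as None.
print(m.groups())  # (None, None)

# A group that did not contribute to the match has the span (-1, -1).
print(m.span(1))   # (-1, -1)
```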

To be able to resolve such issues you must understand backtracking in regex (for example, see this YT video of mine to learn more about it).
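
As a small, hedged illustration of backtracking, consider what happens when the STEP part is made mandatory. The greedy .* still grabs the whole line first, but the match would then fail, so the engine gives characters back until STEP= can match. The variable names below are chosen only for this sketch.

```python
import re

# STEP is mandatory here, so the greedy .* is forced to backtrack
# from the end of the line until "STEP=" can match.
required = re.compile(r".*STEP=([0-9]+)")

with_step = required.search("identifier STEP=10")
print(with_step.group(1))  # 10

# The price: lines without a STEP part are no longer matched at all.
without_step = required.search("identifier")
print(without_step)  # None
```

With an optional group the engine has no reason to backtrack, because the overall match already succeeds once .* has consumed everything. That is why the working pattern combines the lazy .*? with the $ anchor instead.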