Split string based on regex pattern

I have a message which I am trying to split.

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)

print(split_message)

Expected Output:

["This is update 1", "This is update 2", "This is update 3"]

Actual Output:

['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']

Not sure what I am missing.

Answer

You are using "capturing groups", which is why their contents are also part of the resulting list. Use non-capturing groups instead (written as (?:...)):

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)

print(split_message)

You will, however, still get an empty string as the first entry. The message starts with a match of the pattern, so the text in front of the first split point is empty:

['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
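To get exactly the list from the question (no empty first entry and no trailing period), one option is to post-process the split result. This is a minimal sketch that assumes every update ends with a period you want to drop. Note that rstrip(".") removes all trailing periods, and the `if part` filter also drops any other empty pieces.

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Non-capturing groups, so only the text between timestamps is returned
pattern = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

# Skip empty pieces (such as the one before the leading timestamp)
# and strip the trailing period from each update
updates = [part.rstrip(".") for part in re.split(pattern, message) if part]

print(updates)  # ['This is update 1', 'This is update 2', 'This is update 3']
```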

As stated in the re.split documentation:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
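To see where the extra '10', '17', '15' items in the original output come from, you can list what each timestamp match captures with re.finditer. This is a small sketch using the question's original pattern: the two groups capture the day and the hour, and re.split inserts those same values after each split point.

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# The question's original pattern: two capturing groups (day and hour)
capturing = r"[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

# What each timestamp match captures
captured = [m.groups() for m in re.finditer(capturing, message)]
print(captured)  # [('10', '17'), ('10', '15'), ('10', '15')]

# re.split places those group values right after each split point
print(re.split(capturing, message))
```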