Python Tokenize Contractions using regex

I tried to follow this question to create a regular expression that separates contractions from the word.

Here is my attempt:

 line = re.sub(r"'s|(n't)|'m|('ll)|('ve)|('s)|('re)|('d)", r" \1", line)  # tokenize contractions

However, only the first match is tokenized. For example, "should've can't mustn't we'll" changes to "should ca n't must n't we".


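\1 refers to the first capturing group! In your pattern that group is only (n't). The other alternatives sit in other groups or in no group at all, so when one of them matches, \1 is empty and the contraction is replaced by just a space.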
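To see which group each match lands in, here is a small sketch. It assumes the pattern from the question is as written below (with the quotes intact); the names ATTEMPT and first_group_per_match are made up for illustration.

```python
import re

# The pattern from the question (assumed form): only (n't) is group 1.
ATTEMPT = re.compile(r"'s|(n't)|'m|('ll)|('ve)|('s)|('re)|('d)")

def first_group_per_match(text):
    """Return (whole match, content of group 1) for every match in text."""
    return [(m.group(0), m.group(1)) for m in ATTEMPT.finditer(text)]

print(first_group_per_match("should've can't we'll"))
# [("'ve", None), ("n't", "n't"), ("'ll", None)]
```

Group 1 is None for 've and 'll, so a replacement built from \1 has nothing to put back for them.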

You could put all the options in the same capturing group:
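A minimal sketch of that idea, assuming Python 3 and the same list of suffixes as in the question (the names CONTRACTIONS and tokenize_contractions are hypothetical, chosen only for this example):

```python
import re

# Every suffix is an alternative inside ONE capturing group,
# so \1 always holds whichever suffix matched.
CONTRACTIONS = re.compile(r"(n't|'m|'ll|'ve|'s|'re|'d)")

def tokenize_contractions(line):
    """Insert a space before each contraction suffix."""
    return CONTRACTIONS.sub(r" \1", line)

print(tokenize_contractions("should've can't mustn't we'll"))
# should 've ca n't must n't we 'll
```

Because there is only one group, the replacement " \1" works the same way no matter which alternative matched.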


See a demo here.

To go deeper into the topic, I suggest reading Parentheses for Grouping and Capturing.