Python Tokenize Contractions using regex

I tired to follow this question to create a regex expression that separates contractions from the word.

Here is my attempt:

 line = re.sub( r's|(n't)|'m|('ll)|('ve)|('s)|('re)|('d)', r" 1",line) #tokenize contractions

However, only the first match is tokenized. For example: should've can't mustn't we'll changes to should ca n't must n't we

Answer

1 refers to the first capturing group!

You could put all the options in the same capturing group:

(n't|'m|'ll|'ve|'s|'re|'d)

See a demo here.

For deepening the topic, I suggest you to read Parentheses for Grouping and Capturing.