I tried to follow this question to create a regex that separates contractions from the word.
Here is my attempt:
line = re.sub(r"\s|(n't)|'m|('ll)|('ve)|('s)|('re)|('d)", r" \1", line)  # tokenize contractions
However, only the first match is tokenized. For example:
should've can't mustn't we'll changes to
should ca n't must n't we
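\1 refers only to the first capturing group, which in your pattern is (n't). When any other alternative matches, that group is empty, so the match is replaced by a single space.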
You could put all the options in the same capturing group:
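A minimal sketch of that idea, assuming the goal is simply to insert a space before each contraction suffix. The names CONTRACTION and split_contractions are made up for illustration, and the \s alternative is dropped here because whitespace does not need to be rewritten:

```python
import re

# A single capturing group wraps every suffix, so \1 always holds
# whichever alternative actually matched.
CONTRACTION = re.compile(r"(n't|'m|'ll|'ve|'s|'re|'d)")

def split_contractions(line):
    # Put a space in front of the matched suffix.
    return CONTRACTION.sub(r" \1", line)

print(split_contractions("should've can't mustn't we'll"))
# should 've ca n't must n't we 'll
```

Since there is only one group, the replacement never has to guess which alternative matched.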
See a demo here.
To go deeper into the topic, I suggest reading Parentheses for Grouping and Capturing.