Custom regex word boundary (JavaScript)

I’m trying to create a custom word boundary (like \b) that also handles words starting or ending with the Unicode characters “ÆØÅæøå”.

So far, the only thing I can come up with is this ugly pattern:

((?<![\wÆØÅæøå])(?=[\wÆØÅæøå])|(?![\wÆØÅæøå])(?<=[\wÆØÅæøå]))

Is there a more elegant solution to this, or is this the only way?
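To see why \b falls short here: in JavaScript, \w (and therefore \b) only covers [A-Za-z0-9_], so æ, ø and å count as non-word characters. The sketch below compares \b with the pattern above. The helper name wholeWord is hypothetical, it assumes the search term contains no regex metacharacters, and it wraps the question's boundary in a non-capturing group instead of a capturing one (same matching behavior).

```javascript
// In JavaScript, \w is [A-Za-z0-9_], so \b treats æ/ø/å as non-word characters.
const naive = /\bær\b/;
console.log(naive.test("vær")); // true: a "boundary" is found between v and æ
console.log(naive.test("ær"));  // false: no boundary between start of input and æ

// The question's pattern: word characters are \w plus ÆØÅæøå.
const W = "[\\wÆØÅæøå]";
// Boundary = start of word (not after W, before W) or end of word (not before W, after W).
// Non-capturing group, so the boundary adds no capture groups to the final regex.
const B = `(?:(?<!${W})(?=${W})|(?!${W})(?<=${W}))`;

// Hypothetical helper; assumes `word` contains no regex metacharacters.
function wholeWord(word) {
  return new RegExp(B + word + B);
}

console.log(wholeWord("ær").test("vær"));         // false
console.log(wholeWord("ær").test("ær"));          // true
console.log(wholeWord("ær").test("det ær fint")); // true
```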

Answer

You can use:

(?<!\p{L}\p{M}*|[\p{N}_]) // leading word boundary, similar to \<, [[:<:]] or \m in other flavors
(?![\p{L}\p{N}_])         // trailing word boundary, similar to \>, [[:>:]] or \M

Compile the regex with the u flag to enable Unicode property escapes such as \p{L}, \p{M} and \p{N}.
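Putting the two boundaries together, a whole-word matcher might look roughly like the sketch below. escapeRegExp and wholeWordRegex are hypothetical helpers, not built-ins; the escaping step is an added assumption so that arbitrary search terms can be dropped in, and String.raw keeps the backslashes of the boundaries intact.

```javascript
// Leading and trailing boundaries from the answer, kept verbatim via String.raw.
const START = String.raw`(?<!\p{L}\p{M}*|[\p{N}_])`;
const END = String.raw`(?![\p{L}\p{N}_])`;

// Hypothetical helper: escape regex syntax characters in a literal search term.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match `word` only as a whole word. The "u" flag is
// required; without it, \p{...} is not treated as a Unicode property escape.
function wholeWordRegex(word) {
  return new RegExp(START + escapeRegExp(word) + END, "u");
}

console.log(wholeWordRegex("ær").test("vær"));         // false
console.log(wholeWordRegex("ær").test("Det ær fint")); // true
console.log(wholeWordRegex("blå").test("blåbær"));     // false
console.log(wholeWordRegex("blå").test("blå bær"));    // true
```

The "u" flag alone is used here so that .test() stays stateless; adding "g" would make repeated .test() calls depend on lastIndex.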

The (?<!\p{L}\p{M}*|[\p{N}_]) part is a negative lookbehind that matches a location not immediately preceded by a letter followed by zero or more combining (diacritic) marks, by a digit, or by an underscore.
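The \p{M}* in the lookbehind matters for decomposed text, where a letter like é is stored as e followed by U+0301 COMBINING ACUTE ACCENT. The comparison below uses a simplified lookbehind without \p{M}* (an illustrative variant, not part of the answer) to show what it would get wrong.

```javascript
// Leading boundary from the answer: a letter may be followed by combining marks.
const withMarks = /(?<!\p{L}\p{M}*|[\p{N}_])bar/u;
// Illustrative variant without \p{M}*: only inspects the single preceding code point.
const withoutMarks = /(?<![\p{L}\p{N}_])bar/u;

const decomposed = "e\u0301bar"; // "ébar" with é written as e + combining acute
const precomposed = "\u00E9bar"; // "ébar" with é as a single code point

console.log(withMarks.test(decomposed));     // false: "e\u0301" counts as a letter
console.log(withoutMarks.test(decomposed));  // true: U+0301 is a mark, not a letter
console.log(withMarks.test(precomposed));    // false
console.log(withoutMarks.test(precomposed)); // false
```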

The (?![\p{L}\p{N}_]) part is a negative lookahead that matches a location not immediately followed by a letter, a digit, or an underscore.
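The trailing lookahead can be compared with \b after a word ending in å. Because \b only knows ASCII word characters, it gets both of the first two cases below wrong, while the lookahead from the answer handles them as expected.

```javascript
const naiveEnd = /blå\b/;
const unicodeEnd = /blå(?![\p{L}\p{N}_])/u;

console.log(naiveEnd.test("blåbær"));    // true: å is non-word, b is word, so \b matches
console.log(naiveEnd.test("blå bær"));   // false: å and the space are both non-word
console.log(unicodeEnd.test("blåbær"));  // false: followed by the letter b
console.log(unicodeEnd.test("blå bær")); // true
console.log(unicodeEnd.test("blå2"));    // false: followed by a digit
```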