Regex to keep all letters in all alphabets along with digits and underscore (problem with Hindi letters)

I found a regex pattern that matches letters in any alphabet, which I can use to remove everything that is not a letter: \p{L}

So I wrote a regex that removes every character that is not a letter, a digit or an underscore: /[^\p{L}\d_]/gimu

Unfortunately, it does not work with a Hindi word like #फ्रांस, which comes out as फरस

See for yourself here https://regex101.com/r/dnXDK0/1

And please help me 🙂

Answer

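You forgot about diacritics. The vowel signs and other combining marks in फ्रांस are not letters (\p{L}) but marks (\p{M}), so you need to add \p{M} to the negated character class. Note that \p{Mn} alone covers only non-spacing marks and would still strip spacing vowel signs such as ा: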

/[^\p{L}\p{M}\d_]/gu

See the regex demo.

Note that you do not need the i and m flags here. m redefines anchor behavior, but your regex contains neither a ^ nor a $ anchor (the ^ inside the brackets only negates the character class). i makes cased letters match case-insensitively, but \p{L} already matches all letters, both upper- and lowercase.
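As a quick check, here is a minimal JavaScript sketch comparing the pattern without and with the mark category on the word from the question. The helper name `keepWordCharacters` is made up for illustration, and the sketch assumes an engine with Unicode property escape support (the u flag).

```javascript
// Hypothetical helper (name is an assumption): removes every character that is
// not a letter (\p{L}), a combining mark (\p{M}), an ASCII digit or an underscore.
function keepWordCharacters(text) {
  return text.replace(/[^\p{L}\p{M}\d_]/gu, "");
}

const input = "#फ्रांस";

// Without \p{M}: the virama, the vowel sign and the anusvara are removed too.
const withoutMarks = input.replace(/[^\p{L}\d_]/gu, "");

// With \p{M}: only the leading # is removed.
const withMarks = keepWordCharacters(input);

console.log(withoutMarks); // फरस
console.log(withMarks);    // फ्रांस
```

One thing to keep in mind: in JavaScript, \d matches only the ASCII digits 0-9, even with the u flag. If digits from other scripts should be kept as well, \p{Nd} could be added to the class in the same way.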