Regular expression for Unicode dashes in Java

Can the following be improved with a regular expression? The code works, but I would like to know whether there is a way to match every dash character that exists in Unicode, so that I can normalize them all to a plain hyphen.

Sample inputs:

48553−FS002
48553-FS002
48553 FS002
48553-FS002-ESD12

Java

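String reference = "48553−FS002";
// Replace the first match with its captured separator; for a two-part
// reference this leaves just the separator character.
String separator = reference.replaceFirst("\\w+(\\W)?\\w+", "$1");
if (!separator.equals(" ")) {
   reference = reference.replaceAll(separator, "-");
}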

Alternatively, the dashes could be matched by their Unicode code points. I was reading about the Unicode dash characters, but I have not managed to make that work in a Java regex.

Answer

If you need to match any non-word character other than a space, you can use

reference = reference.replaceAll("[^\\w ]", "-");

Or, with character class subtraction:

reference = reference.replaceAll("[\\W&&[^ ]]", "-");
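
As a rough check that the two forms behave the same, here is a minimal sketch that runs both over the sample references from the question. The class and method names (NonWordReplace, withNegatedClass, withIntersection) are made up for illustration. Note that by default Java's \w covers only ASCII letters, digits and underscore, so any non-ASCII letter in a reference would also be turned into a hyphen by these patterns.

```java
public class NonWordReplace {
    // Hypothetical helper: negated class, "anything that is not a word char or a space".
    public static String withNegatedClass(String s) {
        return s.replaceAll("[^\\w ]", "-");
    }

    // Hypothetical helper: non-word characters intersected with "not a space".
    public static String withIntersection(String s) {
        return s.replaceAll("[\\W&&[^ ]]", "-");
    }

    public static void main(String[] args) {
        // The first sample uses U+2212 (minus sign) as its separator.
        String[] samples = {"48553\u2212FS002", "48553-FS002", "48553 FS002", "48553-FS002-ESD12"};
        for (String s : samples) {
            System.out.println(withNegatedClass(s) + " | " + withIntersection(s));
        }
    }
}
```

Both helpers leave the space in "48553 FS002" untouched and replace every other non-word character, not just dashes, so a reference such as "48553.FS002" would also become "48553-FS002".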

To match only hyphen-like and dash-like characters, you can use the following pattern:

[\p{Pd}\u00AD\u2212]

Here,

  • \p{Pd} – matches any character in the Unicode "Punctuation, Dash" category
  • \u00AD – matches a soft hyphen
  • \u2212 – matches a minus sign.
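
Putting the dash pattern to work could look like the sketch below. It assumes the goal is simply to replace every such character with an ASCII hyphen; the class name DashNormalizer and the method normalize are hypothetical. Inside a Java string literal the backslashes are doubled so that the regex engine, not the compiler, interprets the escapes.

```java
import java.util.regex.Pattern;

public class DashNormalizer {
    // Pd = Unicode "Punctuation, Dash" category; U+00AD = soft hyphen; U+2212 = minus sign.
    private static final Pattern DASHES = Pattern.compile("[\\p{Pd}\\u00AD\\u2212]");

    // Hypothetical helper: replace every dash-like character with an ASCII hyphen.
    public static String normalize(String reference) {
        return DASHES.matcher(reference).replaceAll("-");
    }

    public static void main(String[] args) {
        // The first sample uses U+2212 (minus sign) as its separator.
        String[] samples = {"48553\u2212FS002", "48553-FS002", "48553 FS002", "48553-FS002-ESD12"};
        for (String s : samples) {
            System.out.println(normalize(s));
        }
    }
}
```

Unlike the non-word approach, this leaves spaces, dots and other punctuation alone and touches only dash-like characters. Compiling the pattern once into a static field avoids recompiling it on every call, which matters if many references are processed.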