I'll start with a disclaimer: I am horrible with regular expressions. I want to find every instance of a Social Security number in a string and mask everything except the dashes (-) and the last four digits of the SSN.
Example
String someStrWithSSN = "This is an SSN,123-31-4321, and here is another 987-65-8765";
Pattern formattedPattern = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");
Matcher formattedMatcher = formattedPattern.matcher(someStrWithSSN);
while (formattedMatcher.find()) {
    // Here is my first issue: the pattern is not found.
}
// My next issue is that my String should end up looking like this:
// "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765"
The expected result is that each SSN is found and masked. The code above should produce the string "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765".
Answer
You can simplify this by doing something like the following:
String initial = "This is an SSN,123-31-4321, and here is another 987-65-8765";
String processed = initial.replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987-65-8765
This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765
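The regex \d{3}-\d{2}(?=-\d{4}) matches three digits, a dash, and two digits, but only where they are followed by a dash and four more digits. That trailing part sits inside a lookahead, so it is checked but not consumed, and the last four digits are left out of the replacement. Using replaceAll with this regex therefore produces the desired masking effect.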
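The question also asked why find() never matched. A minimal sketch of the likely cause, assuming the pattern from the question is used unchanged: ^ and $ tie the match to the start and end of the whole input, and the sample string starts with "This". The class and method names below are made up for illustration.

```java
import java.util.regex.Pattern;

final class LookaheadMaskDemo {
    // Pattern from the question: ^ and $ anchor the match to the start/end of the input.
    static final Pattern ANCHORED = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");

    // Pattern from the answer: no anchors; the last four digits are only looked at, not consumed.
    static final Pattern LOOKAHEAD = Pattern.compile("\\d{3}-\\d{2}(?=-\\d{4})");

    // True if the anchored pattern from the question finds anything in the text.
    static boolean anchoredFinds(String text) {
        return ANCHORED.matcher(text).find();
    }

    // Replaces the first five digits of each dashed SSN, leaving "-dddd" untouched.
    static String mask(String text) {
        return LOOKAHEAD.matcher(text).replaceAll("XXX-XX");
    }

    public static void main(String[] args) {
        String text = "This is an SSN,123-31-4321, and here is another 987-65-8765";
        System.out.println(anchoredFinds(text));          // false: the input does not start with a digit
        System.out.println(anchoredFinds("123-31-4321")); // true: the SSN is the entire input
        System.out.println(mask(text));
    }
}
```

The anchored pattern only succeeds when the SSN is the entire input, which is why dropping the anchors is enough to make the loop in the question find both numbers.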
Edit:
If you also want 9 consecutive digits to be targeted by this replacement, you can do the following:
String initial = "This is an SSN,123-31-4321, and here is another 987658765";
String processed = initial
    .replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX")
    .replaceAll("\\d{5}(?=\\d{4})", "XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987658765
This is an SSN,XXX-XX-4321, and here is another XXXXX8765
The regex \d{5}(?=\d{4}) matches five digits, but only where they are followed by four more digits. Those four digits sit inside a lookahead, so they are checked but not consumed. A second call to replaceAll targets these sequences with the appropriate replacement.
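These two patterns have no boundary checks, so they also fire inside longer runs of digits. A small sketch of that limitation, using the same two replaceAll calls (the class and method names are hypothetical):

```java
final class NaiveMaskPitfalls {
    // Same two replacements as above, with no checks on what surrounds the match.
    static String naiveMask(String text) {
        return text
            .replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX")
            .replaceAll("\\d{5}(?=\\d{4})", "XXXXX");
    }

    public static void main(String[] args) {
        // A 4-2-4 number is not an SSN, but "234-56" followed by "-7890" still matches.
        System.out.println(naiveMask("1234-56-7890")); // 1XXX-XX-7890
        // An 11-digit number is not an SSN, but its first five digits are followed by four more.
        System.out.println(naiveMask("98765876545"));  // XXXXX876545
    }
}
```

Whether this matters depends on the input: if the text can contain other long numbers, the patterns need to check the characters on both sides of the candidate SSN.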
Edit: Here’s a more robust version of the previous regex, and a longer demonstration of how the new regex works:
String initial = "123-45-6789 is a SSN that starts at the beginning of the string, and still matches. "
    + "This is an SSN, 123-31-4321, and here is another 987658765. "
    + "These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. "
    + "This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. "
    + "-123-31-4321 is preceded by a dash, so it doesn't match as well. "
    + ":123-31-4321 is preceded by a non-colon/digit, so it does match. "
    + "Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. "
    + "Here's two SSNs in parentheses: (777777777) (777-77-7777), "
    + "and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). "
    + "At the end of the string is a matching SSN: 998-76-4321";
String processed = initial
    .replaceAll("(?<=^|[^-\\d])\\d{3}-\\d{2}(?=-\\d{4}([^-\\d]|$))", "XXX-XX")
    .replaceAll("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))", "XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
123-45-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) (777-77-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: 998-76-4321
XXX-XX-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, XXX-XX-4321, and here is another XXXXX8765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :XXX-XX-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (XXXXX7777) (XXX-XX-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: XXX-XX-4321
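If the masking runs often, one option is to compile the two patterns once and reuse them, since String.replaceAll compiles its regex on every call. The following is only a sketch of that idea: SsnMasker and mask are hypothetical names, and the regexes are the ones from the snippet above.

```java
import java.util.regex.Pattern;

final class SsnMasker {
    // Dashed form: must not be preceded by a dash or digit, and the trailing
    // "-dddd" must be followed by a non-dash, non-digit character or the end of input.
    private static final Pattern DASHED =
        Pattern.compile("(?<=^|[^-\\d])\\d{3}-\\d{2}(?=-\\d{4}([^-\\d]|$))");

    // Nine-digit form: must not be preceded by a dash or digit, and the last
    // four digits must be followed by a non-digit or the end of input.
    private static final Pattern PLAIN =
        Pattern.compile("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))");

    private SsnMasker() {}

    // Masks everything except the dashes and the last four digits of each SSN.
    static String mask(String text) {
        String dashedMasked = DASHED.matcher(text).replaceAll("XXX-XX");
        return PLAIN.matcher(dashedMasked).replaceAll("XXXXX");
    }

    public static void main(String[] args) {
        System.out.println(mask("(777777777) (777-77-7777)")); // (XXXXX7777) (XXX-XX-7777)
        System.out.println(mask("1234-56-7890 and 98765876545")); // unchanged
    }
}
```

The behavior should be the same as the chained replaceAll calls; the only difference is where the patterns get compiled.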