Mask all SSNs with only a partial mask in a file with multiple SSNs

The question is published by the Tutorial Guruji team.

I'll start by admitting that I am terrible with regular expressions. I want to find every Social Security number in a string and mask everything except the dashes (-) and the last four digits of each SSN.

Example

String someStrWithSSN = "This is an SSN,123-31-4321, and here is another 987-65-8765";
Pattern formattedPattern = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");
Matcher formattedMatcher = formattedPattern.matcher(someStrWithSSN);

while (formattedMatcher.find()) {
    // Here is my first issue: the pattern is never found
}

// My next issue is that my String should end up looking like this:
//     "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765"

The expected result is to find each SSN and replace it. The code above should produce the string "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765".

Answer

You can simplify this by doing something like the following:

String initial = "This is an SSN,123-31-4321, and here is another 987-65-8765";
String processed = initial.replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX");
System.out.println(initial);
System.out.println(processed);

Output:

This is an SSN,123-31-4321, and here is another 987-65-8765
This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765

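The regex \d{3}-\d{2}(?=-\d{4}) (each backslash is doubled inside a Java string literal) matches three digits, a dash, and two digits, but only when they are followed by a dash and four more digits. The lookahead (?=...) checks for that trailing part without consuming it, so the final dash and the last four digits stay in place. Using replaceAll with this regex therefore produces the desired masking effect.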
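As for why the loop in the question never finds anything: both alternatives in that pattern start with ^, so they can only match at the very beginning of the input, and the sample string begins with "This". A minimal sketch, assuming a hypothetical class SsnFinder and method findSsns (neither is from the question), shows the same find() loop working once the anchors are dropped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SsnFinder {
    // Same alternatives as in the question, minus the ^ and $ anchors.
    private static final Pattern SSN = Pattern.compile("\\d{9}|\\d{3}-\\d{2}-\\d{4}");

    // Hypothetical helper: collects every substring the pattern matches.
    static List<String> findSsns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SSN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "This is an SSN,123-31-4321, and here is another 987-65-8765";
        System.out.println(findSsns(s)); // [123-31-4321, 987-65-8765]
    }
}
```

This only locates the numbers; the replaceAll approach is still the shorter route to the masked string.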

Edit:

If you also want 9 consecutive digits to be targeted by this replacement, you can do the following:

String initial = "This is an SSN,123-31-4321, and here is another 987658765";
String processed = initial.replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX")
                       .replaceAll("\\d{5}(?=\\d{4})", "XXXXX");
System.out.println(initial);
System.out.println(processed);

Output:

This is an SSN,123-31-4321, and here is another 987658765
This is an SSN,XXX-XX-4321, and here is another XXXXX8765

The regex \d{5}(?=\d{4}) matches five digits only when they are followed by four more digits; the lookahead leaves those last four digits untouched. The second call to replaceAll targets these undashed sequences with the corresponding replacement.
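One caveat with these two patterns is that nothing stops them from matching inside a longer run of digits. The sketch below (the names SimpleMaskCaveat and simpleMask are made up for illustration) runs the same two replacements on inputs that are not SSNs:

```java
class SimpleMaskCaveat {
    // The two chained replacements from the answer, wrapped in a method.
    static String simpleMask(String text) {
        return text.replaceAll("\\d{3}-\\d{2}(?=-\\d{4})", "XXX-XX")
                   .replaceAll("\\d{5}(?=\\d{4})", "XXXXX");
    }

    public static void main(String[] args) {
        // 4-2-4 grouping: not an SSN, but its last nine digits look like one.
        System.out.println(simpleMask("1234-56-7890")); // 1XXX-XX-7890
        // Ten consecutive digits: the first five are still masked.
        System.out.println(simpleMask("1234567890"));   // XXXXX67890
    }
}
```

Whether this matters depends on the input; if the file can contain other long numbers, the patterns need to check what comes before and after each candidate.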

Edit: Here’s a more robust version of the previous regex, and a longer demonstration of how the new regex works:

String initial = "123-45-6789 is a SSN that starts at the beginning of the string, "
    + "and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These "
    + "have 10+ digits, so they don't match: 123-31-43214, and 98765876545. "
    + "This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. "
    + "-123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is "
    + "preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've "
    + "tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) "
    + "(777-77-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) "
    + "(777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: "
    + "998-76-4321";
String processed = initial.replaceAll("(?<=^|[^-\\d])\\d{3}-\\d{2}(?=-\\d{4}([^-\\d]|$))", "XXX-XX")
                       .replaceAll("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))", "XXXXX");
System.out.println(initial);
System.out.println(processed);

Output:

123-45-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) (777-77-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: 998-76-4321

XXX-XX-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, XXX-XX-4321, and here is another XXXXX8765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :XXX-XX-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (XXXXX7777) (XXX-XX-7777), and here's four invalid SSNs in parentheses: (7777777778) (777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: XXX-XX-4321
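If the masking runs over many lines of a file, it may be worth compiling the two patterns once instead of on every replaceAll call. The following is only a sketch of that idea; the class SsnMasker and its mask method are hypothetical names, and the regexes are the ones from the robust version above:

```java
import java.util.regex.Pattern;

class SsnMasker {
    // Dashed form: must not be preceded by a dash or digit,
    // and the last four digits must not be followed by a dash or digit.
    private static final Pattern DASHED =
            Pattern.compile("(?<=^|[^-\\d])\\d{3}-\\d{2}(?=-\\d{4}([^-\\d]|$))");

    // Nine-digit form: must not be preceded by a dash or digit,
    // and the last four digits must not be followed by another digit.
    private static final Pattern PLAIN =
            Pattern.compile("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))");

    // Hypothetical helper: masks all but the last four digits of each SSN.
    static String mask(String text) {
        String dashedMasked = DASHED.matcher(text).replaceAll("XXX-XX");
        return PLAIN.matcher(dashedMasked).replaceAll("XXXXX");
    }

    public static void main(String[] args) {
        System.out.println(mask("(777777777) (777-77-7777)")); // (XXXXX7777) (XXX-XX-7777)
        System.out.println(mask("123-31-43214 and -123-31-4321")); // unchanged
    }
}
```

The lookbehind (?<=^|[^-\d]) and the trailing checks inside the lookaheads are what reject candidates embedded in longer digit or dash sequences, while still allowing an SSN at the very start or end of the text.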
