Replace N-digit numbers in a sentence with specific strings for different values of N

I have a bunch of strings in a pandas dataframe that contain numbers. I could run the code below and replace them all:

df.feature_col = df.feature_col.str.replace(r'\d+', ' NUM ', regex=True)

But what I need to do is replace any 10-digit number with a string like masked_id, any 16-digit number with account_number, any three-digit number with yet another string, and so on.

How do I go about doing this?

PS: since my dataset is small, a less-than-optimal approach is good enough for me.


Another way is to use replace with the option regex=True and a dictionary mapping patterns to replacements. You can also use somewhat more relaxed match patterns (in order) than Tim’s:

import pandas as pd

# test data
df = pd.DataFrame({'feature_col': ['this has 1234567',
                                   'this has 1234',
                                   'this has 123',
                                   'this has none']})

# patterns in decreasing length order, so longer runs are replaced first
# these of course would replace '12345' with 'ID45' :-)
df['feature_col'] = df.feature_col.replace({r'\d{7}': 'ID7',
                                            r'\d{4}': 'ID4',
                                            r'\d{3}': 'ID3'},
                                           regex=True)


0   this has ID7
1   this has ID4
2   this has ID3
3  this has none
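
If the partial match noted in the comment above (a 5-digit number becoming 'ID45') is a problem, one possible variation is to match each whole run of digits once and pick the replacement by its length. The sketch below assumes pandas' str.replace with a callable replacement; the LABELS mapping, the short_code label, the label_by_length helper, and the masked column are hypothetical names chosen for illustration, not part of the original question or answer.

```python
import re
import pandas as pd

# Hypothetical mapping from digit count to replacement label (an assumption)
LABELS = {10: "masked_id", 16: "account_number", 3: "short_code"}

def label_by_length(match: re.Match) -> str:
    """Return the label for this run of digits, or the digits unchanged."""
    digits = match.group(0)
    return LABELS.get(len(digits), digits)

df = pd.DataFrame({"feature_col": ["id 1234567890 here",
                                   "acct 1234567890123456",
                                   "code 123",
                                   "other 12345",
                                   "this has none"]})

# r"\d+" consumes each full run of digits, so a 5-digit number can never
# be partially replaced by the 3-digit rule
df["masked"] = df["feature_col"].str.replace(r"\d+", label_by_length, regex=True)
print(df["masked"].tolist())
```

Numbers whose length is not in the mapping are left untouched here; returning a generic placeholder such as ' NUM ' from the helper instead would mirror the original catch-all replacement.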
