Change last word after space in a dataframe column

I am working on a dataframe that contains computer names, and I am trying to anonymize them. Here is an example of the dataframe I am working with:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'computer_name': [u'LENOVO 09 X32H0GB', u'LENOVO vmhsbpmh613.xyz.biz', u'Dell Inc. PowerEdge R910 XKF2S75', u'HP  ppesfesxb203.corp.123.com', 'IBM SoftLayer 13 L89P4567']})

Here is what is required to anonymize it:

  1. Take the last space-separated token, i.e. everything after the first space counting from the right. For example, for “LENOVO vmhsbpmh613.xyz.biz” it would be “vmhsbpmh613.xyz.biz”.

  2. From that token, e.g. “vmhsbpmh613.xyz.biz”, remove everything from the first dot (.) onward, which gives “vmhsbpmh613”. If there is no dot, keep the token as it is. Note that the dot must be handled only within this last token; otherwise, in an example like “Dell Inc. PowerEdge R910 XKF2S75”, everything after the dot in “Dell Inc.” would be removed.

  3. Lastly, replace the first 3 characters with xxx, e.g. xxxsbpmh613.

Here is how the output should look:

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'computer_name': [u'LENOVO 09 xxxH0GB', u'LENOVO xxxsbpmh613', u'Dell Inc. PowerEdge R910 xxx2S75', u'HP  xxxsfesxb203', 'IBM SoftLayer 13 xxxP4567']})

I hope I was able to articulate the requirement clearly. Thanks.

Answer

Series.str.replace

df['computer_name'].str.replace(r'\S{3}(\S+?)(?:\.\S+|$)', r'xxx\1', regex=True)

0                   LENOVO 09 xxxH0GB
1                  LENOVO xxxsbpmh613
2    Dell Inc. PowerEdge R910 xxx2S75
3                    HP  xxxsfesxb203
4           IBM SoftLayer 13 xxxP4567
Name: computer_name, dtype: object

Regex details

  • \S{3} : Matches any non-whitespace character exactly 3 times.
  • (\S+?) : Capturing group; matches any non-whitespace character between 1 and unlimited times, but as few times as possible (lazy match).
  • (?: : Beginning of the non-capturing group.
  • \. : Matches a literal . character.
  • \S+ : Matches any non-whitespace character one or more times.
  • | : Alternation; if the \.\S+ branch does not match, the $ branch is tried.
  • $ : Asserts position at the end of the string.
  • ) : End of the non-capturing group.
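
To try the pattern outside pandas, here is a minimal sketch that applies it with Python's re module to the sample names from the question. The names PATTERN, names and result are only illustrative and are not part of the original answer.

```python
import re

# Pattern from the answer, with the backslashes written out explicitly.
PATTERN = re.compile(r'\S{3}(\S+?)(?:\.\S+|$)')

# Sample computer names taken from the question.
names = [
    'LENOVO 09 X32H0GB',
    'LENOVO vmhsbpmh613.xyz.biz',
    'Dell Inc. PowerEdge R910 XKF2S75',
    'HP  ppesfesxb203.corp.123.com',
    'IBM SoftLayer 13 L89P4567',
]

# \1 re-inserts the lazily captured part of the token after the 'xxx' prefix.
result = [PATTERN.sub(r'xxx\1', name) for name in names]

for line in result:
    print(line)
```

The printed lines should match the expected output listed in the question.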

See the regex demo
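
One caveat: the pattern is not anchored to the last token, so a token in the middle of the string that contains a dot followed by more non-space characters (for example 'srv01.local rack 12 ABC1234') would also be rewritten. If that matters for your data, a plain-Python alternative that follows the three steps from the question literally may be easier to reason about. This is only a sketch: the function name anonymize is made up for illustration, it assumes names have no trailing whitespace, and a last token shorter than 3 characters is replaced by just 'xxx'.

```python
def anonymize(name: str) -> str:
    """Anonymize the last space-separated token of a computer name."""
    # Step 1: split at the first space from the right; 'last' is the final token.
    head, sep, last = name.rpartition(' ')
    # Step 2: keep only the part of that token before its first dot, if any.
    host = last.split('.', 1)[0]
    # Step 3: replace the first 3 characters with 'xxx'.
    return head + sep + 'xxx' + host[3:]


print(anonymize('LENOVO vmhsbpmh613.xyz.biz'))
print(anonymize('Dell Inc. PowerEdge R910 XKF2S75'))
```

It can be applied to the column with df['computer_name'].map(anonymize).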