Python: str.slice inside an apply function raises AttributeError

I have a dataset like this:

col1      col2
12345        N
450540       Y
356487       Y
000123       Y
111564       Y
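
I tried to create col3, which should be "N" when col1 starts with '000' or '111' and should otherwise take the value from col2:

df['col3'] = df[['col1', 'col2']].apply(
    lambda x: "N" if (x['col1'].str[:3]) in ['000', '111'] else x['col2'],
    axis=1)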

However, I got this error: AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 0')

The data type for col1 is object:

df['col1'].dtypes
> dtype('O')

Any suggestions?

Answer

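With axis=1, apply passes each row to the lambda one at a time. So x['col1'] is a plain Python str rather than a Series, and the .str accessor exists only on a Series, which is why the AttributeError is raised. Modify the lambda so that it slices the string directly: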

df['col3'] = df[['col1', 'col2']].apply(lambda x: "N" if x['col1'][:3] in ('000', '111') else x['col2'], axis=1)

df
     col1 col2 col3
0   12345    N    N
1  450540    Y    Y
2  356487    Y    Y
3  000123    Y    N
4  111564    Y    N
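
To see the difference for yourself, here is a minimal sketch that rebuilds the sample data (the DataFrame construction is my assumption of how your data looks, with col1 stored as strings) and compares the type you get at column level with the type the lambda sees at row level:

```python
import pandas as pd

# Assumed reconstruction of the sample data; col1 holds strings
df = pd.DataFrame({
    "col1": ["12345", "450540", "356487", "000123", "111564"],
    "col2": ["N", "Y", "Y", "Y", "Y"],
})

# Column level: df["col1"] is a Series, so the .str accessor is available
column_type = type(df["col1"]).__name__
prefixes = df["col1"].str[:3].tolist()

# Row level: with axis=1 each x is one row, so x["col1"] is a single value
cell_types = df.apply(lambda x: type(x["col1"]).__name__, axis=1).tolist()

print(column_type)
print(prefixes)
print(cell_types)
```

The row-level values report as str, so plain slicing with [:3] is what works inside the lambda.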

Can I use str.slice?

Yes, but call it on the whole column (a Series) rather than inside the row-wise lambda. You can use it to define a mask like this:

mask = df['col1'].str.slice(stop=3).isin(['000', '111'])

mask
0    False
1    False
2    False
3     True
4     True
Name: col1, dtype: bool

# The default condition is that 'col3' takes the values in 'col2'
df['col3'] = df['col2']

# Then use `loc` on the DataFrame itself to overwrite the masked rows.
# Chained indexing (df['col3'].loc[mask] = ...) may assign to a copy.
df.loc[mask, 'col3'] = 'N'

df
     col1 col2 col3
0   12345    N    N
1  450540    Y    Y
2  356487    Y    Y
3  000123    Y    N
4  111564    Y    N
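
The two assignment steps above can also be collapsed into one expression. This is only a sketch under the same assumed sample data; it swaps in numpy.where, which picks "N" where the mask is True and the col2 value elsewhere:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data; col1 holds strings
df = pd.DataFrame({
    "col1": ["12345", "450540", "356487", "000123", "111564"],
    "col2": ["N", "Y", "Y", "Y", "Y"],
})

# True where the first three characters are '000' or '111'
mask = df["col1"].str.slice(stop=3).isin(["000", "111"])

# "N" where the mask is True, otherwise the existing col2 value
df["col3"] = np.where(mask, "N", df["col2"])

print(df)
```

Whether this reads better than the explicit loc assignment is a matter of taste; both avoid the row-by-row apply.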

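As an alternative, str.contains with a regex expresses the prefix test in a single call. A non-capturing group, (?:...), avoids the warning pandas emits when the pattern contains match groups:

mask = df['col1'].str.contains(r'^(?:000|111)', regex=True)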

mask
0    False
1    False
2    False
3     True
4     True
Name: col1, dtype: bool
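
A related option, if you prefer not to write the ^ anchor, is str.match, which only matches at the start of each string. The sketch below (again on assumed sample data) checks that it agrees with the str.contains version; na=False is my addition so that missing values in col1, if there are any, come out as False instead of NaN:

```python
import pandas as pd

# Assumed reconstruction of the sample data; col1 holds strings
df = pd.DataFrame({
    "col1": ["12345", "450540", "356487", "000123", "111564"],
    "col2": ["N", "Y", "Y", "Y", "Y"],
})

# Explicit anchor with str.contains (non-capturing group)
by_contains = df["col1"].str.contains(r"^(?:000|111)", regex=True, na=False)

# str.match anchors at the start of the string, so no ^ is needed
by_match = df["col1"].str.match(r"(?:000|111)", na=False)

print(by_contains.tolist() == by_match.tolist())
```

Either mask can then be used with df.loc[mask, 'col3'] = 'N' as before.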
