Filtering a dataframe that includes a list in a cell

I can usually problem solve without having to ask here but this time I’m stumped. I have a dataframe set up like this:

      Name     Location     Numbers
0     Tim      USA          [1, 3, 6, 65, 87]
1     Ryan     USA          [1, 9, 5, 65, 43]
2     Kyle     Canada       [1, 8, 6, 12, 87]
3     Sarah    Mexico       [1, 4, 6, 21, 65]

What I am looking to accomplish is to filter this dataframe so that it only returns rows in which the list within ‘Numbers’ contains 65. If anyone has ideas on how to accomplish this I would greatly appreciate it.

Answer

With a column of lists, you can iterate over the rows and check if the list contains the number.

m = df['Numbers'].apply(lambda x: 65 in x)
#0     True
#1     True
#2    False
#3     True

df[m]
#    Name Location            Numbers
#0    Tim      USA  [1, 3, 6, 65, 87]
#1   Ryan      USA  [1, 9, 5, 65, 43]
#3  Sarah   Mexico  [1, 4, 6, 21, 65]
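
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of `df` is my reconstruction from the table in the question, and the name `filtered` is only there for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (reconstructed, not the asker's code).
df = pd.DataFrame({
    'Name': ['Tim', 'Ryan', 'Kyle', 'Sarah'],
    'Location': ['USA', 'USA', 'Canada', 'Mexico'],
    'Numbers': [[1, 3, 6, 65, 87], [1, 9, 5, 65, 43],
                [1, 8, 6, 12, 87], [1, 4, 6, 21, 65]],
})

# Boolean mask: True where that row's list contains 65.
# This runs a Python-level membership test once per row.
m = df['Numbers'].apply(lambda x: 65 in x)

# Keep only the rows where the mask is True.
filtered = df[m]
print(filtered)
```

Note that `65 in x` compares against the integer 65. If your lists hold strings, you would need to test for `'65'` instead.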

However, complex objects such as lists throw away a lot of the performance pandas has to offer, because operations on them fall back to Python-level loops. If you have a lot of data or are going to do a lot of manipulations, the approach above is going to get really slow. It’s usually better to pay the price and reshape your data (slow once) so that all your other operations can be much faster. In this case we can go to a wide format.

df = pd.concat([df, pd.DataFrame(df.pop('Numbers').tolist()).add_prefix('Number_')], axis=1)
#    Name Location  Number_0  Number_1  Number_2  Number_3  Number_4
#0    Tim      USA         1         3         6        65        87
#1   Ryan      USA         1         9         5        65        43
#2   Kyle   Canada         1         8         6        12        87
#3  Sarah   Mexico         1         4         6        21        65

And now your mask is a vectorized equality + any check.

m = df.filter(like='Number').eq(65).any(axis=1)
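
Putting the wide-format steps together, a runnable sketch might look like the following. It assumes every list has the same length (shorter lists would be padded with NaN), and the names `numbers`, `wide` and `filtered` are mine. It passes `index=df.index` so the concat still lines up if your frame does not have a default index, and it passes `axis=1` by keyword, which recent pandas versions require.

```python
import pandas as pd

# Example frame reconstructed from the question.
df = pd.DataFrame({
    'Name': ['Tim', 'Ryan', 'Kyle', 'Sarah'],
    'Location': ['USA', 'USA', 'Canada', 'Mexico'],
    'Numbers': [[1, 3, 6, 65, 87], [1, 9, 5, 65, 43],
                [1, 8, 6, 12, 87], [1, 4, 6, 21, 65]],
})

# Expand each list into its own columns (Number_0, Number_1, ...).
# Reusing df.index keeps the rows aligned when concatenating.
numbers = pd.DataFrame(df.pop('Numbers').tolist(), index=df.index).add_prefix('Number_')
wide = pd.concat([df, numbers], axis=1)

# Vectorized check: does any Number_* column equal 65 in this row?
m = wide.filter(like='Number').eq(65).any(axis=1)

filtered = wide[m]
print(filtered)
```

Be aware that `filter(like='Number')` selects every column whose label contains the text `Number`, so it would also pick up unrelated columns with that substring in their name.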