Python: How to determine if a column contains any row with at least a value from multiple lists?

I am quite new to Python so please bear with me.

I have a dataframe where each column references a different id (firm) and I also have two lists of strings (words) contained in list1 and list2. How do I return a dummy (1 or 0) for if ANY row in a certain column contains at least one word from List1 AND at one word from List 2? I’m not sure how to get the dummies for each column to be appended to a list so I can concat this to df1. The code below returns a syntax error.


for column in df1:
    if df1.iloc[:,0].str.contains[list1,list2]:
        return '1'
    else '0'


list1 = ['apple','orange','pear']
list2 = ['big','small','round']


      col1          col2            col3
0 big apple    round kiwi        large pear
1 round pear   large orange      large kiwi

Since column 0 contains strings from at least both lists in the same row (round pear) this returns a 1, but not column 1 because the words (round and orange) are not in the same row and not column 2 because it doesn’t contain anything from list2 (despite containing something from list1). The condition just applies to ANY row within that specific column (not every row) but the two words from the corresponding two lists must appear in the same row.

Desired output:


I can’t do this manually because I have hundreds of rows and columns as well as hundreds of words in the lists. Thanks.

Code for df1:

data = {'col1': ['big apple','round pear'], 'col2': ['round kiwi','large orange'], 'col3':['large pear','large kiwi']}


You can use str.findall to search words into each columns. To do that, you have to convert your list into regex patterns:

Setup minimal reproducible example:

df = pd.DataFrame({'col1': ['big apple', 'round pear'],
                   'col2': ['round orange', 'large kiwi'],
                   'col3': ['large pear', 'large kiwi']})

list1 = ['apple', 'orange', 'pear']
list2 = ['big', 'small', 'round']


pat1 = fr"b({'|'.join(list1)})b"
pat2 = fr"b({'|'.join(list2)})b"

dummy = df.apply(lambda x: x.str.findall(pat1, re.IGNORECASE).astype(bool) 
                         & x.str.findall(pat2, re.IGNORECASE).astype(bool)) 

str.findall create a list of found words. If you cast to boolean, you have True for a non-empty list (match at least one word) and False for an empty list. So if you have two non empty lists, your conditions are filled.

>>> dummy
[1, 0, 0]

# b ->
>>> print(pat1)

>>> print(pat2)