Which columns are binary in a Pandas DataFrame?

I have a pandas DataFrame with a large number of columns, and I need to find which columns are binary (containing only the values 0 and 1) without inspecting the data by hand. Which function should I use?

Answer

To my knowledge, there is no direct function to test for this. Rather, you need to build something based on how the data was encoded (e.g. 1/0, T/F, True/False, etc.). In addition, if an integer column has a missing value, pandas by default stores the entire column as float instead of int, because the missing value is represented as NaN.
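
As a small illustration of that last point, here is a minimal sketch. The column names are hypothetical, and the nullable "Int64" dtype shown at the end is just one option I am aware of for keeping integers next to missing values, not something the check itself requires.

```python
import pandas as pd

# Hypothetical columns, chosen only for illustration.
demo = pd.DataFrame({
    "complete": [1, 0, 1, 0],         # no missing values
    "with_missing": [1, 0, 1, None],  # one missing value
})

# The column with a missing value is stored as float64, the other as int64.
print(demo.dtypes)

# Optional: the nullable "Int64" extension dtype keeps integer values
# and represents the gap as <NA> instead of NaN.
as_nullable = demo["with_missing"].astype("Int64")
print(as_nullable)
```

Because of this, a check for binary columns should compare values rather than rely on the column dtype.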

In the example below, I test whether all unique non-null values in a column are either 1 or 0. It returns a list of all such columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# Select each column as a Series (a DataFrame has no .unique() method),
# drop missing values, and check that everything left is 0 or 1.
bool_cols = [col for col in df
             if df[col].dropna().isin([0, 1]).all()]

# 2019-09-10 EDIT (per Hardik Gupta)
bool_cols = [col for col in df
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]

>>> bool_cols
['bool']

>>> df[bool_cols]
   bool
0   1.0
1   0.0
2   1.0
3   NaN
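
If your data uses a different encoding (T/F, for example), the same idea can be wrapped in a small helper. This is only a sketch: the function name `find_binary_columns` and its `allowed` parameter are my own, not part of pandas. It also skips columns that are entirely null, which the one-liner above would otherwise report as binary, since `.all()` on an empty selection is True.

```python
import pandas as pd

def find_binary_columns(df, allowed=(0, 1)):
    """Return names of columns whose non-null values all belong to `allowed`.

    Hypothetical helper for illustration; not a pandas function.
    """
    binary = []
    for col in df.columns:
        values = df[col].dropna()
        # Skip all-null columns: .all() on an empty Series would be True.
        if not values.empty and values.isin(list(allowed)).all():
            binary.append(col)
    return binary

# Made-up example data.
sample = pd.DataFrame({
    "flag": [1, 0, 1, None],
    "yes_no": ["T", "F", "T", "F"],
    "ints": [1, 2, 3, 4],
    "empty": [None, None, None, None],
})

numeric_flags = find_binary_columns(sample)                  # 0/1 encoding
letter_flags = find_binary_columns(sample, allowed=("T", "F"))
print(numeric_flags)
print(letter_flags)
```

Note that a column holding a single constant value from `allowed` (all 1s, say) also passes this check; add a test on `values.nunique()` if you need both values to be present.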
