Check if all values of a column are equal in PySpark Dataframe

I have to get rid of columns that don't add information to my dataset, i.e. columns that hold the same value in every row.

I devised two ways of doing this:

  1. One using the min and max values:
from pyspark.sql import functions as F

for col in df.columns:
    if df.agg(F.min(col)).collect()[0][0] == df.agg(F.max(col)).collect()[0][0]:
        df = df.drop(col)
  2. Another one, using distinct and count:
for col in df.columns:
    if df.select(col).distinct().count() == 1:
        df = df.drop(col)

Is there a better, faster, or more straightforward way to do this?

Answer

df = df.drop(*(col for col in df.columns if df.select(col).distinct().count() == 1))