Why does NumPy's max function (np.max) return the wrong output?

I have a pandas DataFrame that I converted to a NumPy ndarray. I use the max function on one of its columns like this:

print('column: ',df[:,3])
print('max: ',np.max(df[:,3]))

And the output was:

column: [0.6559999999999999 0.48200000000000004 0.9990000000000001 ..., 1.64 nan 0.07]
max: 0.07

But as you can see, the first value, for example, is greater than 0.07! What is the problem?

Answer

There are two problems here:



  1. It looks like the column you are trying to find the maximum of has the object data type. That is not recommended when the column should contain numerical data, because it can cause unexpected behaviour, and not only in this particular case. Check the data types of your DataFrame (with df.dtypes) and convert the column to the type you expect (in this case, df[column_name].astype(np.float64)). The object data type is also why np.nanmax may not work properly here.

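  2. You don't want to use np.max on arrays containing NaNs. Every comparison with NaN is False, so on a float64 array np.max returns nan, and on an object array the result depends on where the NaN sits (in the question it is 0.07, the value that follows the NaN).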
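To see how the NaN can turn the maximum into 0.07, here is a minimal plain-Python sketch, assuming the object-array reduction behaves like a running maximum that keeps the current value only while it compares greater than or equal to the next element. The helper naive_running_max is hypothetical and only mimics that behaviour; it is not NumPy's actual implementation.

```python
# Sample values (the same ones used in the illustration code), NaN in the same position.
values = [0.7, 0.5, 1.0, 1.64, float("nan"), 0.07]

# Every ordering comparison involving NaN evaluates to False.
nan = float("nan")
print(nan > 0.07, nan < 0.07, nan >= 0.07)  # False False False


def naive_running_max(items):
    """Hypothetical helper: a comparison-based running maximum."""
    current = items[0]
    for item in items[1:]:
        # Keep `current` only if it compares >= the next item.
        # 1.64 >= nan is False, so NaN replaces 1.64;
        # then nan >= 0.07 is False, so 0.07 replaces NaN.
        if not (current >= item):
            current = item
    return current


print(naive_running_max(values))  # 0.07
```

Without the NaN the same loop returns 1.64, which is why removing NaNs or using a NaN-aware function fixes the result.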



Solution



  1. If you are sure the column should have the object data type:

    1.1. You can use the max method of the Series, which skips NaN values by default:

    df.iloc[:, 3].max()

    1.2. You can cast the data to the proper type just for the nanmax call:

    np.nanmax(df.values[:,3].astype(np.float64))

    1.3. You can drop all NaNs from the column and then take the max [not recommended]:

    np.max(df[column_name].dropna().values)
    

  2. If your data should be float64 rather than the object data type [recommended]:

    df[column_name] = df[column_name].astype(np.float64)
    
    np.nanmax(df[column_name].values)
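The options above can be tried side by side on a small DataFrame. This is a sketch under assumptions: the column names label and value and the data are made up, and column_name stands in for your own column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an object column holding floats and a NaN,
# similar to the column in the question.
df = pd.DataFrame({
    "label": ["a", "b", "c", "d", "e", "f"],
    "value": np.array([0.656, 0.482, 0.999, 1.64, np.nan, 0.07], dtype=object),
})
column_name = "value"

print(df.dtypes)  # "value" is reported as object

# 1.1: Series.max skips NaN by default (skipna=True).
max_series = df[column_name].max()

# 1.2: cast to float64 only for the nanmax call.
max_cast = np.nanmax(df[column_name].to_numpy().astype(np.float64))

# 1.3: drop NaNs first, then use np.max (not recommended).
max_dropna = np.max(df[column_name].dropna().to_numpy())

# 2: fix the column's dtype once, then use NaN-aware functions.
df[column_name] = df[column_name].astype(np.float64)
max_fixed = np.nanmax(df[column_name].to_numpy())

print(max_series, max_cast, max_dropna, max_fixed)
print(df[column_name].dtype)  # float64
```

Option 2 is the one that removes the root cause: once the column is float64, later code does not have to remember the cast.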
    


Code to illustrate the problem



#python
import pandas as pd
import numpy as np 

test_data = pd.DataFrame({
                   'objects_column': np.array([0.7,0.5,1.0,1.64,np.nan,0.07]).astype(object),
                   'floats_column': np.array([0.7,0.5,1.0,1.64,np.nan,0.07]).astype(np.float64)})

print("********Using np.max function********")
print("Max of objects array:", np.max(test_data['objects_column'].values))
print("Max of floats array:", np.max(test_data['floats_column'].values))

print("\n********Using max method of series function********")
print("Max of objects array:", test_data["objects_column"].max()) 
print("Max of floats array:", test_data["floats_column"].max())

Output:

********Using np.max function********
Max of objects array: 0.07
Max of floats array: nan

********Using max method of series function********
Max of objects array: 1.64
Max of floats array: 1.64
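The illustration builds the object column on purpose. We can only guess how the column in the question became object, but one common way (an assumption here, not confirmed by the question) is calling df.values on a DataFrame that mixes numeric and non-numeric columns: the result is a single ndarray whose dtype has to hold every column, which is object. The name/score frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a string column next to a float column with a NaN.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": [0.656, 0.482, 0.999, 1.64, np.nan, 0.07],
})

arr = df.values            # mixed column types -> one object ndarray
print(arr.dtype)           # object
print(arr[:, 1].dtype)     # object, even though "score" is float64 in the DataFrame
print(df["score"].dtype)   # float64

# Selecting the column before converting keeps its float64 dtype.
score = df["score"].to_numpy()
print(score.dtype)         # float64
print(np.nanmax(score))    # 1.64
```

Selecting the numeric column first (or converting only numeric columns) keeps NumPy working on float64 data, where np.max and np.nanmax behave as documented.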