Pandas: Clean Up Data Frame by Eliminating NaN
The question is published on by Tutorial Guruji team.
I have a data frame with several NaN values in its columns. The data in each “STAN-x” column only needs to match the associated “Statement”, and duplicate “Statement” values can be eliminated.
So I have the following data frame:
    Statement    STAN-A   STAN-B     STAN-C            STAN-D                      STAN-E                      STAN-F
0   Statement A  AB.AM-1  ABC ABC 1  NaN               NaN                         NaN                         NaN
1   Statement A  AB.AM-1  NaN        ABCDE 5 ABCDE.01  NaN                         NaN                         NaN
1   Statement A  AB.AM-1  NaN        ABCDE 5 ABCDE.02  NaN                         NaN                         NaN
2   Statement A  AB.AM-1  NaN        NaN               ABC 62443-2-1:2009 4.2.3.4  NaN                         NaN
3   Statement A  AB.AM-1  NaN        NaN               ABC 62443-3-3:2013 SR 7.8   NaN                         NaN
4   Statement A  AB.AM-1  NaN        NaN               NaN                         ABC/ABC 27001:2013 A.8.1.1  NaN
4   Statement A  AB.AM-1  NaN        NaN               NaN                         A.8.1.2                     NaN
5   Statement A  AB.AM-1  NaN        NaN               NaN                         NaN                         ABCD AB 800-53 Rev. 4 CM-8
5   Statement A  AB.AM-1  NaN        NaN               NaN                         NaN                         PM-5
6   Statement B  AB.AM-2  ABC ABC 2  NaN               NaN                         NaN                         NaN
7   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.01  NaN                         NaN                         NaN
7   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.02  NaN                         NaN                         NaN
7   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.05  NaN                         NaN                         NaN
8   Statement B  AB.AM-2  NaN        NaN               ABC 62443-2-1:2009 4.2.3.4  NaN                         NaN
9   Statement B  AB.AM-2  NaN        NaN               ABC 62443-3-3:2013 SR 7.8   NaN                         NaN
10  Statement B  AB.AM-2  NaN        NaN               NaN                         ABC/ABC 27001:2013 A.8.1.1  NAN
11  Statement B  AB.AM-2  NaN        NaN               NaN                         NAN                         ABCD AB 800-53 Rev. 5 CM-9
And I’m trying to turn it into this:
    Statement    STAN-A   STAN-B     STAN-C            STAN-D                      STAN-E                      STAN-F
0   Statement A  AB.AM-1  ABC ABC 1  ABCDE 5 ABCDE.01  ABC 62443-2-1:2009 4.2.3.4  ABC/ABC 27001:2013 A.8.1.1  ABCD AB 800-53 Rev. 4 CM-8
1   Statement A  AB.AM-1  NaN        ABCDE 5 ABCDE.02  ABC 62443-3-3:2013 SR 7.8   A.8.1.2                     PM-5
2   Statement A  AB.AM-1  NaN        NaN               ABC 62443-2-1:2009 4.2.3.4  NaN                         NaN
3   Statement B  AB.AM-2  ABC ABC 2  ABCDE 5 ABCDE.01  ABC 62443-2-1:2009 4.2.3.4  ABC/ABC 27001:2013 A.8.1.1  ABCD AB 800-53 Rev. 5 CM-9
4   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.02  ABC 62443-3-3:2013 SR 7.8   NaN                         NaN
5   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.05  NaN                         NaN                         NaN
So far I’ve tried df.dropna(), but, of course, that leaves me with no values. I’ve also tried the following:
df.assign(**{'STAN-B': df['STAN-B'].join(df['STAN-B'].dropna())})
But I get:
AttributeError: 'Series' object has no attribute 'join'
Answer
Try:
x = df.groupby("Statement").apply(lambda x: x.apply(sorted, key=pd.isna))
x = x.dropna(subset=x.loc[:, "STAN-B":].columns, how="all")
print(x.reset_index(drop=True))
Prints:
    Statement    STAN-A   STAN-B     STAN-C            STAN-D                      STAN-E                      STAN-F
0   Statement A  AB.AM-1  ABC ABC 1  ABCDE 5 ABCDE.01  ABC 62443-2-1:2009 4.2.3.4  ABC/ABC 27001:2013 A.8.1.1  ABCD AB 800-53 Rev. 4 CM-8
1   Statement A  AB.AM-1  NaN        ABCDE 5 ABCDE.02  ABC 62443-3-3:2013 SR 7.8   A.8.1.2                     PM-5
2   Statement B  AB.AM-2  ABC ABC 2  ABCDE 5 ABCDE.01  ABC 62443-2-1:2009 4.2.3.4  ABC/ABC 27001:2013 A.8.1.1  ABCD AB 800-53 Rev. 5 CM-9
3   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.02  ABC 62443-3-3:2013 SR 7.8   NaN                         NaN
4   Statement B  AB.AM-2  NaN        ABCDE 5 ABCDE.05  NaN                         NaN                         NaN
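The same idea (push the non-NaN values of each column to the top of their group, then drop rows that end up empty) can also be sketched without groupby.apply, since newer pandas releases warn when apply operates on the grouping column. This is only a minimal sketch: the small frame, its values, and the helper name compact_group are made up for illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd


def compact_group(group: pd.DataFrame) -> pd.DataFrame:
    """Move non-NaN values to the top of every column (hypothetical helper).

    sorted() is stable and False sorts before True, so using pd.isna as the
    key keeps the original order of the real values and pushes NaN last.
    """
    return pd.DataFrame(
        {col: sorted(group[col], key=pd.isna) for col in group.columns}
    )


# Made-up sample data with the same staggered-NaN shape as in the question.
df = pd.DataFrame(
    {
        "Statement": ["A", "A", "A", "B", "B"],
        "STAN-A": ["a1", "a1", "a1", "a2", "a2"],
        "STAN-B": ["b1", np.nan, np.nan, np.nan, "b2"],
        "STAN-C": [np.nan, "c1", "c2", "c3", np.nan],
    }
)

# Compact each group separately, then stack the groups back together.
parts = [compact_group(g) for _, g in df.groupby("Statement", sort=False)]
out = pd.concat(parts, ignore_index=True)

# Drop rows where every column from "STAN-B" onwards is NaN.
value_cols = out.loc[:, "STAN-B":].columns
out = out.dropna(subset=value_cols, how="all").reset_index(drop=True)

print(out)
```

Iterating over the groupby object yields each group with the "Statement" column still present, so the result does not depend on how apply treats grouping columns in a given pandas version.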
NOTE: If you’re creating this dataframe with pd.concat, try adding axis=1 as a parameter.
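To illustrate the note, here is a small sketch of how the concat axis affects the layout. The two single-column frames are assumptions made for the example, since the question does not show how the original data frame was built.

```python
import pandas as pd

# Hypothetical per-column frames, each with its own default 0..n index.
stan_b = pd.DataFrame({"STAN-B": ["ABC ABC 1"]})
stan_c = pd.DataFrame({"STAN-C": ["ABCDE 5 ABCDE.01", "ABCDE 5 ABCDE.02"]})

# Default axis=0 stacks the frames as rows: every row has a value in only
# one column, and index labels repeat (0, 0, 1).
stacked = pd.concat([stan_b, stan_c])

# axis=1 places the frames side by side, aligned on the index, so NaN only
# appears where one frame is shorter than the other.
side_by_side = pd.concat([stan_b, stan_c], axis=1)

print(stacked)
print(side_by_side)
```

The repeated index labels in the question's data frame (1, 1, 4, 4, ...) are consistent with frames having been stacked row-wise in this way.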