I want to compare each row's size with the size of the previous row. Say the first row has a size of 6 KB and the second row has 2 KB: if a row's size is less than 50% of the previous row's size, that row should be printed.

The following is my dataframe:

        size  number    key      date
    0  120 K   12345  Hello  20181002
    1  119 K   12345     No  20181001
    2   30 K   12345  Hello  20181003
    3   90 K   12345     No  20181003
    4  150 K   12345  Hello  20181004
    5  180 M   12345     No  20181005
    6   70 M   12345  Hello  20181006

In the dataframe above, the 2nd row is compared with the 1st: its size is not less than 50% of the 1st row's size, so it is ignored. The 3rd row's size is less than 50% of the 2nd row's, so the 3rd row is printed. Likewise, the row at index 6 is printed because its size is less than 50% of the previous row's size.

## Answer

You can use `.replace()` to translate the `size` column with `K`, `M`, `G`, etc. to the corresponding values scaled up by the magnitude symbols, as follows:

- `K` is converted to `e+03` in scientific notation
- `M` is converted to `e+06` in scientific notation
- `G` is converted to `e+09` in scientific notation

**(supports integers as well as floats with any number of decimal places)**

Then, convert the text in scientific notation to float type and cast it to integer to get the final required format, as follows:

    size_val = df['size'].replace({' ': '', 'K': 'e+03', 'M': 'e+06', 'G': 'e+09'}, regex=True).astype(float).astype(int)
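To see the two conversion steps separately, here is a minimal sketch on a standalone Series. The first three sizes come from the question; `2.5 M` is a hypothetical value (not in the original data) added only to illustrate that decimals go through the same path. The intermediate name `as_text` is also just for illustration.

```python
import pandas as pd

# Sizes from the question, plus a hypothetical decimal value ("2.5 M")
# that is not in the original data, to show that floats are handled too.
sizes = pd.Series(["120 K", "119 K", "30 K", "2.5 M"], name="size")

# Step 1: drop the space and turn each suffix into an exponent,
# e.g. "120 K" becomes "120e+03".
as_text = sizes.replace({" ": "", "K": "e+03", "M": "e+06", "G": "e+09"}, regex=True)

# Step 2: parse the scientific-notation strings as floats, then cast to integers.
size_val = as_text.astype(float).astype(int)

print(as_text.tolist())   # ['120e+03', '119e+03', '30e+03', '2.5e+06']
print(size_val.tolist())  # [120000, 119000, 30000, 2500000]
```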

Then, use `df.loc` to filter the rows by the ratio of each row's size to the previous row's size (the previous row's values are obtained with `.shift()`):

    df.loc[(size_val / size_val.shift()) < 0.5]

**Result:**

        size  number    key      date
    2   30 K   12345  Hello  20181003
    6   70 M   12345  Hello  20181006
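For anyone who wants to reproduce this, the steps can be put together as a self-contained sketch. The dataframe is rebuilt by hand from the question's sample data, and the intermediate names `ratio` and `result` are only for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "size": ["120 K", "119 K", "30 K", "90 K", "150 K", "180 M", "70 M"],
    "number": [12345] * 7,
    "key": ["Hello", "No", "Hello", "No", "Hello", "No", "Hello"],
    "date": [20181002, 20181001, 20181003, 20181003,
             20181004, 20181005, 20181006],
})

# Translate "120 K" style text into plain integers.
size_val = (
    df["size"]
    .replace({" ": "", "K": "e+03", "M": "e+06", "G": "e+09"}, regex=True)
    .astype(float)
    .astype(int)
)

# shift() moves every value down one row, so each row lines up with its
# predecessor. The first row has no predecessor: its ratio is NaN, and
# NaN < 0.5 evaluates to False, so the first row is never selected.
ratio = size_val / size_val.shift()

result = df.loc[ratio < 0.5]
print(result)
```

The ratios here are roughly `NaN, 0.99, 0.25, 3.0, 1.67, 1200.0, 0.39`, so only the rows at index 2 and 6 pass the `< 0.5` test.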

The translated values of `size` (in `size_val`), converted from text to integers, **are the actual values scaled up by the magnitude symbols:**

    print(size_val)

    0       120000
    1       119000
    2        30000
    3        90000
    4       150000
    5    180000000
    6     70000000
    Name: size, dtype: int32
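One thing to watch: the `int32` dtype shown in the output above depends on the platform, and a 32-bit integer cannot hold values above 2,147,483,647, so sizes of a few `G` would not fit. A minimal sketch, assuming it is acceptable to request 64-bit integers explicitly with `astype("int64")` (this is a variation, not part of the original answer; the sample sizes are hypothetical):

```python
import pandas as pd

# Hypothetical sizes (not from the question) that exceed the int32 maximum.
sizes = pd.Series(["3 G", "1 G"], name="size")

size_val = (
    sizes.replace({" ": "", "K": "e+03", "M": "e+06", "G": "e+09"}, regex=True)
    .astype(float)
    .astype("int64")  # explicit 64-bit integers instead of the platform default
)

print(size_val.tolist())  # [3000000000, 1000000000]
print(size_val.dtype)     # int64
```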