Add a prefix and reset the index in a PySpark DataFrame

Here’s what I usually do in pandas:

cdr = datamonthly.pivot(index="msisdn", columns="last_x_month", values="arpu_sum").add_prefix('arpu_sum_l').reset_index()

But this is what I have so far in PySpark:

cdr = datamonthly.groupBy("msisdn").pivot("last_x_month").sum("arpu_sum")

I can't find an alternative for add_prefix('arpu_sum_l').reset_index().

Answer

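Spark has no direct equivalent of pandas’ add_prefix when pivoting. As a workaround, you can create a helper column that concatenates the prefix string with the value of the column to be pivoted, and then pivot on that helper column. You don't need reset_index() at all: Spark DataFrames have no index, and groupBy("msisdn") keeps msisdn as an ordinary column. Note that the prefix used below is 'arpu_sum_l_' (with a trailing underscore); drop the underscore if you want the column names to match the pandas output exactly.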

import pyspark.sql.functions as F

cdr = datamonthly.withColumn("p", F.expr("concat('arpu_sum_l_', last_x_month)")).groupBy("msisdn").pivot("p").sum("arpu_sum")
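Another option, closer in spirit to add_prefix, is to pivot first and rename the resulting columns afterwards. The following is only a minimal sketch: add_prefix_except and pivot_with_prefix are hypothetical helper names (not part of PySpark), and it assumes the pivoted columns are named after the values of last_x_month, which is what Spark does for a single aggregation.

```python
def add_prefix_except(columns, prefix, keep):
    """Return the column names with `prefix` added to every name not in `keep`."""
    return [c if c in keep else f"{prefix}{c}" for c in columns]


def pivot_with_prefix(df, key, pivot_col, value_col, prefix):
    """Pivot `df` (assumed to be a PySpark DataFrame) and prefix the pivoted columns.

    Hypothetical helper: it only relies on DataFrame.groupBy, GroupedData.pivot,
    GroupedData.sum, DataFrame.columns and DataFrame.toDF.
    """
    pivoted = df.groupBy(key).pivot(pivot_col).sum(value_col)
    # toDF renames all columns positionally; the grouping key is left unchanged.
    return pivoted.toDF(*add_prefix_except(pivoted.columns, prefix, {key}))
```

With the DataFrame from the question, this would be called as cdr = pivot_with_prefix(datamonthly, "msisdn", "last_x_month", "arpu_sum", "arpu_sum_l"). Renaming after the pivot avoids adding an extra column to the input, at the cost of a second pass over the column names.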