PySpark: display a Spark DataFrame in a table format

I am using pyspark to read a parquet file like below:

my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/myTable/**')

Then when I call my_df.take(5), it shows [Row(...)] instead of a table like the one a pandas DataFrame displays.

Is it possible to display the data frame in a table format like pandas data frame? Thanks!


The show method does what you’re looking for.

For example, given the following dataframe of 3 rows, I can print just the first two rows like this:

df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ('k', 'v'))
df.show(n=2)

which yields:

+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
+---+---+
only showing top 2 rows

