Viewing the content of a Spark Dataframe Column

Question:

I’m using Spark 1.3.1.

I am trying to view the values of a Spark dataframe column in Python. With a Spark dataframe, I can do df.collect() to view the contents of the dataframe, but as far as I can see there is no such method for a Spark dataframe column.

For example, the dataframe df contains a column named 'zip_code'. I can do df['zip_code'] and it returns a pyspark.sql.dataframe.Column type, but I can’t find a way to view the values in df['zip_code'].
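Roughly, the setup looks like this (zip code values are made up for illustration, and it assumes an existing SparkContext sc and the Spark 1.3-era SQLContext API):

from pyspark.sql import SQLContext, Row

sqlContext = SQLContext(sc)  # assumes an existing SparkContext `sc`
df = sqlContext.createDataFrame([Row(zip_code='94105'), Row(zip_code='10001')])

df.collect()      # works: returns the full rows
df['zip_code']    # returns a Column object, not the values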

Asked By: John Lin


Answers:

You can access the underlying RDD and map over it:

df.rdd.map(lambda r: r.zip_code).collect()

You can also use select if you don’t mind the results being wrapped in Row objects:

df.select('zip_code').collect()
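If the Row wrappers get in the way, you can unwrap them after collecting; a small sketch reusing the zip_code column from the question:

rows = df.select('zip_code').collect()
values = [r.zip_code for r in rows]  # plain list of zip code values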

Finally, if you simply want to inspect the content, the show method should be enough:

df.select('zip_code').show()
Answered By: zero323

To view the complete content:

df.select("raw").take(1).foreach(println)

(show only displays an overview).

Answered By: Thomas Decaux

You can simply write:

df.select("your column's name").show()

In your case, that would be:

df.select('zip_code').show()
Answered By: Cicilio