Access a specific item in PySpark dataframe

Question:

How can I access the value at a certain index of a column in a PySpark DataFrame? For example, I want to access the value at index 5 of a column named "Category". How can I do that in PySpark syntax?

Asked By: moirK


Answers:

Something like this,

value = df.where(df.index == 5).select('Category').collect()[0]['Category']
# assuming 'index' is an index column in df
Answered By: mayank agrawal

The answer from @mayank is good; this is a continuation for the case where an index column is not present.

Data in a CSV file, saved as demo_date.csv:

job number,from_date,to_date
1,01-10-2010,31-12-9999
2,02-10-2010,31-12-9999
3,03-10-2010,31-12-9999

code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit

spark = SparkSession.builder.appName('Basics').getOrCreate()
df = spark.read.csv('demo_date.csv', header=True)
#df.show()

# filter on 'job number', then pull the value out of the first matching Row
val = df.where(col('job number') == lit(2)).select('job number').collect()[0]['job number']

print(val)

The above worked for me.

Answered By: Aadu