Access a specific item in a PySpark DataFrame
Question:
How can I access the value at a certain index of a column in a PySpark DataFrame? For example, I want to access the value at index 5 of a column named “Category”. How can I do that in PySpark syntax?
Answers:
Something like this:
value = df.where(df.index == 5).select('Category').collect()[0]['Category']
# assuming 'index' is the index column
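Worth noting: a Spark DataFrame has no built-in row index, so if one is not already present you need to create it first. Below is a minimal sketch of one way to do that, combining monotonically_increasing_id() with row_number(); the Category sample data and the '_mid'/'IndexDemo' names are made up purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, row_number, monotonically_increasing_id
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('IndexDemo').getOrCreate()

# Hypothetical sample data, purely for illustration
df = spark.createDataFrame(
    [('Books',), ('Games',), ('Music',), ('Food',), ('Toys',), ('Tools',)],
    ['Category'],
)

# Rows in a Spark DataFrame have no inherent order, so first attach an
# increasing (not necessarily consecutive) id that follows the current
# partition order, then number the rows with row_number()
df = df.withColumn('_mid', monotonically_increasing_id())
w = Window.orderBy('_mid')  # an unpartitioned window is fine for small data
indexed = df.withColumn('index', row_number().over(w) - 1).drop('_mid')  # 0-based

value = indexed.where(col('index') == 5).select('Category').collect()[0]['Category']
print(value)  # 'Tools'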
The answer from @mayank is good; what follows is just a continuation for the case where an index column is not present.
Data in a CSV file; save it as demo_date.csv:
job number,from_date,to_date
1,01-10-2010,31-12-9999
2,02-10-2010,31-12-9999
3,03-10-2010,31-12-9999
Code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.appName('Basics').getOrCreate()

# Read the CSV, treating the first row as the header
df = spark.read.csv('demo_date.csv', header=True)
# df.show()

# Filter on a known key value instead of a positional index,
# then collect the single matching row
val = df.where(col('job number') == lit(2)).select('job number').collect()[0]['job number']
print(val)
The above worked for me.
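As a small follow-up, when only one matching row is needed, first() returns a single Row (or None when nothing matches), which avoids indexing into the result of collect(). Continuing from the code above:

# first() returns a single Row, or None if no row matches the filter
row = df.where(col('job number') == lit(2)).select('job number').first()
if row is not None:
    print(row['job number'])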