pyspark-dataframes

When clause in pyspark gives an error "name 'when' is not defined"

Question: With the code below I am getting the error message "name 'when' is not defined". voter_df = voter_df.withColumn('random_val', when(voter_df.TITLE == 'Councilmember', F.rand()).when(voter_df.TITLE == 'Mayor', 2).otherwise(0)) Add a column to voter_df named random_val with the results of the F.rand() method for …

Total answers: 2

Split count results of different events into different columns in pyspark

Question: I have an RDD from which I need to extract counts of multiple events. The initial RDD looks like this:

+-------+------+---+
|  event|  user|day|
+-------+------+---+
|event_x|user_A|  0|
|event_y|user_A|  2|
|event_x|user_B|  2|
|event_y|user_B|  1|
|event_x …

Total answers: 2

How to determine which columns I need since ApplyMapping isn't case sensitive?

Question: I'm updating a PySpark script with a new database model and I've encountered some problems calling/updating columns, since PySpark apparently brings all columns in uppercase, but when I use ApplyMapping it is not case sensitive, BUT when I join (by left) …

Total answers: 2

How to perform union on two DataFrames with different amounts of columns in Spark?

Question: I have 2 DataFrames: I need union like this: The unionAll function doesn't work because the number and the name of columns are different. How can I do this? Asked By: Allan Feliph. Answers: In Scala you just …

Total answers: 22