pyspark-pandas

How to Group by Conditional aggregation of adjacent rows In PySpark

Question: I am facing an issue when doing conditional grouping in a Spark dataframe. Below is a complete example. I have a dataframe which has been sorted by user and by time: activity location user 0 watch movie house A 1 sleep house A 2 cardio gym …

Total answers: 1
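
A common way to group runs of adjacent rows is the lag-plus-running-sum trick: flag each row whose location differs from the previous row, then take a cumulative sum of the flags so every run gets its own group id. A minimal sketch, assuming toy data shaped like the snippet above (the `time` ordering column and the aggregation are illustrative):

```python
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data modeled on the question: rows already ordered by user and time.
df = spark.createDataFrame(
    [(0, "watch movie", "house", "A"),
     (1, "sleep", "house", "A"),
     (2, "cardio", "gym", "A")],
    ["time", "activity", "location", "user"],
)

w = Window.partitionBy("user").orderBy("time")

# Flag rows where the location changes, then running-sum the flags so each
# run of identical adjacent locations shares one group id.
grouped = (
    df.withColumn(
        "changed",
        F.when(F.lag("location").over(w).isNull()
               | (F.lag("location").over(w) != F.col("location")), 1).otherwise(0),
    )
    .withColumn("grp", F.sum("changed").over(w))
    .groupBy("user", "grp", "location")
    .agg(F.collect_list("activity").alias("activities"))
)
grouped.show(truncate=False)
```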

I want to sum dates in a loop 13 times using PySpark

Question: Please help me solve this issue, as I am still new to Python/PySpark. I want to loop 13 times, adding dates in multiples of 7 to the same column. I have a master table like …

Total answers: 1
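
Reading the question as "add 7, 14, …, 91 days to a date column, 13 times", two common approaches are a plain Python loop over `withColumn` and an array built in one pass followed by `explode`. A minimal sketch under that assumption, with a hypothetical `start_date` column:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical master table with a single date column.
df = spark.createDataFrame([("2023-01-01",), ("2023-02-15",)], ["start_date"])
df = df.withColumn("start_date", F.to_date("start_date"))

# Option 1: a loop that adds one column per 7-day step, 13 times.
looped = df
for i in range(1, 14):
    looped = looped.withColumn(f"plus_{7 * i}_days", F.date_add("start_date", 7 * i))

# Option 2: build all 13 shifted dates as an array, then explode into rows,
# which keeps the result in a single column.
exploded = df.withColumn(
    "shifted_date",
    F.explode(F.array([F.date_add("start_date", 7 * i) for i in range(1, 14)])),
)
exploded.show()
```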

Pandas to Pyspark conversion (repeat/explode)

Question: I'm trying to take a notebook that I've written in Python/Pandas and modify/convert it to use PySpark. The dataset I'm working with is (as real-world datasets often are) complete and utter garbage, and so some of the things I have to do to it are potentially a little …

Total answers: 1
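
The usual PySpark counterpart to pandas-style row repetition (for example `df.loc[df.index.repeat(df["n"])]`) is `array_repeat` followed by `explode`. A minimal sketch under that reading, with hypothetical `value` and `n` columns:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame: repeat each row `n` times.
df = spark.createDataFrame([("a", 3), ("b", 2)], "value string, n int")

# array_repeat builds an array holding `n` copies of the value, and explode
# turns every array element into its own row.
repeated = df.select(
    "n",
    F.explode(F.expr("array_repeat(value, n)")).alias("value"),
)
repeated.show()
```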

Pyspark: Compare Column Values across different dataframes

Question: We are planning to do the following: compare two dataframes, add values to the first dataframe based on the comparison, and then group by to get the combined data. We are using PySpark dataframes, and the following are our dataframes. Dataframe1: | Manager | Department | isHospRelated | …

Total answers: 1
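
Without the full dataframes, a common pattern for this kind of task is a join on the shared keys, a `when` expression to derive a value from the comparison, and a final `groupBy`. A hedged sketch using the column names visible in the preview (the second dataframe and its `Amount` column are assumptions):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical data; only the Dataframe1 column names come from the question.
df1 = spark.createDataFrame(
    [("Alice", "Cardiology", "Yes"), ("Bob", "Finance", "No")],
    ["Manager", "Department", "isHospRelated"],
)
df2 = spark.createDataFrame(
    [("Alice", "Cardiology", 10), ("Bob", "Finance", 5)],
    ["Manager", "Department", "Amount"],
)

# Join the second dataframe onto the first, derive a value from the comparison,
# then group by to produce the combined result.
combined = (
    df1.join(df2, on=["Manager", "Department"], how="left")
       .withColumn(
           "HospAmount",
           F.when(F.col("isHospRelated") == "Yes", F.col("Amount")).otherwise(F.lit(0)),
       )
       .groupBy("Manager", "Department", "isHospRelated")
       .agg(F.sum("HospAmount").alias("TotalHospAmount"))
)
combined.show()
```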

PySpark: Create a condition from a string

Question: I have to apply conditions to PySpark dataframes based on a distribution. My distribution looks like: mp = [413, 291, 205, 169, 135] And I am generating a condition expression like this: when_decile = (F.when((F.col(colm) >= float(mp[0])), F.lit(1)) .when( (F.col(colm) >= float(mp[1])) & (F.col(colm) < float(mp[0])), F.lit(2)) .when( …

Total answers: 2
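
The truncated `when` chain in the preview can be generated in a loop from the `mp` breakpoints instead of being written out branch by branch. A minimal sketch, assuming an illustrative column name `"score"` for `colm`:

```python
import pyspark.sql.functions as F

# Breakpoints copied from the question; the column name is illustrative.
mp = [413, 291, 205, 169, 135]
colm = "score"

# Decile 1 for values at or above mp[0], then one branch per interval
# between consecutive breakpoints, and a catch-all for everything below.
when_decile = F.when(F.col(colm) >= float(mp[0]), F.lit(1))
for i in range(1, len(mp)):
    when_decile = when_decile.when(
        (F.col(colm) >= float(mp[i])) & (F.col(colm) < float(mp[i - 1])),
        F.lit(i + 1),
    )
when_decile = when_decile.otherwise(F.lit(len(mp) + 1))

# df = df.withColumn("decile", when_decile)  # applied to a dataframe with a `score` column
```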