data-wrangling

change iterrows() to .loc for large dataframes

Question: I have two data frames, df1 and df2. For rows in df1 where day_of_week == 7, we have to match two other column values (statWeek and statMonth); if they match, we have to replace as_cost_perf in df2 with cost_eu from df1. In other …

Total answers: 3
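
For the iterrows() question above, one vectorized approach is a merge rather than a row loop. This is a minimal sketch, not the accepted answer: it assumes df1 and df2 share the key columns statWeek and statMonth, and that each key pair appears at most once among the day_of_week == 7 rows of df1. The sample values are made up; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "day_of_week": [7, 3, 7],
    "statWeek": [1, 1, 2],
    "statMonth": [1, 1, 1],
    "cost_eu": [10.0, 99.0, 20.0],
})
df2 = pd.DataFrame({
    "statWeek": [1, 2, 3],
    "statMonth": [1, 1, 1],
    "as_cost_perf": [0.0, 0.0, 5.0],
})

keys = ["statWeek", "statMonth"]

# Keep only the df1 rows that satisfy the condition, with the keys and the new value.
updates = df1.loc[df1["day_of_week"] == 7, keys + ["cost_eu"]]

# A left merge keeps every df2 row in order; unmatched rows get NaN in cost_eu.
merged = df2.merge(updates, on=keys, how="left")

# Overwrite as_cost_perf where a match exists, keep the old value otherwise.
df2["as_cost_perf"] = merged["cost_eu"].fillna(merged["as_cost_perf"]).to_numpy()
```

If a key pair can occur more than once in the filtered df1 rows, the merge would duplicate df2 rows, so the updates would need to be deduplicated or aggregated first.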

Transform rows categories to column while preserving rest of the data frame python

Question: I have a data frame as below: Time Groups Entity GC Seg Category Year Quarter IndicatorName Value 0 2021-06-01 KRO CO P_GA None Model_Q2_2021 2021 2 yhat 568759.481223 1 2021-07-01 KRO CO P_GA None Model_Q2_2021 2021 3 yhat 586003.965652 2 2021-08-01 KRO …

Total answers: 2

Apply a function to dataframe which includes previous row data

Question: I have an input dataframe for daily fruit spend which looks like this: spend_df Date Apples Pears Grapes 01/01/22 10 47 0 02/01/22 0 22 3 03/01/22 11 0 3 … For each fruit, I need to apply a function using their respective parameters …

Total answers: 2

Reformat only cells that contain

Question: I’m trying to understand how I can find cells in a dataframe that contain a specific substring (I want to chop off the ‘R’ at the end of certain strings) and reformat those cells to leave the original value minus the last character. Example: "Value" "Designator" 1 47 R12 …

Total answers: 2

How to overwrite a subset of a column in pandas with a dictionary

Question: I have a full dataframe (no NaNs) with some wrong cells. I created a dictionary that has some identifier as keys and the correct value as values. I would like to overwrite only the cells in the column of the …

Total answers: 1
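
For the dictionary question above, a boolean mask lets you overwrite only the rows whose identifier appears in the dictionary. This is a minimal sketch under assumptions: the column names `id` and `value`, the dictionary name `corrections`, and all sample values are hypothetical, since the question text is truncated.

```python
import pandas as pd

# Hypothetical data: "id" is the identifier column, "value" holds some wrong cells.
df = pd.DataFrame({"id": ["a", "b", "c", "d"], "value": [1, 2, 3, 4]})

# Hypothetical corrections: identifier -> correct value.
corrections = {"b": 20, "d": 40}

# Select only the rows whose identifier has a correction.
mask = df["id"].isin(corrections)

# Look up the corrected values for those rows and assign them in place;
# rows outside the mask are left untouched.
df.loc[mask, "value"] = df.loc[mask, "id"].map(corrections)
```

Restricting the assignment to the masked rows means rows without a dictionary entry are never written, so no NaN values are introduced.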

reshape pandas data frame: duplicated rows to columns, with textual data

Question: I have a dataframe like this: INDEX_COL col1 A Random Text B Some more random text C more stuff A Blah B Blah, Blah C Yet more stuff A erm B yup C whatever. What I need is for it to be reshaped into new columns …

Total answers: 3

Split variable in Pyspark

Question: I am trying to split the UTC value found in timestamp_value into a new column called utc. I tried to use Python regex but was not able to do it. Thank you for your answer! This is what my dataframe looks like: +--------+----------------------------+ |machine |timestamp_value | +--------+----------------------------+ |1 |2022-01-06T07:47:37.319+0000| …

Total answers: 1