Transforming a DataFrame from long to wide with specific columns

Question:

I have a DataFrame that looks like this (call it df1):

id    date   value

A1    day1    0.1
A1    day2    0.2
A1    day3   -0.1
A2    day1    0.3
A3    day2    0.2
A3    day4   -0.5

I need to convert the values to a matrix for a calculation, so I think I first need to transform the DataFrame into this form (call it df2) and then convert it to a NumPy array:

      day1  day2  day3  day4  day5
  A1   0.1   0.2  -0.1   0.0   0.0
  A2   0.3   0.0   0.0   0.0   0.0
  A3   0.0   0.2   0.0  -0.5   0.0 

If an id doesn’t have a value on a given day, just set that day’s value to 0 (probably none of the ids has a value for every date).

My idea is to first create an empty DataFrame (call it df3) and then fill df1’s data into it:

      day1  day2  day3  day4  day5
  A1   0.0   0.0   0.0   0.0   0.0
  A2   0.0   0.0   0.0   0.0   0.0
  A3   0.0   0.0   0.0   0.0   0.0

But I don’t know the proper way to iterate over df1’s values and match each one to the right cell in df3 (and people say it’s a bad idea to iterate over a DataFrame?). Is there a better approach, such as pivot or merge?

Asked By: chapayev
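Filling an empty df3 does not actually require iterating over rows. A minimal sketch, assuming the complete lists of ids and days are known up front (the `ids` and `days` lists below are illustrative): `Index.get_indexer` maps each label in df1 to an integer position, and NumPy fancy indexing writes all values into a zero matrix in a single assignment.

```python
import numpy as np
import pandas as pd

# df1 from the question (long format)
df1 = pd.DataFrame({
    'id':    ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
    'date':  ['day1', 'day2', 'day3', 'day1', 'day2', 'day4'],
    'value': [0.1, 0.2, -0.1, 0.3, 0.2, -0.5],
})

# Assumed full row/column labels (what df3's index and columns would be)
ids = ['A1', 'A2', 'A3']
days = ['day1', 'day2', 'day3', 'day4', 'day5']

# Start from an all-zero matrix, like the empty df3
matrix = np.zeros((len(ids), len(days)))

# Translate each label into its integer position
rows = pd.Index(ids).get_indexer(df1['id'])
cols = pd.Index(days).get_indexer(df1['date'])

# get_indexer returns -1 for unknown labels, which would silently hit the
# last row/column, so make sure every label was found
assert (rows >= 0).all() and (cols >= 0).all()

# Fill every (row, col) cell in one vectorized assignment, no loop needed
matrix[rows, cols] = df1['value'].to_numpy()

df3 = pd.DataFrame(matrix, index=pd.Index(ids, name='id'), columns=days)
print(df3)
```

If an (id, date) pair appears twice, only one of the values survives the assignment, whereas `pivot` would raise an error instead.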


Answers:

You could try df1.pivot() to reshape the DataFrame:

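import pandas as pd
# values='value' keeps the columns flat (day1, day2, ...) instead of a MultiIndex
df2 = df1.pivot(index='id', columns='date', values='value').fillna(0.0)
df2.columns.name = None  # drop the 'date' label from the column header
print(df2)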

Output

   day1 day2  day3  day4
id                      
A1  0.1  0.2  -0.1   0.0
A2  0.3  0.0   0.0   0.0
A3  0.0  0.2   0.0  -0.5
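`df1.pivot` raises a `ValueError` when an (id, date) pair occurs more than once. If that can happen in the real data, `pivot_table` aggregates the duplicates and fills the gaps in one call. A minimal sketch; the duplicated row and the choice to sum duplicates are assumptions for illustration:

```python
import pandas as pd

# Hypothetical input with a duplicated (A1, day1) pair
df1 = pd.DataFrame({
    'id':    ['A1', 'A1', 'A1', 'A2'],
    'date':  ['day1', 'day1', 'day2', 'day1'],
    'value': [0.1, 0.05, 0.2, 0.3],
})

# df1.pivot(...) would raise ValueError here because (A1, day1) is not unique.
# pivot_table aggregates duplicates (summing is an assumption) and
# fill_value replaces missing (id, date) combinations with 0.0.
df2 = df1.pivot_table(index='id', columns='date', values='value',
                      aggfunc='sum', fill_value=0.0)
df2.columns.name = None  # drop the 'date' label from the column header
print(df2)
```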

Now assume you have df3 as:

   day1 day2 day3 day4 day5 day6 day7
id                                   
A1  0.0  0.0  0.0  0.0  0.0  0.0  0.0
A2  0.0  0.0  0.0  0.0  0.0  0.0  0.0
A3  0.0  0.0  0.0  0.0  0.0  0.0  0.0

You can then merge the two frames:

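# left merge on the shared columns (id, day1..day4); day5..day7 are added from df3 and NaNs become 0.0
df4 = pd.merge(df2.reset_index(), df3.reset_index(), how='left').set_index('id').fillna(0.0)
print(df4)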

to get the output:

   day1 day2  day3  day4  day5  day6  day7
id                                        
A1  0.1  0.2  -0.1   0.0   0.0   0.0   0.0
A2  0.3  0.0   0.0   0.0   0.0   0.0   0.0
A3  0.0  0.2   0.0  -0.5   0.0   0.0   0.0
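If df3 only serves as a template for the row and column labels, reindexing may be simpler than merging. A minimal sketch of that swapped-in technique (not the merge shown here), assuming df2 and df3 look like the frames above; both are rebuilt so the snippet runs on its own, and the `days` list is illustrative:

```python
import pandas as pd

# df2: the pivoted values (only day1-day4)
df2 = pd.DataFrame(
    {'day1': [0.1, 0.3, 0.0], 'day2': [0.2, 0.0, 0.2],
     'day3': [-0.1, 0.0, 0.0], 'day4': [0.0, 0.0, -0.5]},
    index=pd.Index(['A1', 'A2', 'A3'], name='id'),
)

# df3: the all-zero template with the full set of days
days = [f'day{i}' for i in range(1, 8)]
df3 = pd.DataFrame(0.0, index=df2.index, columns=days)

# Align df2 to df3's rows and columns; any id or day missing from df2 becomes 0.0
df4 = df2.reindex(index=df3.index, columns=df3.columns, fill_value=0.0)
print(df4)
```

Unlike the merge, this does not depend on which columns the two frames happen to share.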
Answered By: perpetual student

This should work.

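# pivot and reindex to add the missing days
# (recent pandas versions accept only keyword arguments for pivot)
df1.pivot(index='id', columns='date', values='value').reindex(['day1', 'day2', 'day3', 'day4', 'day5'], axis=1).fillna(0).to_numpy()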

# array([[ 0.1,  0.2, -0.1,  0. ,  0. ],
#        [ 0.3,  0. ,  0. ,  0. ,  0. ],
#        [ 0. ,  0.2,  0. , -0.5,  0. ]])
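A self-contained variant of this answer, swapping `pivot` for the equivalent `set_index` + `unstack` and generating the day labels instead of typing them out. The day1..day5 range is an assumption taken from the question's df2:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id':    ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
    'date':  ['day1', 'day2', 'day3', 'day1', 'day2', 'day4'],
    'value': [0.1, 0.2, -0.1, 0.3, 0.2, -0.5],
})

# Full list of days, generated instead of typed out (assumes day1..day5)
days = [f'day{i}' for i in range(1, 6)]

# set_index + unstack reshapes like pivot; fill_value replaces the gaps,
# and reindex adds day5 (and fixes the column order) before converting
arr = (
    df1.set_index(['id', 'date'])['value']
       .unstack(fill_value=0.0)
       .reindex(columns=days, fill_value=0.0)
       .to_numpy()
)
print(arr)
```

The final `reindex` matters for ordering too: `pivot` and `unstack` sort labels as strings, so a label like 'day10' would otherwise come before 'day2'.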
Answered By: not a robot