Reference dataframes using indices stored in another dataframe
Question:
I’m trying to reference data from source dataframes, using indices stored in another dataframe.
For example, let’s say we have a "shifts" dataframe with the names of the people on duty on each date (some values can be NaN):
              a    b    c
2023-01-01  Sam  Max  NaN
2023-01-02  Mia  NaN  Max
2023-01-03  NaN  Sam  Mia
Then we have a "performance" dataframe, with the performance of each employee on each date. Row indices are the same as the shifts dataframe, but column names are different:
            Sam  Mia  Max  Ian
2023-01-01  4.5  NaN  3.0  NaN
2023-01-02  NaN  2.0  3.0  NaN
2023-01-03  4.0  3.0  NaN  4.0
Finally, we have a "salary" dataframe, whose structure and indices are different from the other two dataframes:
  Employee  Salary
0      Sam     100
1      Mia      90
2      Max      80
3      Ian      70
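For anyone who wants to reproduce this, the three frames above could be built roughly as follows. This is only a sketch: the question does not say what dtype the date index has, so plain strings are assumed here.

```python
import numpy as np
import pandas as pd

# Assumption: dates are kept as plain strings; a DatetimeIndex would work the same way.
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]

# Who is on duty in each slot (a, b, c) on each date.
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan],
     "b": ["Max", np.nan, "Sam"],
     "c": [np.nan, "Max", "Mia"]},
    index=dates,
)

# Performance of each employee on each date (same row index as shifts).
performance = pd.DataFrame(
    {"Sam": [4.5, np.nan, 4.0],
     "Mia": [np.nan, 2.0, 3.0],
     "Max": [3.0, 3.0, np.nan],
     "Ian": [np.nan, np.nan, 4.0]},
    index=dates,
)

# One row per employee, default integer index.
salary = pd.DataFrame(
    {"Employee": ["Sam", "Mia", "Max", "Ian"],
     "Salary": [100, 90, 80, 70]}
)
```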
I need to create two output dataframes, with the same structure and indices as "shifts".
In the first one, I need to substitute each employee name with his/her performance on that date.
In the second output dataframe, the employee name is replaced with his/her salary. These are the expected outputs:
Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0
Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
Any idea of how to do it? Thanks
Answers:
For the first one:
(shifts
 .reset_index().melt('index')
 .merge(performance.stack().rename('p'),
        left_on=['index', 'value'], right_index=True)
 .pivot(index='index', columns='variable', values='p')
 .reindex_like(shifts)
)
Output:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0
For the second:
shifts.replace(salary.set_index('Employee')['Salary'])
Output:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
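As a possible variant of the salary lookup (not what the answer above uses), each column can be mapped through the Employee-to-Salary Series with `Series.map`. The sketch below rebuilds the question's sample data so it runs on its own; the string date index and the name `salary_by_name` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (string dates are an assumption).
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan],
     "b": ["Max", np.nan, "Sam"],
     "c": [np.nan, "Max", "Mia"]},
    index=dates,
)
salary = pd.DataFrame(
    {"Employee": ["Sam", "Mia", "Max", "Ian"],
     "Salary": [100, 90, 80, 70]}
)

# Lookup Series: index = employee name, values = salary.
salary_by_name = salary.set_index("Employee")["Salary"]

# Map every column of shifts through the lookup; NaN slots stay NaN.
out2 = shifts.apply(lambda col: col.map(salary_by_name))
print(out2)
```

One behavioural difference to keep in mind: with `map`, a name that does not appear in the salary table becomes NaN, whereas `replace` leaves such a name in place unchanged.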
Here’s a way to do what your question asks:
out1 = (shifts.stack()
    .rename_axis(index=('date', 'shift'))
    .reset_index().rename(columns={0: 'employee'})
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: performance.loc[row.date, row.employee], axis=1)))
    .pivot(index='date', columns='shift', values='perf')
    .rename_axis(index=None, columns=None))
out2 = shifts.replace(salary.set_index('Employee').Salary)
Output:
Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0
Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
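If the row-wise `apply` turns out to be slow on larger frames, one alternative (a different technique from the answers above, shown only as a sketch) is a positional lookup with NumPy. It assumes the employee names in `performance.columns` are unique; the names `col_pos`, `row_pos` and `perf_aligned` are made up for this example, and the sample data is rebuilt with string dates so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data from the question (string dates are an assumption).
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan],
     "b": ["Max", np.nan, "Sam"],
     "c": [np.nan, "Max", "Mia"]},
    index=dates,
)
performance = pd.DataFrame(
    {"Sam": [4.5, np.nan, 4.0],
     "Mia": [np.nan, 2.0, 3.0],
     "Max": [3.0, 3.0, np.nan],
     "Ian": [np.nan, np.nan, 4.0]},
    index=dates,
)

# Make sure performance rows are in the same order as shifts rows.
perf_aligned = performance.reindex(shifts.index)

# Column position of each name in performance; -1 for NaN or unknown names.
col_pos = perf_aligned.columns.get_indexer(shifts.to_numpy().ravel())
col_pos = col_pos.reshape(shifts.shape)

# Row position of every cell, broadcast against col_pos.
row_pos = np.arange(len(shifts))[:, None]

# Fancy-index the values; -1 would pick the last column, so mask those cells.
values = perf_aligned.to_numpy()[row_pos, col_pos]
out1 = pd.DataFrame(
    np.where(col_pos >= 0, values, np.nan),
    index=shifts.index,
    columns=shifts.columns,
)
print(out1)
```

The explicit mask matters because NumPy treats -1 as "last column", so skipping it would silently fill empty slots with Ian's performance.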