Reference dataframes using indices stored in another dataframe

Question:

I’m trying to reference data from source dataframes, using indices stored in another dataframe.

For example, let’s say we have a "shifts" dataframe with the names of the people on duty on each date (some values can be NaN):

              a    b    c
2023-01-01  Sam  Max  NaN
2023-01-02  Mia  NaN  Max
2023-01-03  NaN  Sam  Mia

Then we have a "performance" dataframe, with the performance of each employee on each date. Row indices are the same as in the shifts dataframe, but column names are different:

            Sam  Mia  Max  Ian
2023-01-01  4.5  NaN  3.0  NaN
2023-01-02  NaN  2.0  3.0  NaN
2023-01-03  4.0  3.0  NaN  4.0

and finally we have a "salary" dataframe, whose structure and indices are different from the other two dataframes:

  Employee  Salary
0      Sam     100
1      Mia      90
2      Max      80
3      Ian      70
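
For anyone who wants to reproduce the problem, the three dataframes above could be built roughly as follows. This is only a sketch: it assumes the dates form a DatetimeIndex and that empty cells are `np.nan`, neither of which is stated explicitly above.

```python
import numpy as np
import pandas as pd

# Assumption: the row labels are real timestamps, not strings
dates = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"])

# Who is on duty in each shift slot (a, b, c) on each date
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan],
     "b": ["Max", np.nan, "Sam"],
     "c": [np.nan, "Max", "Mia"]},
    index=dates,
)

# Performance of each employee on each date
performance = pd.DataFrame(
    {"Sam": [4.5, np.nan, 4.0],
     "Mia": [np.nan, 2.0, 3.0],
     "Max": [3.0, 3.0, np.nan],
     "Ian": [np.nan, np.nan, 4.0]},
    index=dates,
)

# One row per employee, default integer index
salary = pd.DataFrame(
    {"Employee": ["Sam", "Mia", "Max", "Ian"],
     "Salary": [100, 90, 80, 70]}
)
```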

I need to create two output dataframes, with the same structure and indices as "shifts".
In the first one, I need to substitute each employee name with that employee's performance on that date.
In the second output dataframe, the employee name is replaced with the employee's salary. These are the expected outputs:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

Any idea of how to do it? Thanks

Asked By: younggotti

||

Answers:

For the first one:

(shifts
 .reset_index().melt('index')
 .merge(performance.stack().rename('p'),
        left_on=['index', 'value'], right_index=True)
 .pivot(index='index', columns='variable', values='p')
 .reindex_like(shifts)
)

Output:

              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0
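
The chain above reshapes both frames to long format, joins them on (date, employee), and pivots back. A step-by-step sketch of the same idea may make that easier to follow. The intermediate names (`long_shifts`, `long_perf`) and the column labels `date`, `shift`, `employee`, and `perf` are made up for illustration, and the sample data is rebuilt here on the assumption that the index is a DatetimeIndex.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"])
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan], "b": ["Max", np.nan, "Sam"], "c": [np.nan, "Max", "Mia"]},
    index=dates,
)
performance = pd.DataFrame(
    {"Sam": [4.5, np.nan, 4.0], "Mia": [np.nan, 2.0, 3.0],
     "Max": [3.0, 3.0, np.nan], "Ian": [np.nan, np.nan, 4.0]},
    index=dates,
)

# 1. Long format of shifts: one row per (date, shift slot), empty slots dropped
long_shifts = (
    shifts.rename_axis("date").reset_index()
    .melt(id_vars="date", var_name="shift", value_name="employee")
    .dropna(subset=["employee"])
)

# 2. Long format of performance: one row per (date, employee)
long_perf = (
    performance.rename_axis("date").reset_index()
    .melt(id_vars="date", var_name="employee", value_name="perf")
)

# 3. Join on (date, employee), then pivot back to the wide layout of shifts
out1 = (
    long_shifts.merge(long_perf, on=["date", "employee"], how="left")
    .pivot(index="date", columns="shift", values="perf")
    .reindex(index=shifts.index, columns=shifts.columns)
)
print(out1)
```

The final `reindex` restores any dates or shift columns that dropped out because they contained no names at all.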

For the second:

shifts.replace(salary.set_index('Employee')['Salary'])

Output:

                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
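
A possible variation, sketched here rather than taken from the answer itself: `replace` leaves any name that is not in `salary` untouched, whereas mapping each column with `Series.map` turns such names into NaN. The name `salary_by_name` is just illustrative, and the sample data is rebuilt assuming a DatetimeIndex.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"])
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan], "b": ["Max", np.nan, "Sam"], "c": [np.nan, "Max", "Mia"]},
    index=dates,
)
salary = pd.DataFrame(
    {"Employee": ["Sam", "Mia", "Max", "Ian"], "Salary": [100, 90, 80, 70]}
)

# Series mapping employee name -> salary
salary_by_name = salary.set_index("Employee")["Salary"]

# Map every column; NaN cells and names absent from the mapping both become NaN
out2 = shifts.apply(lambda col: col.map(salary_by_name))
print(out2)
```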
Answered By: mozway

Here’s a way to do what your question asks:

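out1 = (
    shifts.stack()
    .dropna()  # skip empty shift slots, whether or not stack() already dropped them
    .rename_axis(index=('date', 'shift'))
    .reset_index().rename(columns={0: 'employee'})
    # look up each (date, employee) pair in the performance dataframe
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: performance.loc[row.date, row.employee], axis=1)))
    .pivot(index='date', columns='shift', values='perf')
    .rename_axis(index=None, columns=None)
)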

out2 = shifts.replace(salary.set_index('Employee').Salary)

Output:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
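
As a vectorized alternative to the row-wise `apply` used for Output 1, the lookup could also be done with NumPy positional indexing. This is a rough sketch, not part of the answer above: `lookup_by_name` is a hypothetical helper, it assumes the column labels of `performance` are unique, and the sample data is rebuilt assuming a DatetimeIndex.

```python
import numpy as np
import pandas as pd

def lookup_by_name(names: pd.DataFrame, values: pd.DataFrame) -> pd.DataFrame:
    """For each cell of `names`, fetch values.loc[row_label, cell]; NaN if missing."""
    # Align the rows of `values` to `names` (dates missing from `values` become NaN rows)
    vals = values.reindex(names.index).to_numpy(dtype=float)
    # Column position of each name in `values`; -1 for NaN cells or unknown names
    flat = names.to_numpy().ravel()
    col_pos = values.columns.get_indexer(flat).reshape(names.shape)
    row_pos = np.arange(len(names))[:, None]
    # A position of -1 picks the last column here; those cells are masked out below
    picked = vals[row_pos, col_pos]
    out = np.where(col_pos >= 0, picked, np.nan)
    return pd.DataFrame(out, index=names.index, columns=names.columns)

dates = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"])
shifts = pd.DataFrame(
    {"a": ["Sam", "Mia", np.nan], "b": ["Max", np.nan, "Sam"], "c": [np.nan, "Max", "Mia"]},
    index=dates,
)
performance = pd.DataFrame(
    {"Sam": [4.5, np.nan, 4.0], "Mia": [np.nan, 2.0, 3.0],
     "Max": [3.0, 3.0, np.nan], "Ian": [np.nan, np.nan, 4.0]},
    index=dates,
)

out1 = lookup_by_name(shifts, performance)
print(out1)
```

This avoids calling `.loc` once per row, which may matter for large frames.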
Answered By: constantstranger