How to backfill values based on a category pandas

Question:

I have a table such as the following, with vaccine manufacturer, date and total_vaccinations as a cumulative sum of the vaccine manufacturer over time

I am trying to ‘backfill’ the nulls, basically such that since this is a cumulative sum, dates with null values would have the previous date’s total vaccinations

| Vaccine Manufacturer | Date     | Total_Vaccinations |
| Pfizer               | 10/10/21 | 100                |
| Pfizer               | 10/11/21 | NaN                |
| Pfizer               | 10/12/21 | NaN                |
| Pfizer               | 10/13/21 | 130                |
| Pfizer               | 10/14/21 | NaN                |
| Moderna              | 10/10/21 | 220                |
| Moderna              | 10/11/21 | NaN                |
| Moderna              | 10/12/21 | 280                |
| Moderna              | 10/13/21 | NaN                |

I’ve tried the following:

df['Total_Vaccinations'] = df.groupby(['Vaccine Manufacturer', 'Date'])['Total_Vaccinations'].bfill()

and, after consulting stack overflow:

df['Total_Vaccinations'] = df.groupby(['Vaccine Manufacturer', 'Date'])['Total_Vaccinations'].ffill().bfill()

Neither of these return what I’m looking for. The first does nothing, and returns the same values as the original dataframe, and the second one basically just forward fills.

For reference, this is my desired output:

| Vaccine Manufacturer | Date     | Total_Vaccinations |
| Pfizer               | 10/10/21 | 100                |
| Pfizer               | 10/11/21 | 100                |
| Pfizer               | 10/12/21 | 100                |
| Pfizer               | 10/13/21 | 130                |
| Pfizer               | 10/14/21 | 130                |
| Moderna              | 10/10/21 | 220                |
| Moderna              | 10/11/21 | 220                |
| Moderna              | 10/12/21 | 280                |
| Moderna              | 10/13/21 | 280                |

What am I missing here?

Asked By: agra94

||

Answers:

Example

data = [['Pfizer', '10/10/21', 100],
        ['Pfizer', '10/11/21', None],
        ['Pfizer', '10/12/21', None],
        ['Pfizer', '10/13/21', 130],
        ['Pfizer', '10/14/21', None],
        ['Moderna', '10/10/21', 220],
        ['Moderna', '10/11/21', None],
        ['Moderna', '10/12/21', 280],
        ['Moderna', '10/13/21', None]]
df = pd.DataFrame(data, columns=['M', 'Date', 'Total'])

df

    M       Date        Total
0   Pfizer  10/10/21    100.0
1   Pfizer  10/11/21    NaN
2   Pfizer  10/12/21    NaN
3   Pfizer  10/13/21    130.0
4   Pfizer  10/14/21    NaN
5   Moderna 10/10/21    220.0
6   Moderna 10/11/21    NaN
7   Moderna 10/12/21    280.0
8   Moderna 10/13/21    NaN

Code

df.assign(Total=df.groupby('M')['Total'].ffill().bfill())

output:

    M       Date        Total
0   Pfizer  10/10/21    100.0
1   Pfizer  10/11/21    100.0
2   Pfizer  10/12/21    100.0
3   Pfizer  10/13/21    130.0
4   Pfizer  10/14/21    130.0
5   Moderna 10/10/21    220.0
6   Moderna 10/11/21    220.0
7   Moderna 10/12/21    280.0
8   Moderna 10/13/21    280.0
Answered By: Panda Kim