Python: adding observations based on an index restart in pandas dataframe

Question:

I have a dataframe that looks like this:

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})

I would like to assign a new month each time the index count starts again at 1. The month allocation does not follow a pattern, so I created a list that stores the month to attribute to each restart of the count.

What I want is something like this dataframe:

months = ["April", "June"]

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6],
                   'Month': ['April', 'April', 'April', 'April', 'April', 'April',
                             'June', 'June', 'June', 'June', 'June', 'June']})

Asked By: Lisa


Answers:

You can iterate over the index column:

import pandas as pd

df = pd.DataFrame({'index': [ 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6 ],
                   'data': [ 1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})

months = ["April", "June"]
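month_index = -1  # becomes 0 at the first restart (assumes row 0 has index 1)
month_col = []

for value in df['index']:
    if value == 1:
        # The count restarts: move on to the next month in the list
        # (wrapping around if there are more restarts than months)
        month_index = (month_index + 1) % len(months)
    # Record the month for this row, so rows with the same index value
    # can still receive different months
    month_col.append(months[month_index])

df['Month'] = month_col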

print(df)
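Keeping the dictionary-plus-map idea, a loop-free variant can key the dictionary by block number instead of by the value in the index column, since the index values 1..6 repeat in every block. This is a minimal sketch, assuming the first row starts a block (index == 1); the name block is only illustrative, and a block number with no entry in the dictionary maps to NaN.

```python
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})
months = ["April", "June"]

# Block number per row: 0 for the first run of 1..6, 1 for the second, ...
block = df['index'].eq(1).cumsum() - 1

# Key the dictionary by block number, not by the repeating 'index' value
month_dict = dict(enumerate(months))  # {0: 'April', 1: 'June'}

# Series.map with a dict: unmatched block numbers become NaN
df['Month'] = block.map(month_dict)
print(df)
```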
Answered By: Phoenix

Let's assign the month on the rows where index == 1, then forward-fill the remaining rows:

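# True on the rows where the count restarts
m = df['index'] == 1
# Give each restart row its month (needs at least as many months as restarts)
df.loc[m, 'month'] = months[:m.sum()]
# Propagate each month down to the remaining rows of its block
df['month'] = df['month'].ffill()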

Result:

    index   data  month
0       1  1.500  April
1       2  0.220  April
2       3  0.323  April
3       4  4.400  April
4       5  5.620  April
5       6  0.560  April
6       1  1.320   June
7       2  2.100   June
8       3  3.090   June
9       4  4.000   June
10      5  5.300   June
11      6  0.600   June
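A self-contained version of the snippet above, with the imports and data from the question, might look like this. The explicit length check is an addition of this sketch, not part of the original answer: it reports a readable message when the list has fewer months than restarts, a case in which the bare .loc assignment would instead fail with a pandas length-mismatch error. Rows before the first index == 1 (if any) would stay NaN, because ffill has nothing to propagate into them.

```python
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})
months = ["April", "June"]

m = df['index'] == 1      # True on rows where the count restarts
n_restarts = int(m.sum())

# The assignment below needs one month per restart row
if len(months) < n_restarts:
    raise ValueError(f"need {n_restarts} months, got {len(months)}")

df.loc[m, 'month'] = months[:n_restarts]  # month on restart rows, NaN elsewhere
df['month'] = df['month'].ffill()         # fill each block from its first row
print(df)
```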
Answered By: Shubham Sharma

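Using NumPy indexing (with import numpy as np): number the blocks with a cumulative sum of the restarts, then use those numbers as positions into the months array: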

df['Month'] = np.array(months)[df['index'].eq(1).cumsum()-1]

Safer alternative if the list might not have a month for every block: reindex returns NaN for block numbers with no month instead of raising an IndexError:

df['Month'] = pd.Series(months).reindex(df['index'].eq(1).cumsum()-1).values

Output:

    index   data  Month
0       1  1.500  April
1       2  0.220  April
2       3  0.323  April
3       4  4.400  April
4       5  5.620  April
5       6  0.560  April
6       1  1.320   June
7       2  2.100   June
8       3  3.090   June
9       4  4.000   June
10      5  5.300   June
11      6  0.600   June
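To see what the indexing expression works with, the sketch below prints the intermediate block numbers and then contrasts the two variants when the list is too short. The one-element list short_months is a hypothetical example, not data from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})

# Each row equal to 1 adds 1 to the running total, so every row carries the
# number of its block; subtracting 1 turns that into a 0-based position
block = df['index'].eq(1).cumsum() - 1
print(block.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Hypothetical list with only one month for two blocks
short_months = ["April"]

# Plain NumPy indexing fails on position 1
try:
    np.array(short_months)[block]
except IndexError as exc:
    print("IndexError:", exc)

# reindex uses the block numbers as labels; label 1 is missing, so it gives NaN
filled = pd.Series(short_months).reindex(block).values
print(filled)
```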
Answered By: mozway