Python: adding observations based on an index restart in pandas dataframe
Question:
I have a dataframe that looks like this:
df = pd.DataFrame({'index': [ 1, 2, 3, 4, 5, 6, 1, 2, 3, 4,5,6 ],
'data': [ 1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4,5.3,0.6]})
I would like to assign a new month each time the index count restarts at 1. The month allocation does not follow a pattern, so I created a list that stores the month to attribute to each restart of the count.
What I want is something like this dataframe:
months =["April", "June"]
df = pd.DataFrame({'index': [ 1, 2, 3, 4, 5, 6, 1, 2, 3, 4,5,6 ],
'data': [ 1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4,5.3,0.6],
'Month': ['April', 'April', 'April', 'April', 'April', 'April', 'June', 'June', 'June', 'June', 'June', 'June'] })
Answers:
You can iterate over the index column and move to the next month every time the count restarts at 1:
import pandas as pd
df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56, 1.32, 2.1, 3.09, 4, 5.3, 0.6]})
months = ["April", "June"]
month_index = -1  # becomes 0 at the first restart
month_col = []
for value in df['index']:
    if value == 1:
        # The count restarts: move to the next month (wraps around if the list runs out)
        month_index = (month_index + 1) % len(months)
    # Record the current month for this row
    month_col.append(months[month_index])
df['Month'] = month_col
print(df)
Let's assign a month to the rows where index == 1, then forward-fill the remaining rows:
m = df['index'] == 1
df.loc[m, 'month'] = months[:m.sum()]
df['month'] = df['month'].ffill()
Result:
    index   data  month
0       1  1.500  April
1       2  0.220  April
2       3  0.323  April
3       4  4.400  April
4       5  5.620  April
5       6  0.560  April
6       1  1.320   June
7       2  2.100   June
8       3  3.090   June
9       4  4.000   June
10      5  5.300   June
11      6  0.600   June
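To make the two steps visible, here is a self-contained sketch using the question's data. It checks the intermediate `month` column before the forward fill: after the `.loc` assignment only the restart rows hold a month and every other row is still missing. It assumes `months` has at least as many entries as there are restarts, otherwise the `.loc` assignment fails with a length mismatch.

```python
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56,
                            1.32, 2.1, 3.09, 4, 5.3, 0.6]})
months = ["April", "June"]

# True on the rows where the count restarts at 1
m = df['index'] == 1

# Step 1: only the restart rows receive a month; the other rows are left missing
df.loc[m, 'month'] = months[:m.sum()]
print(df['month'].isna().tolist())  # True everywhere except the two restart rows

# Step 2: forward fill copies each month down until the next restart
df['month'] = df['month'].ffill()
print(df['month'].tolist())
```

Because the fill runs top to bottom, each block inherits the month of its own first row.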
Using NumPy indexing:
import numpy as np
df['Month'] = np.array(months)[df['index'].eq(1).cumsum()-1]
A safer pandas alternative, in case `months` has fewer entries than there are restarts (the surplus rows get NaN instead of raising an IndexError):
df['Month'] = pd.Series(months).reindex(df['index'].eq(1).cumsum()-1).values
Output:
    index   data  Month
0       1  1.500  April
1       2  0.220  April
2       3  0.323  April
3       4  4.400  April
4       5  5.620  April
5       6  0.560  April
6       1  1.320   June
7       2  2.100   June
8       3  3.090   June
9       4  4.000   June
10      5  5.300   June
11      6  0.600   June
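Both one-liners rely on the same block counter: `df['index'].eq(1).cumsum() - 1` numbers the blocks 0, 1, ... and each number is used as a position into `months`. The sketch below builds that counter, applies both lookups, and then compares them on `short_months`, a hypothetical list with fewer entries than there are restarts (the name is only for illustration).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   'data': [1.5, 0.22, 0.323, 4.4, 5.62, 0.56,
                            1.32, 2.1, 3.09, 4, 5.3, 0.6]})
months = ["April", "June"]

# Block number of each row: increments at every restart, starting from 0
block = df['index'].eq(1).cumsum() - 1
print(block.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# NumPy positional lookup: one month per row
df['Month'] = np.array(months)[block]

# Pandas lookup: positions absent from the list become NaN instead of raising
df['Month_safe'] = pd.Series(months).reindex(block).values

# Hypothetical list that is one month short
short_months = ["April"]
try:
    np.array(short_months)[block]
except IndexError as err:
    print("NumPy lookup failed:", err)

short = pd.Series(short_months).reindex(block)
print(short.tolist())  # the second block comes back as NaN
```

One further difference: if the first row is not 1, the counter starts at -1. NumPy then silently picks the last month (negative positions count from the end), while `reindex` returns NaN for those rows.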