DataFrame column with the index of the next matching value, wrapping around from the final index to the first
Question:
How can I create DataFrame column(s) containing the index of the next row that matches a certain value? I know I can find the matching indexes with
b_Index = df[df.Type=='B'].index
c_Index = df[df.Type=='C'].index
but I’m in need of a solution which includes the wrap-around case such that the ‘next’ index after the final match is the first index.
Say I have a dataframe with a Type series. Type includes values A, B or C.
import pandas as pd

d = dict(Type=['A', 'A', 'A', 'C', 'C', 'C', 'A', 'A', 'C', 'A', 'B', 'B', 'B', 'A'])
df = pd.DataFrame(d)
Type
0 A
1 A
2 A
3 C
4 C
5 C
6 A
7 A
8 C
9 A
10 B
11 B
12 B
13 A
I’m looking to add NextForwardBIndex and NextForwardCIndex columns such that the result is
Type NextForwardBIndex NextForwardCIndex
0 A 10 3
1 A 10 3
2 A 10 3
3 C 10 4
4 C 10 5
5 C 10 8
6 A 10 8
7 A 10 8
8 C 10 3
9 A 10 3
10 B 11 3
11 B 12 3
12 B 10 3
13 A 10 3
Answers:
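This should work. It builds one column per distinct Type value, so a NextForwardAIndex column is produced as well:
df2 = df['Type'].str.get_dummies().mul(df.index, axis=0).shift(-1).where(lambda x: x.ne(0)).bfill()
df2.fillna(df2.iloc[0]).rename('NextForward{}Index'.format, axis=1)
Note that where(lambda x: x.ne(0)) treats index label 0 as "no match", so a match at label 0 is never reported (in the output, row 13 wraps to 1.0 rather than 0.0 for A).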
Old Answer:
(df.assign(NextForwardBIndex = df.loc[df['Type'].eq('B')].groupby(df['Type']).transform(lambda x: x.index.to_series().shift(-1)),
NextForwardCIndex = df.loc[df['Type'].eq('C')].groupby(df['Type']).transform(lambda x: x.index.to_series().shift(-1)))
.fillna({'NextForwardBIndex':df['Type'].eq('B').idxmax(),'NextForwardCIndex':df['Type'].eq('C').idxmax()}))
Output:
NextForwardAIndex NextForwardBIndex NextForwardCIndex
0 1.0 10.0 3.0
1 2.0 10.0 3.0
2 6.0 10.0 3.0
3 6.0 10.0 4.0
4 6.0 10.0 5.0
5 6.0 10.0 8.0
6 7.0 10.0 8.0
7 9.0 10.0 8.0
8 9.0 10.0 3.0
9 13.0 10.0 3.0
10 13.0 11.0 3.0
11 13.0 12.0 3.0
12 13.0 10.0 3.0
13 1.0 10.0 3.0
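The same shift-and-back-fill idea can be sketched without using zero as the "no match" marker, so that a match at index label 0 is kept. This is only an illustration under stated assumptions: the names mask, labels, matches, nxt, first and result are made up here, and the index is assumed to hold unique integer labels.

```python
import pandas as pd

df = pd.DataFrame({"Type": list("AAACCCAACABBBA")})

# Boolean indicator per distinct Type value (columns A, B, C).
mask = df["Type"].str.get_dummies().astype(bool)

# Every column holds the row labels; keep a label only where that column matches.
labels = pd.DataFrame(
    {col: df.index.to_numpy() for col in mask.columns}, index=df.index
)
matches = labels.where(mask)

# Look one row ahead, then back-fill up to the next match.
nxt = matches.shift(-1).bfill()

# Rows after the last match wrap around to the first match of that column.
first = matches.bfill().iloc[0]
result = (
    nxt.fillna(first)
    .astype(int)
    .rename("NextForward{}Index".format, axis=1)
)
print(result)
```

With this variant the A column wraps from row 13 back to label 0, while the B and C columns agree with the result requested in the question.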
You can use a bit of numpy.roll, DataFrame.ffill, and DataFrame.fillna:
import numpy as np

# roll the matching indices so each B/C row gets the next match (the last one wraps to the first)
df.loc[b_Index, 'NextForwardBIndex'] = np.roll(b_Index, -1)
df.loc[c_Index, 'NextForwardCIndex'] = np.roll(c_Index, -1)
# forward-fill the rows after a match, then fill the rows before the first match
(df.ffill()
   .fillna({'NextForwardBIndex': b_Index[0],
            'NextForwardCIndex': c_Index[0]})
   .astype(int, errors='ignore')
)
output:
Type NextForwardBIndex NextForwardCIndex
0 A 10 3
1 A 10 3
2 A 10 3
3 C 10 4
4 C 10 5
5 C 10 8
6 A 10 8
7 A 10 8
8 C 10 3
9 A 10 3
10 B 11 3
11 B 12 3
12 B 10 3
13 A 10 3
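For reuse across several values, the roll-and-fill steps could be wrapped in a small function. The sketch below is an assumption-laden illustration, not part of the answer above: next_forward_index is a hypothetical helper name, and it assumes unique integer index labels and at least one matching row.

```python
import numpy as np
import pandas as pd


def next_forward_index(types: pd.Series, value) -> pd.Series:
    """Hypothetical helper: label of the next row equal to `value`, wrapping around."""
    idx = types.index[types.eq(value)]
    if idx.empty:
        raise ValueError(f"no rows equal to {value!r}")
    out = pd.Series(np.nan, index=types.index, dtype=float)
    # Each matching row points at the following match; the last wraps to the first.
    out.loc[idx] = np.roll(idx.to_numpy(), -1)
    # Rows after a match inherit its target; rows before the first match get it directly.
    return out.ffill().fillna(idx[0]).astype(int)


df = pd.DataFrame({"Type": list("AAACCCAACABBBA")})
result = df.assign(
    NextForwardBIndex=next_forward_index(df["Type"], "B"),
    NextForwardCIndex=next_forward_index(df["Type"], "C"),
)
print(result)
```

Raising on a value with no matches is a design choice made for this sketch; returning an all-NaN column would be an equally reasonable alternative.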