Modify a DataFrame with a given condition
Question:
col1 | col2 | col3 |
---|---|---|
A1 | data 1 | |
Val B | data 2 | data 6 |
Val B | data 3 | data |
A2 | data 4 | data |
Val B | data 5 | data 7 |
In the first column (col1), if ‘Val B’ is found directly below a cell that starts with ‘A’, replace only that ‘Val B’ cell with the value of the cell above it (the one starting with ‘A’), keeping the other values in the ‘Val B’ row. Ignore any other ‘Val B’ rows that are not directly below a cell starting with ‘A’.
col1 | col2 | col3 |
---|---|---|
A1 | data 2 | data 6 |
A2 | data 5 | data 7 |
Result: I want the output to look like the table above, using Python.
Answers:
If you need the single row after each row that matches the condition (tested with Series.str.startswith), with col1 replaced by the value from the original DataFrame, use:
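# shift every column up one row (each row now holds the values of the row below it),
# keep the rows where the original col1 starts with 'A', then put the original col1 back
df = df.shift(-1)[df['col1'].str.startswith('A')].assign(col1=df['col1'])
print(df)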
col1 col2 col3
0 A1 data 2 data 6
3 A2 data 5 data 7
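A self-contained sketch of this first approach, run on the larger example used in the next answer (where A3 and A4 have no ‘Val B’ below them), shows that it always takes the next row, whatever its col1 value is. The data is copied from that answer; the variable name `out` is ours.

```python
import pandas as pd

# Larger example from the next answer: A3 and A4 have no 'Val B' below them
data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'],
        ['Val B', 'data 3', 'data'],
        ['A2', 'data 4', 'data'],
        ['Val B', 'data 5', 'data 7'],
        ['A3', 'data 6', 'data 8'],
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# First approach: for every A row, take the values of the next row,
# regardless of whether that row is 'Val B'
out = df.shift(-1)[df['col1'].str.startswith('A')].assign(col1=df['col1'])
print(out)
```

Here A3 picks up the values of the A4 row (data 9, data 9), and A4, being the last row, gets NaN. If such rows can occur, the next row should also be checked for ‘Val B’.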
Another idea is to shift only col1 and then filter by the condition with boolean indexing (starting again from the original DataFrame):
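# move col1 down one row, so each row holds the col1 value of the row above it
df['col1'] = df['col1'].shift()
# keep rows whose row above starts with 'A'; na=False treats the NaN in the first row as no match
df = df[df['col1'].str.startswith('A', na=False)]
print(df)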
col1 col2 col3
1 A1 data 2 data 6
4 A2 data 5 data 7
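Because the snippet above overwrites `df['col1']`, the original labels are lost. A minimal variant of the same idea, sketched here with the question's data (assuming the empty col3 cell in the A1 row is `None`), uses `DataFrame.assign`, which returns a new DataFrame and leaves `df` untouched. The names `shifted` and `out` are ours.

```python
import pandas as pd

# The question's input; the empty col3 cell is assumed to be None
df = pd.DataFrame(
    [['A1', 'data 1', None],
     ['Val B', 'data 2', 'data 6'],
     ['Val B', 'data 3', 'data'],
     ['A2', 'data 4', 'data'],
     ['Val B', 'data 5', 'data 7']],
    columns=['col1', 'col2', 'col3'],
)

# Same shift-and-filter idea, but assign() builds a new DataFrame,
# so df keeps its original col1 values
shifted = df.assign(col1=df['col1'].shift())
out = shifted[shifted['col1'].str.startswith('A', na=False)]
print(out)
print(df['col1'].tolist())
```

The result keeps the index of the ‘Val B’ rows (1 and 4); append `.reset_index(drop=True)` if a fresh 0-based index is preferred.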
Example
import pandas as pd

data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'],
        ['Val B', 'data 3', 'data'],
        ['A2', 'data 4', 'data'],
        ['Val B', 'data 5', 'data 7'],
        ['A3', 'data 6', 'data 8'],
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
print(df)
col1 col2 col3
0 A1 data 1 None
1 Val B data 2 data 6
2 Val B data 3 data
3 A2 data 4 data
4 Val B data 5 data 7
5 A3 data 6 data 8
6 A4 data 9 data 9
Code
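# replace 'Val B' with NaN, then forward-fill so every 'Val B' row takes the nearest A label above it
s = df['col1'].mask(df['col1'].eq('Val B')).ffill()
# per label: head(2) keeps the A row and the 'Val B' row directly below it (if there is one),
# tail(1) keeps the last of those, i.e. that 'Val B' row, or the A row itself if it stands alone
out = df.assign(col1=s).groupby('col1').head(2).groupby('col1').tail(1)
print(out)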
output:
col1 col2 col3
1 A1 data 2 data 6
4 A2 data 5 data 7
5 A3 data 6 data 8
6 A4 data 9 data 9
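To make the chained expression easier to follow, here is a sketch that runs the same steps one at a time on the example data and prints each intermediate result. The intermediate name `first_two` is ours.

```python
import pandas as pd

data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'],
        ['Val B', 'data 3', 'data'],
        ['A2', 'data 4', 'data'],
        ['Val B', 'data 5', 'data 7'],
        ['A3', 'data 6', 'data 8'],
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# Step 1: hide the 'Val B' labels and forward-fill, so each row carries
# the nearest A label at or above it
s = df['col1'].mask(df['col1'].eq('Val B')).ffill()
print(s.tolist())

# Step 2: within each label group, keep at most the first two rows
# (head() preserves the original row order)
first_two = df.assign(col1=s).groupby('col1').head(2)
print(first_two.index.tolist())

# Step 3: within each label group, keep only the last of those rows
out = first_two.groupby('col1').tail(1)
print(out)
```

Step 1 gives `['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A4']`, so the second ‘Val B’ under A1 (row 2) is dropped by `head(2)` and the A row is dropped by `tail(1)` whenever a ‘Val B’ follows it.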
I think there may be cases where ‘Val B’ does not appear under an A row, so I made an example and code that cover that case; such A rows (A3 and A4 above) are kept unchanged.
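If only the rows that were actually replaced should appear in the output (as in the question's expected result, where the A rows themselves are not shown), a stricter sketch checks both conditions explicitly: the row is ‘Val B’ and the row directly above starts with ‘A’. This combines the col1 shift from the first answer with an explicit ‘Val B’ test; the names `prev`, `mask` and `out` are ours, and this reading of the requirement is an assumption.

```python
import pandas as pd

data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'],
        ['Val B', 'data 3', 'data'],
        ['A2', 'data 4', 'data'],
        ['Val B', 'data 5', 'data 7'],
        ['A3', 'data 6', 'data 8'],
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# col1 value of the row directly above (NaN for the first row)
prev = df['col1'].shift()

# keep a row only if it is 'Val B' and the row above starts with 'A'
mask = df['col1'].eq('Val B') & prev.str.startswith('A', na=False)

# replace col1 with the A label from the row above (assign aligns on the index)
out = df[mask].assign(col1=prev)
print(out)
```

With this example only the rows for A1 (data 2, data 6) and A2 (data 5, data 7) remain; A3 and A4 are dropped because no ‘Val B’ follows them.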