Find the in-between values within a DataFrame
Question:
Currently I have the following dataframe:
index | value |
---|---|
0 | 1 |
1 | -1 |
2 | -1 |
3 | -1 |
4 | 6 |
5 | -1 |
6 | -1 |
7 | -1 |
8 | 10 |
Every value equal to -1 means N/A, and the values should be increasing. I would therefore like to generate two more columns that indicate the possible min and possible max for each missing value, based on the valid values before and after it in the value column.
The expected output would look like this:
index | value | possible min | possible max |
---|---|---|---|
0 | 1 | | |
1 | -1 | 1 | 6 |
2 | -1 | 1 | 6 |
3 | -1 | 1 | 6 |
4 | 6 | | |
5 | -1 | 6 | 10 |
6 | -1 | 6 | 10 |
7 | -1 | 6 | 10 |
8 | 10 | | |
I would then use the extra columns to fill in the missing values using my own matching logic.
Answers:
Given df:
value
0 1
1 -1
2 -1
3 -1
4 6
5 -1
6 -1
7 -1
8 10
If something should mean NaN, make it NaN.
import numpy as np

df['value'] = df['value'].replace(-1, np.nan)
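As a minimal, self-contained sketch of this step: the frame below is rebuilt from the question's data, and `Series.mask` is shown only as an equivalent alternative to `replace`, not as something the answer itself uses.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; -1 is the sentinel for "missing".
df = pd.DataFrame({"value": [1, -1, -1, -1, 6, -1, -1, -1, 10]})

# Option 1: replace the sentinel with NaN (the column becomes float64).
replaced = df["value"].replace(-1, np.nan)

# Option 2: mask every entry equal to the sentinel; masked entries become NaN.
masked = df["value"].mask(df["value"] == -1)

# Index labels of the rows that are now flagged as missing.
missing_rows = replaced.index[replaced.isna()].tolist()
print(missing_rows)
```

Either way, the gaps become real missing values, so `isna()`, `ffill()`, `bfill()` and `interpolate()` can all work on them directly.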
Now, we can fill your desired values:
df.loc[df['value'].isna(), 'possible_min'] = df['value'].ffill()
df.loc[df['value'].isna(), 'possible_max'] = df['value'].bfill()
Bonus, linear interpolation:
df['interpolated'] = df['value'].interpolate()
print(df)
Output:
value possible_min possible_max interpolated
0 1.0 NaN NaN 1.00
1 NaN 1.0 6.0 2.25
2 NaN 1.0 6.0 3.50
3 NaN 1.0 6.0 4.75
4 6.0 NaN NaN 6.00
5 NaN 6.0 10.0 7.00
6 NaN 6.0 10.0 8.00
7 NaN 6.0 10.0 9.00
8 10.0 NaN NaN 10.00
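Putting the steps together, here is a sketch rather than the exact code above: `add_bounds` and its `sentinel` parameter are hypothetical names introduced for illustration, and it uses `Series.where` instead of `.loc` assignment to keep the bounds only on the rows that are missing.

```python
import numpy as np
import pandas as pd


def add_bounds(df, sentinel=-1):
    """Return a copy of df with possible_min / possible_max columns.

    `add_bounds` and `sentinel` are illustrative names, not pandas API.
    """
    out = df.copy()
    # Treat the sentinel as a real missing value.
    out["value"] = out["value"].replace(sentinel, np.nan)
    missing = out["value"].isna()
    # Last valid value before each gap, kept only on the missing rows.
    out["possible_min"] = out["value"].ffill().where(missing)
    # Next valid value after each gap, kept only on the missing rows.
    out["possible_max"] = out["value"].bfill().where(missing)
    return out


df = pd.DataFrame({"value": [1, -1, -1, -1, 6, -1, -1, -1, 10]})
result = add_bounds(df)
print(result)
```

One thing to watch: if the column starts or ends with the sentinel, there is no earlier or later valid value to borrow, so possible_min or possible_max stays NaN on those rows.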