forward fill specific columns in pandas dataframe
Question:
If I have a dataframe with multiple columns ['x', 'y', 'z'], how do I forward fill only one column 'x'? Or a group of columns ['x','y']?
I only know how to do it by axis.
Answers:
for col in ['X', 'Y']:
    df[col] = df[col].ffill()
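A minimal runnable sketch of the loop above, assuming a small hypothetical frame in which column Z also has gaps that should be left unfilled:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: Z has a gap that must stay NaN
df = pd.DataFrame({
    'X': [0, 1, np.nan, np.nan],
    'Y': [0, 2, np.nan, 3],
    'Z': [0, 2, np.nan, 3],
})

# Forward fill only the listed columns, one column at a time
for col in ['X', 'Y']:
    df[col] = df[col].ffill()

print(df)
```

Z is never touched, so its NaN survives while X and Y are filled from the previous valid row.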
tl;dr:
cols = ['X', 'Y']
df.loc[:,cols] = df.loc[:,cols].ffill()
And I have also added a self-contained example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> ## create dataframe
... ts1 = [0, 1, np.nan, np.nan, np.nan, np.nan]
>>> ts2 = [0, 2, np.nan, 3, np.nan, np.nan]
>>> d = {'X': ts1, 'Y': ts2, 'Z': ts2}
>>> df = pd.DataFrame(data=d)
>>> print(df.head())
     X    Y    Z
0  0.0  0.0  0.0
1  1.0  2.0  2.0
2  NaN  NaN  NaN
3  NaN  3.0  3.0
4  NaN  NaN  NaN
>>>
>>> ## apply forward fill
... cols = ['X', 'Y']
>>> df.loc[:,cols] = df.loc[:,cols].ffill()
>>> print(df.head())
     X    Y    Z
0  0.0  0.0  0.0
1  1.0  2.0  2.0
2  1.0  2.0  NaN
3  1.0  3.0  3.0
4  1.0  3.0  NaN
I used the code below. Here the fill method for X and Y can also differ; it does not have to be ffill() for both.
df1 = df.fillna({
    'X': df['X'].ffill(),
    'Y': df['Y'].ffill(),
})
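To illustrate the point that each column can use a different method, here is a sketch on hypothetical data where X is forward filled and Y is backward filled with `bfill()` (the choice of `bfill` for Y is only an example, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'X': [0, np.nan, 2, np.nan],
    'Y': [np.nan, 1, np.nan, 3],
    'Z': [np.nan, np.nan, 5, np.nan],
})

# Each dict value is an already filled Series; fillna aligns it on the
# index and only uses it where the original column is NaN.
# Z is not in the dict, so it is left as-is.
df1 = df.fillna({
    'X': df['X'].ffill(),  # forward fill X
    'Y': df['Y'].bfill(),  # backward fill Y instead
})
print(df1)
```

fillna returns a new DataFrame here, so the original df keeps its NaNs.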
Two columns can be forward filled with ffill() simultaneously, as shown below (note that this returns a new DataFrame containing only X and Y):
df1 = df[['X','Y']].ffill()
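Because `df[['X','Y']].ffill()` drops Z, you may want to merge the filled columns back. A sketch on hypothetical data, using `DataFrame.assign` (my choice of technique, not part of the original answer), which replaces existing columns of the same name and keeps the column order:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'X': [0, 1, np.nan],
    'Y': [0, np.nan, 3],
    'Z': [0, np.nan, 3],
})

cols = ['X', 'Y']
filled = df[cols].ffill()    # new DataFrame with only X and Y
print(list(filled.columns))  # ['X', 'Y']

# Put the filled columns back into a full copy of df; Z is kept unfilled
df2 = df.assign(**{c: filled[c] for c in cols})
print(df2)
```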
The simplest version, I think:
cols = ['X', 'Y']
df[cols] = df[cols].ffill()
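The same assignment pattern covers both cases from the question: a single label selects a Series (one column), and a list of labels selects a DataFrame (a group). A sketch on hypothetical data, using copies so both variants start from the same frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'X': [0, np.nan, np.nan],
    'Y': [0, np.nan, 3],
    'Z': [0, np.nan, 3],
})

# Only one column: a single label
df_single = df.copy()
df_single['X'] = df_single['X'].ffill()

# A group of columns: a list of labels
df_group = df.copy()
cols = ['X', 'Y']
df_group[cols] = df_group[cols].ffill()

print(df_single)
print(df_group)
```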
Alternatively, with the inplace parameter:
df['X'].ffill(inplace=True)
df['Y'].ffill(inplace=True)
And no, you cannot do df[['X','Y']].ffill(inplace=True), as the column selection first creates a copy, so the in-place forward fill only modifies that copy and would trigger a SettingWithCopyWarning. Of course, if you have a list of columns, you can do this in a loop:
for col in ['X', 'Y']:
    df[col].ffill(inplace=True)
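The point of using inplace is to avoid copying the column. Note that this pattern depends on the pandas version: pandas 2.2 warns when an inplace method is called on a column selected this way, and under Copy-on-Write (the default in pandas 3.0) it no longer modifies df at all; in that case use df[col] = df[col].ffill() instead.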
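A small check of the claim that `df[['X','Y']].ffill(inplace=True)` does not update df, on hypothetical data. The warning pandas emits here differs by version (SettingWithCopyWarning, FutureWarning, or ChainedAssignmentError), so the sketch silences it rather than depending on a specific class; assigning the result back is the version-independent route:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'X': [0, np.nan],
    'Y': [1, np.nan],
    'Z': [0, np.nan],
})

with warnings.catch_warnings():
    # Ignore whichever warning this pandas version raises for the demo
    warnings.simplefilter('ignore')
    df[['X', 'Y']].ffill(inplace=True)  # fills a temporary copy only

nan_after_inplace = df['X'].isna().sum()
print(nan_after_inplace)  # still 1: df itself was not modified

# Assigning the result back updates df
df[['X', 'Y']] = df[['X', 'Y']].ffill()
print(df)
```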