Pandas: Remove NaN only at beginning and end of dataframe
Question:
I’ve got a pandas DataFrame that looks like this:
      sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN
and I would like to cut off the NaNs at the beginning and at the end ONLY (i.e. only the values incl. NaN from 1950 to 1954 should remain).
I already tried .isnull() and dropna(), but somehow I couldn’t find a proper solution.
Can anyone help?
Answers:
Here is one way to do it.
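import numpy as np
import pandas as pd
# your data (assuming the years are an integer index)
# ==============================
df = pd.DataFrame({'sum': [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]}, index=range(1948, 1956))
df
      sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN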
# processing
# ===============================
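# forward-fill, then drop: only the leading NaNs are still NaN
# (fillna(method=...) is deprecated in recent pandas, so use ffill()/bfill())
idx = df.ffill().dropna().index
# back-fill what is left, then drop: only the trailing NaNs are still NaN
res_idx = df.loc[idx].bfill().dropna().index
df.loc[res_idx]
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8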
Use the built-in first_valid_index and last_valid_index methods; they are designed specifically for this. Then slice your df between the two labels:
In [5]:
first_idx = df.first_valid_index()
last_idx = df.last_valid_index()
print(first_idx, last_idx)
df.loc[first_idx:last_idx]
1950 1954
Out[5]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
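If you want this as a reusable function, here is a minimal sketch. The name trim_nan_edges is made up for illustration, and it assumes the index is unique and sorted so that label slicing with .loc is well defined. It also guards the case where the frame has no valid value at all: first_valid_index then returns None, and slicing None:None would keep every row.

```python
import numpy as np
import pandas as pd


def trim_nan_edges(frame):
    """Drop leading and trailing rows that hold no valid value; keep interior NaNs.

    Hypothetical helper, not part of pandas.
    """
    first = frame.first_valid_index()
    last = frame.last_valid_index()
    if first is None:
        # No valid value anywhere: return an empty frame instead of
        # slicing None:None, which would return all rows unchanged.
        return frame.iloc[0:0]
    # Label-based slicing with .loc includes both endpoints.
    return frame.loc[first:last]


df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)
trimmed = trim_nan_edges(df)
print(trimmed)
```

Note that on a DataFrame with several columns, first_valid_index and last_valid_index look for rows where at least one column is non-NaN, so a row survives as long as any column has data within the range.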
Here is an approach with NumPy:
import numpy as np
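# True where a value is present
x = np.logical_not(pd.isnull(df))
# keep rows that have a valid value at or before them AND at or after them
mask = np.logical_and(np.cumsum(x)!=0, np.cumsum(x[::-1])[::-1]!=0)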
In [313]: df.loc[mask['sum'].tolist()]
Out[313]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
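To see why the cumulative-sum trick works, here is a sketch of the same idea on a plain boolean array for the single column. The intermediate names seen_before and seen_after are only illustrative. The forward cumsum stays at 0 until the first valid value, and the reversed cumsum stays at 0 after the last one, so the rows where both are nonzero are exactly the ones between the first and last valid values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# Boolean array: True where the column holds a value.
valid = df["sum"].notna().to_numpy()

# Nonzero from the first valid value onward.
seen_before = np.cumsum(valid) != 0
# Nonzero up to and including the last valid value.
seen_after = np.cumsum(valid[::-1])[::-1] != 0

mask = seen_before & seen_after
result = df.loc[mask]
print(result)
```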
One-liner:
df.query('~@df.ffill().isna().any(axis=1) & ~@df.bfill().isna().any(axis=1)')
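If you prefer to avoid query, the ffill/bfill idea can also be written as a plain boolean mask. This is a sketch under the assumption of the single-column frame from the question; with several columns, all(axis=1) keeps a row only once every column has seen a value on both sides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# After a forward fill, only the leading NaNs are still NaN.
not_leading = df.ffill().notna().all(axis=1)
# After a backward fill, only the trailing NaNs are still NaN.
not_trailing = df.bfill().notna().all(axis=1)

# Select from the original frame, so interior NaNs are kept as they are.
result = df[not_leading & not_trailing]
print(result)
```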