Applying last valid index mask to dataframe to get last valid values
Question:
I have a dataframe that looks like the following:
    s1   s2      s3      s4
0   v1   v2      v3      v4
0   v5   v6      v7      np.nan
0   v8   np.nan  v9      np.nan
0   v10  np.nan  np.nan  np.nan
Reading from the top down, each column holds numerical values until some arbitrary row, after which it contains only np.nan.
I’ve used .apply(pd.Series.last_valid_index) to get the index at which each column last holds a numerical value. However, I’m not sure of the most efficient way to retrieve a series holding the actual value at each column’s last valid index.
Ideally I’d be able to derive a series that looks like:
value
s1 v10
s2 v6
s3 v9
s4 v4
or as a dataframe that looks like:
    s1   s2  s3  s4
0   v10  v6  v9  v4
Many thanks!
Answers:
This is one way using NumPy indexing:
import numpy as np
import pandas as pd

# ensure the index is a unique 0..n-1 range, so labels double as row positions
df = df.reset_index(drop=True)
# label (now also the position) of the last non-NaN value in each column
idx = df.apply(pd.Series.last_valid_index)
# pick one row per column using NumPy integer-array indexing
res = pd.Series(df.values[idx, np.arange(df.shape[1])],
                index=df.columns,
                name='value')
print(res)
s1 v10
s2 v6
s3 v9
s4 v4
Name: value, dtype: object
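To try this end to end, here is a minimal sketch with a made-up frame. The floats 1.0 to 10.0 are stand-ins for v1 to v10 (an assumption, since the question only shows placeholders), and every row is labelled 0 as in the question's printout.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1.0..10.0 stand in for v1..v10.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],  # duplicate labels, as printed in the question
)

# After the reset, index labels are 0..n-1 and match row positions.
df = df.reset_index(drop=True)
idx = df.apply(pd.Series.last_valid_index)

# Row idx[j] is taken from column j, for every column at once.
res = pd.Series(
    df.to_numpy()[idx.to_numpy(), np.arange(df.shape[1])],
    index=df.columns,
    name="value",
)
print(res)
```

The row and column index arrays are paired element by element, so no Python-level loop over the columns is needed for the lookup itself.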
Normalize the index, find the last valid index for each column, and build a dataframe from the values at those positions.
df = df.reset_index(drop=True)
ser = df.apply(lambda x: x.last_valid_index())
pd.DataFrame([df.loc[ser[col], col] for col in df.columns], index=df.columns).T
Output:
s1 s2 s3 s4
0 v10 v6 v9 v4
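Also, reset_index returns a new dataframe rather than modifying the existing one in place, so the original data stays intact (only the name df is rebound).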
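Why the reset matters: last_valid_index returns an index label, not a row position. A small sketch, assuming every row is labelled 0 as in the question's printout (the numbers are made-up stand-ins for v1 to v10):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the duplicate index shown in the question.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],
)

# With duplicate labels every column reports the label 0, which says
# nothing about which row held the last non-NaN value.
labels_before = df.apply(lambda x: x.last_valid_index())

# With a fresh 0..n-1 index the label is also the row position.
labels_after = df.reset_index(drop=True).apply(lambda x: x.last_valid_index())

print(labels_before.tolist())
print(labels_after.tolist())
```

Looking values up by the labels from the first call would pick the wrong rows, which is why both index-based answers reset the index first.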
Here is another way to do it, without resetting the index:
df.apply(lambda x: x[x.notnull()].values[-1])
s1 v10
s2 v6
s3 v9
s4 v4
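One caveat: if a column contains no valid values at all, values[-1] indexes an empty array and raises an IndexError. A possible variant, assuming NaN is an acceptable result for such a column (last_valid is a hypothetical helper name, and s5 is a made-up all-NaN column added for illustration):

```python
import numpy as np
import pandas as pd


def last_valid(col: pd.Series):
    """Return the last non-NaN value of col, or NaN if there is none."""
    valid = col.dropna()
    return valid.iloc[-1] if not valid.empty else np.nan


# Hypothetical sample data: 1.0..10.0 stand in for v1..v10.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
        "s5": [np.nan, np.nan, np.nan, np.nan],  # no valid value at all
    },
    index=[0, 0, 0, 0],  # duplicate labels are fine: lookup is positional
)

res = df.apply(last_valid)
print(res)
```

Because iloc[-1] is positional, this still works without resetting the index.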
Here is a way using stack() and groupby():
df.stack().groupby(level=1).last()
Output:
s1 v10
s2 v6
s3 v9
s4 v4
and as a df:
df.stack().groupby(level=1).last().to_frame().T
Output:
s1 s2 s3 s4
0 v10 v6 v9 v4
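Note that groupby sorts its group keys by default, so the result comes back in sorted label order rather than in the frame's column order. For s1 to s4 the two coincide. A sketch that restores the original order with reindex, assuming made-up column names b and a chosen so that the two orders differ:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose column order (b, a) is not the sorted order.
df = pd.DataFrame(
    {
        "b": [1.0, 2.0, np.nan],
        "a": [3.0, np.nan, np.nan],
    }
)

# Last non-NaN value per column label; keys come back sorted (a, b).
last = df.stack().groupby(level=1).last()

# Put the labels back in the frame's own column order.
res = last.reindex(df.columns)
print(res)
```

Passing sort=False to groupby is another option, though it orders the keys by first appearance in the stacked data rather than strictly by the frame's columns.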