Applying last valid index mask to dataframe to get last valid values

Question

I have a dataframe that looks like the following:

    s1      s2      s3      s4
0   v1      v2      v3      v4
0   v5      v6      v7      np.nan
0   v8      np.nan  v9      np.nan
0   v10     np.nan  np.nan  np.nan

Essentially, reading from the top down, each column holds numerical values until, at some arbitrary row, it switches to np.nan for all remaining rows.

I’ve used .apply(pd.Series.last_valid_index) to get, for each column, the index of the last numerical value. However, I’m not sure of the most efficient way to retrieve a series holding the actual value at each of those indexes.
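For reference, here is a minimal sketch of what that call returns on a frame shaped like the one above. The numeric values are stand-ins for v1..v10 and are an assumption, as is the choice to reproduce the repeated 0 index from the table. Because last_valid_index returns an index label rather than a row position, repeated labels make the result ambiguous, which is why resetting the index first matters.

```python
import numpy as np
import pandas as pd

# Stand-in numeric values for v1..v10 from the question (assumed data).
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],  # every row carries the label 0, as in the table
)

# last_valid_index returns a label, so with repeated labels
# it cannot tell the rows apart: every column reports 0.
labels = df.apply(pd.Series.last_valid_index)

# After resetting to a 0..n-1 index, labels and row positions coincide.
positions = df.reset_index(drop=True).apply(pd.Series.last_valid_index)

print(labels)
print(positions)
```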

Ideally I’d be able to derive a series that looks like:

   value
s1 v10
s2 v6
s3 v9
s4 v4

or as a dataframe that looks like

   s1 s2 s3 s4
0 v10 v6 v9 v4

Many thanks!

Asked By: wingsoficarus116

Answer 1

This is one way using NumPy indexing:

import numpy as np
import pandas as pd

# ensure index is normalised to 0..n-1 so labels equal row positions
df = df.reset_index(drop=True)

# calculate last valid index across dataframe
idx = df.apply(pd.Series.last_valid_index)

# create result using NumPy indexing
res = pd.Series(df.values[idx, np.arange(df.shape[1])],
                index=df.columns,
                name='value')

print(res)

s1    v10
s2     v6
s3     v9
s4     v4
Name: value, dtype: object
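A self-contained version of the same idea, as a sketch: the numeric values stand in for v1..v10 (an assumption), and idx is converted with to_numpy() before indexing, which is a small variation on the snippet above rather than part of it. Note that this approach assumes every column has at least one non-null value; last_valid_index returns None for an all-NaN column, and the indexing step would then fail.

```python
import numpy as np
import pandas as pd

# Stand-in numeric values for v1..v10 (assumed data), with the repeated 0 index.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],
)

# Normalise the index so that labels equal row positions.
df = df.reset_index(drop=True)

# Row position of the last non-null value in each column.
idx = df.apply(pd.Series.last_valid_index).to_numpy()

# Pair each row position with its column position and pick those cells.
cols = np.arange(df.shape[1])
res = pd.Series(df.to_numpy()[idx, cols], index=df.columns, name="value")

print(res)
```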

Answered By: jpp

Answer 2

You need to normalize the index, find the last valid index per column, and build a dataframe from the values at those positions.

df = df.reset_index(drop=True)
ser = df.apply(lambda x: x.last_valid_index())
pd.DataFrame([df[col][ser[col]] for col in df.columns], index=df.columns).T

Output:

     s1 s2  s3  s4
0   v10 v6  v9  v4

Also, reset_index returns a new dataframe, so the original object is not modified in place (although the name df now refers to the reset copy).

Answered By: harvpan

Answer 3

Here is another way to do it, without resetting the index:

df.apply(lambda x: x[x.notnull()].values[-1])

s1    v10
s2     v6
s3     v9
s4     v4
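One caveat with this approach: a column that is entirely NaN leaves nothing to index, so .values[-1] raises an IndexError. A possible guard is sketched below; last_valid_value is a hypothetical helper name, and both the numeric stand-ins and the all-NaN column s5 are assumptions added for illustration.

```python
import numpy as np
import pandas as pd


def last_valid_value(col: pd.Series):
    """Return the last non-null value of col, or NaN if it has none."""
    valid = col.dropna()
    return valid.iloc[-1] if not valid.empty else np.nan


# Stand-in numeric values for v1..v10, plus an all-NaN column (assumed data).
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
        "s5": [np.nan, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],  # repeated labels are fine: the lookup is positional
)

res = df.apply(last_valid_value)
print(res)
```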

Answered By: sacuL

Answer 4

Here is a way using groupby():

df.stack().groupby(level=1).last()

Output:

s1    v10
s2     v6
s3     v9
s4     v4

And as a dataframe:

df.stack().groupby(level=1).last().to_frame().T

Output:

    s1  s2  s3  s4
0  v10  v6  v9  v4
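One thing to be aware of: groupby sorts its keys by default, so the result comes back in sorted column order rather than the frame's original order. With s1..s4 that makes no difference, but with other names it can. A small sketch of restoring the original order with reindex follows; the column names b and a and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Columns deliberately not in alphabetical order (assumed example data).
df = pd.DataFrame(
    {
        "b": [1.0, 2.0, np.nan],
        "a": [3.0, np.nan, np.nan],
    }
)

# Level 1 of the stacked index holds the column names; groupby sorts them.
last = df.stack().groupby(level=1).last()

# Put the result back into the frame's own column order.
ordered = last.reindex(df.columns)

print(last)
print(ordered)
```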

Answered By: rhug123
