pandas – DataFrame.groupby.head with different values

Question:

I have two dataframes. One has session ids and their cut-off points. The other has multiple rows for each session, and I want to take the first n rows of each session, where n is that session's cut-off point from the first dataframe. This is a screenshot of the two dataframes.

[Image: screenshot of the two dataframes]

For example, session 0 has 20 rows and session 1 has 50 rows. The cut-off index is 10 for session 0 and 30 for session 1. I want a groupby or any other vectorized operation that takes the first 10 rows of session 0 and the first 30 rows of session 1.

Is it possible without looping?

Asked By: gunesevitan


Answers:

An example:

import numpy as np
import pandas as pd

# Sample data:
df = pd.DataFrame({
    "session": np.repeat(np.arange(5), 4),
    "data": np.arange(20)
})

# Define the cutoff for each session; the list position is the session id:
cutoffs = [3, 2, 4, 2, 1]
# Or use a dict: session -> cutoff

out = df.groupby("session").apply(lambda x: x.head(cutoffs[x.name]))
# x.name is the group key, i.e. the session id of the group being processed

out:

            session  data
session
0       0         0     0
        1         0     1
        2         0     2
1       4         1     4
        5         1     5
2       8         2     8
        9         2     9
        10        2    10
        11        2    11
3       12        3    12
        13        3    13
4       16        4    16

The second level of the index is the original row index; you can optionally drop it using .droplevel(1).
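Since the question asks whether this can be done without looping, here is a minimal sketch of one possible fully vectorized variant that avoids the per-group lambda. It assumes the cutoffs live in a second dataframe; the name cutoff_df and its column names "session" and "cutoff" are hypothetical, since the real dataframes were only shown in a screenshot.

```python
import numpy as np
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    "session": np.repeat(np.arange(5), 4),
    "data": np.arange(20),
})

# Hypothetical cutoff table, one row per session (column names are assumed).
cutoff_df = pd.DataFrame({
    "session": np.arange(5),
    "cutoff": [3, 2, 4, 2, 1],
})

# Position of each row within its session: 0, 1, 2, ...
pos = df.groupby("session").cumcount()

# Look up every row's cutoff through its session id.
limit = df["session"].map(cutoff_df.set_index("session")["cutoff"])

# Keep only the rows whose within-session position is below the cutoff.
out = df[pos < limit]
```

This keeps the original flat index and selects the same rows as the apply-based version. A session that is missing from cutoff_df would get NaN as its limit, so all of its rows would be dropped.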

Answered By: Chrysophylaxs