How do i 'ungroup' a grouped data frame?

Question:

I have a large dataframe containing ∼ 100 million observations of stock returns for a given date and an identifier for each specific stock. My problem is that for some stocks at some dates there are missing values in the return column. To set up a small example the data frame looks something like this:

ID Date Return
10001 1970-01-01 0.043
10001 1970-01-02 0.023
10001 1970-01-03 -0.010
10002 1970-01-01 0.010
10002 1970-01-02 NaN
10002 1970-01-03 -0.034
10003 1970-01-01 0,040
10003 1970-01-02 -0.041
10003 1970-01-03 0.009

I wish to remove all the stocks containing one or more missing values in the return column, which I do by:

df = df[df['RET'].notna().groupby('ID').transform('all')]

From the data frame example above, this will generate a grouped data frame without stocks having missing values in the return column:

ID Date Return
10001 1970-01-01 0.043
1970-01-02 0.023
1970-01-03 -0.010
10003 1970-01-01 0,040
1970-01-02 -0.041
1970-01-03 0.009

However, I need to convert the data frame back to being ungrouped again. Do any of you have a solution on how to do this? To specify, I want the data frame to look like this:

ID Date Return
10001 1970-01-01 0.043
10001 1970-01-02 0.023
10001 1970-01-03 -0.010
10003 1970-01-01 0,040
10003 1970-01-02 -0.041
10003 1970-01-03 0.009

Thank you in advance for any possible solution.

I have already tried using .unstack() and .reset_index() without any luck

Asked By: Sebastian

||

Answers:

You don’t need grouping. Simple boolean indexing + .isin() is enough:

x = df.loc[df['Return'].isna(), 'ID'].unique()
print(df[~df['ID'].isin(x)])

Prints:

      ID        Date  Return
0  10001  1970-01-01   0.043
1  10001  1970-01-02   0.023
2  10001  1970-01-03  -0.010
6  10003  1970-01-01   0,040
7  10003  1970-01-02  -0.041
8  10003  1970-01-03   0.009
Answered By: Andrej Kesely

If anyone should wonder, this could also be fixed by adding ‘, group_keys=False’ to the groupby function:

df = df[df['RET'].notna().groupby('ID', group_keys=False).transform('all')]
Answered By: Sebastian
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.