Group Dataframe using values of a column to group rows

Question:

So I have a Dataset of Patients that would look like this (I’m using toy values here for simplicity).

enter image description here

I want to sort the data frame by the discharge values and then group it by patient. Everything else in my preprocessing works, and the grouping behaves the way it is supposed to (it gives me clusters for each patient, sorted in descending order by the value of P). However, my code returns a DataFrameGroupBy object, and I need a data frame to do analysis on. How can I convert it? I’ve tried a few online samples, but none of them work. Is there a better way?

The end result would look like this. The ordering of the clusters doesn’t matter (PT 1 can come after PT 7, and so on).

enter image description here

def preProcessing(df):
    """
    The preprocessing done on the Dataset for efficient use.
    @returns- The new and improved dataset
    We're using the following steps- 
    Sort Data by P
    Group by PID
    
    """
    dateTimeCols=['P'] 
    df= df.sort_values(by=dateTimeCols, ascending=False) 
    df= df.groupby('PT_ID')
    print(df)
    # for name,group in df:
        # print (name)
        # print (group)
    return df
Asked By: pasha

||

Answers:

I wanted to sort the data frame by the discharge values and then group it according to Patients.

It doesn’t sound like grouping will solve your problem. Grouping is usually used to compute a summary for each group (an average, for example) or to otherwise reduce the amount of data. Since you want to keep every row and only change the order, I think what you need is sorting.
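To make the distinction concrete, here is a minimal sketch. The toy values and the variable names (`grouped`, `mean_p`, `sorted_df`) are assumptions for illustration, not taken from the question's real data. It shows that groupby() on its own yields a groupby object, that an aggregation collapses each patient to a single row, and that sorting keeps every row.

```python
import pandas as pd

# Toy data, assumed for illustration (same column names as the question).
df = pd.DataFrame({
    "ENC_ID": [1, 2, 3, 4, 5, 6],
    "PT_ID": [1, 1, 7, 2, 2, 1],
    "P": [2, 3, 1, 2, 4, 1],
})

# groupby() alone returns a DataFrameGroupBy object, not a DataFrame.
grouped = df.groupby("PT_ID")

# An aggregation reduces each group to one value: the mean P per patient.
mean_p = grouped["P"].mean()

# Sorting keeps all six rows and only changes their order.
sorted_df = df.sort_values(by=["PT_ID", "P"], ascending=[True, False])

print(type(grouped).__name__)
print(mean_p)
print(sorted_df)
```

The aggregated result has one entry per patient, while the sorted frame still has one row per encounter, which is what the question asks for.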

You could use sort_values() to accomplish both steps. This line of code both puts each patient’s records together and sorts each patient’s records by discharge value (P) in descending order.

df = df.sort_values(by=['PT_ID', 'P'], ascending=[True, False])

Here is the full code, including the data I tested it on:

import pandas as pd
df = pd.DataFrame({
    'ENC_ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
    'PT_ID': {0: 1, 1: 1, 2: 7, 3: 2, 4: 2, 5: 1},
    'P': {0: 2, 1: 3, 2: 1, 3: 2, 4: 4, 5: 1}
})
df = df.sort_values(by=['PT_ID', 'P'], ascending=[True, False])

Some notes:

  1. When given multiple keys to sort by, sort_values() will sort by the first one, then if there are any ties, it will sort by the second one, and so on.
  2. I’m sorting PT_ID in ascending order, but P in descending order. That’s the purpose of ascending=[True, False].
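If you do want to start from the groupby object, as in the question, one possible route is to concatenate the groups back into a single frame. The sketch below is an assumption-laden illustration, not the only way to do it: the names `by_sort` and `by_concat` are mine, and the toy values mirror the test data above. It relies on groupby() preserving the row order inside each group, and it resets the index afterwards so the rows are numbered from 0.

```python
import pandas as pd

# Toy data, assumed for illustration.
df = pd.DataFrame({
    "ENC_ID": [1, 2, 3, 4, 5, 6],
    "PT_ID": [1, 1, 7, 2, 2, 1],
    "P": [2, 3, 1, 2, 4, 1],
})

# Multi-key sort: patients ascending, P descending within each patient.
by_sort = df.sort_values(by=["PT_ID", "P"], ascending=[True, False])

# Alternative: sort by P first, group, then stitch the groups back together.
# groupby() keeps the existing row order inside each group.
grouped = df.sort_values(by="P", ascending=False).groupby("PT_ID")
by_concat = pd.concat([group for _, group in grouped])

# Optional: renumber the rows 0..n-1 after reordering.
by_sort = by_sort.reset_index(drop=True)
by_concat = by_concat.reset_index(drop=True)

print(by_sort)
print(by_sort.equals(by_concat))
```

On this toy data both approaches give the same frame, so the single sort_values() call is the simpler choice.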
Answered By: Nick ODell