Pandas multi-index data access

Question:

I have a multi-index dataframe like this:

       TC_name  Year
id id2              
1  1      RITA  2020
   2      RITA  2020
2  1       IDA  2020
   2       IDA  2020
   3       IDA  2020
   4       IDA  2021
3  1      RITA  2021
   2      RITA  2021
   3      RITA  2021

Now I want to access the first row of each ‘id’ group, i.e. (1,1) = RITA 2020, (2,1) = IDA 2020, (3,1) = RITA 2021, and so on, and use those rows to form a new dataframe.

However, when I try df.loc[:,1], it does not work. df.loc[1] and df.loc[2] give me the right groups, but I cannot work out how to select on the ‘id2’ index.

So what should I do next to get access to the data I want?

Thank you for your help.

Asked By: Feng Hu

||

Answers:

Assuming OP wants to create a dataframe from the first element of each group, one can use pandas.DataFrame.groupby. Because the groups are defined by the first index level, id, pass level=0. Then, to take the first element of each group, call .first():

df2 = df.groupby(level=0).first()

[Out]:
   TC_name  Year
id              
1     RITA  2020
2      IDA  2020
3     RITA  2021
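
For comparison, here is a minimal sketch of a few other ways to get the same rows. It assumes the frame from the question is rebuilt by hand, and the variable names are only illustrative. It also shows why df.loc[:,1] fails: the second position in .loc addresses columns, so pandas looks for a column labelled 1 rather than for id2 == 1.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
index = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3)],
    names=["id", "id2"],
)
df = pd.DataFrame(
    {
        "TC_name": ["RITA", "RITA", "IDA", "IDA", "IDA", "IDA", "RITA", "RITA", "RITA"],
        "Year": [2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021],
    },
    index=index,
)

# Option 1: first non-null value per column in each id group (id2 is dropped).
first_per_id = df.groupby(level="id").first()

# Option 2: first row of each id group, keeping both index levels.
head_per_id = df.groupby(level="id").head(1)

# Option 3: every row whose id2 equals 1; xs drops the id2 level by default.
id2_is_one = df.xs(1, level="id2")

# Option 4: the same selection through .loc, keeping both index levels.
# The row selector is a tuple-like slice over (id, id2); ":" selects all columns.
id2_is_one_loc = df.loc[pd.IndexSlice[:, 1], :]

print(first_per_id)
print(head_per_id)
```

Two caveats apply. .first() returns the first non-null value in each column, so if a group's first row contains missing values the result can combine values from different rows; .head(1) always returns the actual first row. Options 3 and 4 select on id2 == 1, which matches the first row here only because every group starts at id2 = 1.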
Answered By: Gonçalo Peres