Efficiently transforming data in pandas

Question:

What would be the best way to approach this problem using pandas and python?

I currently have a pandas data-frame in a relatively awkward format, for example:

        Country      Indicator  2000  2010
0   Afghanistan            foo     1   2.5
1   Afghanistan            bar     3   4.5
2   Afghanistan            zoo     5   6.5
3       Bolivia            foo     7   8.5
4       Bolivia            bar     9  10.5
5       Bolivia            zoo    11  12.5
6      Cameroon            foo     2   1.5
7      Cameroon            bar     4   3.5
8      Cameroon            zoo     6   5.5
9       Denmark            foo     8   7.5
10      Denmark            bar    10   9.5
11      Denmark            zoo    12  11.5

Say that I want to split this into two separate data-frames for each respective year.

For 2000:

   foo bar zoo
0   1   3   5
1   7   9   11
2   2   4   6
3   8   10  12

For 2010:

    foo  bar   zoo
0   2.5  4.5   6.5
1   8.5  10.5  12.5
2   1.5  3.5   5.5
3   7.5  9.5   11.5

What might be the most efficient way to carry this out in pandas?

Thanks in advance.

Asked By: juandiegodc

||

Answers:

This transformation is called “pivoting”, or sometimes “casting” or “unmelting”. It’s so common that it’s covered by specific functions in the api.:

df_years = df.pivot(index='Country', columns='Indicator', values=['2000', '2010'])    

            2000             2010           
Indicator    bar  foo   zoo   bar  foo   zoo
Country                                     
Afghanistan  3.0  1.0   5.0   4.5  2.5   6.5
Bolivia      9.0  7.0  11.0  10.5  8.5  12.5
...

This results in multi level columns:

df_years['2000']                                                                                                                                                                                    

Indicator    bar  foo   zoo
Country                    
Afghanistan  3.0  1.0   5.0
Bolivia      9.0  7.0  11.0
...
df_years['2010']                                                                                                                                                                                   

Indicator     bar  foo   zoo
Country                     
Afghanistan   4.5  2.5   6.5
Bolivia      10.5  8.5  12.5
...

You should just work with these, but if you want flat dataframes, you can assign like this:

df_2000 = df_years['2000']

Indicator    bar  foo   zoo
Country                    
Afghanistan  3.0  1.0   5.0
Bolivia      9.0  7.0  11.0
...
Answered By: Dave

Using pivot-table allows you to save a sequence of labels: foo, bar, zoo.
For 2000 year:

df2000 = df.pivot_table(index='Country', columns='Indicator', sort=False)['2000']

print(df2000)

Indicator    foo  bar  zoo
Country                   
Afghanistan    1    3    5
Bolivia        7    9   11
Cameroon       2    4    6
Denmark        8   10   12
Answered By: Ensay
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.