How to iterate over Pandas Series generated from groupby().size()

Question:

How do you iterate over a Pandas Series generated from a .groupby('...').size() call and get both the group name and the count?

As an example, if I have:

foo
-1     7
 0    85
 1    14
 2     5

How can I loop over them so that each iteration gives me the pairs -1 & 7, 0 & 85, 1 & 14 and 2 & 5 in variables?

I tried the enumerate option but it doesn’t quite work. Example:

for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)

It doesn’t return -1, 0, 1, and 2 for i, but rather 0, 1, 2, 3.

Asked By: Reily Bourne


Answers:

Update:

Given a pandas Series:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

s
#a    1
#b    2
#c    3
#d    4
#dtype: int64

You can loop over it directly, which yields one value from the series in each iteration:

for i in s:
    print(i)
#1
#2
#3
#4

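If you want to access the index at the same time, use the items() method, which returns a lazy iterator over (index, value) pairs. The older iteritems() method does the same thing, but it was deprecated in pandas 1.5 and removed in pandas 2.0, so it only works on older versions: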

for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4

for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4
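Applied to the question, the same items() loop works on the result of groupby().size(). The following is a minimal sketch: the DataFrame is made-up data, built only so that the group sizes match the counts shown in the question, and the names df, sizes and pairs are illustrative.

```python
import pandas as pd

# Hypothetical data: constructed so the group sizes match the question's output.
df = pd.DataFrame({"foo": [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

# size() returns a Series indexed by group name, with the count as the value.
sizes = df.groupby("foo").size()

pairs = []
for name, count in sizes.items():
    pairs.append((name, count))
    print(name, count)
#-1 7
#0 85
#1 14
#2 5
```

Here name is the group label taken from the Series index and count is the group size, which is the pairing that enumerate() could not provide.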

Old Answer:

You can call the iteritems() method on the Series (on pandas 2.0 and later, use items() instead, since iteritems() was removed):

for i, row in df.groupby('a').size().iteritems():
    print(i, row)

# 12 4
# 14 2

According to the docs:

Series.iteritems()

Lazily iterate over (index, value) tuples

Note: This is not the same data as in the question, just a demo.

Answered By: Psidom

To expand upon the answer of Psidom, there are three useful ways to unpack data from pd.Series. Having the same Series as Psidom:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

  • A direct loop over s yields the value of each row.
  • A loop over s.iteritems() or s.items() yields an (index, value) tuple for each row.
  • Using enumerate() on s.iteritems() yields a nested tuple of the form (rownum, (index, value)).

The last form is useful when the index holds something other than the row number (for example, a time series whose index is the timestamp) and you need the row position as well.

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

for rownum,(indx,val) in enumerate(s.iteritems()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)

will output:

row number:  0 index:  a value:  1
row number:  1 index:  b value:  2
row number:  2 index:  c value:  3
row number:  3 index:  d value:  4

You can read more on unpacking nested tuples here.
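For pandas 2.0 and later, where iteritems() no longer exists, the same nested unpacking can be written with items(). This is a sketch of that variant rather than the answerer's original code; the rows list is added only to collect the results.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

rows = []
# enumerate() supplies the positional row number; items() supplies (index, value).
for rownum, (indx, val) in enumerate(s.items()):
    rows.append((rownum, indx, val))
    print("row number:", rownum, "index:", indx, "value:", val)
```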

Answered By: dbouz