Is .agg() element-wise on a Series?

Question:

If we were to take the following series:

import numpy as np
import pandas as pd
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])

London      20
New York    21
Helsinki    12

This makes me believe it operates element-wise on a series:

s.agg('{}_Test'.format)

London      20_Test
New York    21_Test
Helsinki    12_Test

But it can also produce a scalar result from the series, which is well documented.

s.agg(np.sum)

53

I believe .agg() is supposed to work very similarly to apply, but can receive multiple functions at once.

If we called s.apply(np.sum), it wouldn’t change anything, since apply is element-wise.

My question is: is .agg() element-wise when not aggregating, and not element-wise when aggregating?

(In s.agg(lambda x: x), I believe x would be a Series, but not always.)

Asked By: rhug123

||

Answers:

Whether the Series.agg() method works element-wise depends on the function you pass to it. According to the pandas documentation, this function

must either work when passed a Series or when passed to Series.apply.

So when you pass a function that takes a Series as an argument (e.g. np.sum()), then Series.agg() will not work element-wise, but do a proper aggregation by applying the passed function to the whole Series at once.

When you pass a function that does not take a Series as an argument, then Series.agg() will work element-wise, by passing the function to Series.apply(). So in this case no aggregation is happening.
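To make the two modes concrete, here is a minimal sketch that contrasts a whole-Series reduction with an element-wise call, reusing the series from the question. It relies only on s.agg(np.sum) and s.apply(np.sum); newer pandas versions may emit a FutureWarning when np.sum is passed to agg, but the result should be the same.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# agg() with a reducing function collapses the Series to a single scalar.
total = s.agg(np.sum)

# apply() calls np.sum on each element separately; the sum of a single
# number is that number, so the values come back unchanged.
unchanged = s.apply(np.sum)

print(total)
print(unchanged.tolist())
```

The index of the apply() result is the same as the original, which is a quick way to tell that no aggregation took place.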

Answered By: Arne

"Whether the Series.agg() method works element-wise depends on the function you pass to it. According to the pandas documentation ..."

Actually, it also depends on the contents of the Series!

ser_n = pd.Series([111.0, 222.0, 333.0])
ser_s = pd.Series(['abc', 'qrt', 'xyz'])
max(ser_n)      # 333.0
max(ser_s)      # xyz
ser_n.agg(max)  # 333.0
ser_s.agg(max)  # xyz

But if you pass your own function, the behavior of .agg() differs when the dtype is string:

ser_n.agg(lambda s: max(s))     # 333.0
ser_s.agg(lambda s: max(s))     # Series: [0]:'c'  [1]:'t' [2]:'z'

Also:

','.join(ser_s)         # 'abc,qrt,xyz'
ser_s.agg(','.join)     # Series: [0]:'a,b,c' [1]:'q,r,t' [2]:'x,y,z'

Look closely at what happens inside the calls:

def func(s):
    print('---call:---')
    print(s, type(s))
    return max(s)
print('---result:---\n', ser_n.agg(func), '\n')
print('---result:---\n', ser_s.agg(func), '\n')

---call:---
111.0 <class 'float'>
---call:---
0    111.0
1    222.0
2    333.0
dtype: float64 <class 'pandas.core.series.Series'>
---result:---
 333.0 

---call:---
abc <class 'str'>
---call:---
qrt <class 'str'>
---call:---
xyz <class 'str'>
---result:---
0    c
1    t
2    z
dtype: object

It looks as if the first call tests the type of the data. If the element is numeric, max() fails on it, and a second call receives the whole Series. If it is a string, the calls continue element-wise.
Very strange and undocumented behavior!
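The pattern in that trace can be imitated with a small helper. This is only a sketch of the observed behavior, not pandas' own code: agg_like is a hypothetical name, and the set of exceptions that triggers the fallback is an assumption.

```python
import pandas as pd


def agg_like(series, func):
    """Hypothetical helper that mimics the observed behavior of agg()."""
    try:
        # First attempt: call func on every element, as Series.apply does.
        return series.apply(func)
    except (ValueError, AttributeError, TypeError):  # assumed exception set
        # The element-wise call failed, so hand func the whole Series.
        return func(series)


ser_n = pd.Series([111.0, 222.0, 333.0])
ser_s = pd.Series(["abc", "qrt", "xyz"])

# max(111.0) raises TypeError (a float is not iterable), so the fallback
# runs and max() receives the whole Series.
num_result = agg_like(ser_n, lambda s: max(s))

# max("abc") is valid Python and returns "c", so no fallback happens.
str_result = agg_like(ser_s, lambda s: max(s))

print(num_result)
print(str_result.tolist())
```

Under these assumptions the helper reproduces both results above: a scalar for the float Series and a per-element Series for the string one.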

Answered By: Ybis Zedex

In pandas, at least as of 0.25.3, the aggregate() method behaves element-wise or on the whole Series depending on whether the function can be applied element-wise without raising an exception, which is decided by the actual runtime data!
There also seems to be no explanation of this surprising behaviour in the module documentation.

The code snippet below works around that behavior. As an example, take the case where we need to count all elements, including nulls (and other N/A values), i.e. get the length of the series. Instead of the obvious, but not reliably working, lambda series: len(series), we may use

def agg_len(series: pd.Series) -> int:
    if not isinstance(series, pd.Series):
        raise TypeError('Element-wise behavior is not supposed')
    return len(series)

series.aggregate(agg_len)

Btw, a simple lambda series: len(series) may also work for you, but only if the series does not contain strings or other types that have a length 🙂
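A quick way to check the guard is to run it on a string Series, where the plain lambda would go element-wise. The sample data and the variable names below are hypothetical, and the comparison with len() and count() is added only for illustration.

```python
import pandas as pd


def agg_len(series: pd.Series) -> int:
    # Refuse element-wise calls so that agg() has to pass the whole Series.
    if not isinstance(series, pd.Series):
        raise TypeError("Element-wise behavior is not supposed")
    return len(series)


# Hypothetical sample data: strings plus one missing value.
names = pd.Series(["abc", None, "xyz"])

guarded = names.aggregate(agg_len)  # length of the Series, nulls included
direct = len(names)                 # same number without going through agg()
non_null = names.count()            # count() skips missing values

print(guarded, direct, non_null)
```

When the length is all that is needed, calling len() on the Series directly avoids the question of how agg() dispatches the function.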

Answered By: Nikolay Prokopyev