Is .agg() element-wise on a Series?
Question:
If we were to take the following series:
s = pd.Series([20, 21, 12],
index=['London', 'New York', 'Helsinki'])
London 20
New York 21
Helsinki 12
This makes me believe it operates element-wise on a series:
s.agg('{}_Test'.format)
London 20_Test
New York 21_Test
Helsinki 12_Test
But it can also produce a scalar result from the series, which is well documented.
s.agg(np.sum)
53
I believe .agg() is supposed to work much like .apply(), except that it can receive multiple functions at once.
If we did s.apply(np.sum), it would not change anything, since apply is element-wise.
My question is: is .agg() element-wise when not aggregating, and not element-wise when doing an aggregation?
(In s.agg(lambda x: x), x would be a Series, I believe, but not always.)
Answers:
Whether the Series.agg() method works element-wise depends on the function you pass to it. According to the pandas documentation, this function
"must either work when passed a Series or when passed to Series.apply".
So when you pass a function that takes a Series as an argument (e.g. np.sum()), Series.agg() will not work element-wise, but will do a proper aggregation by applying the passed function to the whole Series at once.
When you pass a function that does not take a Series as an argument, Series.agg() will work element-wise by passing the function to Series.apply(). In this case no aggregation happens.
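As a hedged illustration of the aggregation side of this answer: one way to make the intent unambiguous is to pass function names as strings, which pandas resolves to the Series methods of the same name, and to pass a list when several aggregations are wanted at once. The series s is the one from the question; the variable names total and summary exist only for this sketch.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# A string name is dispatched to the Series method of the same name,
# so the whole Series is reduced to a single value.
total = s.agg('sum')

# A list of names returns one result per function, indexed by name.
summary = s.agg(['sum', 'max'])

print(total)
print(summary)
```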
"Whether the Series.agg() method works element-wise depends on the function you pass to it. According to the pandas documentation"
Actually, it also depends on the content of the Series!
ser_n = pd.Series([111.0, 222.0, 333.0])
ser_s = pd.Series(['abc', 'qrt', 'xyz'])
max(ser_n) # 333.0
max(ser_s) # xyz
ser_n.agg(max) # 333.0
ser_s.agg(max) # xyz
But if you pass your own function, the behavior of .agg() differs when the dtype is string:
ser_n.agg(lambda s: max(s)) # 333.0
ser_s.agg(lambda s: max(s)) # Series: [0]:'c' [1]:'t' [2]:'z'
Also:
','.join(ser_s) # 'abc,qrt,xyz'
ser_s.agg(','.join) # Series: [0]:'a,b,c' [1]:'q,r,t' [2]:'x,y,z'
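The element-wise results above match what the built-ins return for a single string, which suggests that each element was passed to the function on its own. A small check in plain Python, with no pandas involved (float_call_failed is just a name for this sketch):

```python
# A str is itself an iterable of characters, so these calls succeed
# when given one element of ser_s instead of the whole Series.
assert max('abc') == 'c'
assert ','.join('abc') == 'a,b,c'

# A float is not iterable, so the same call on one element of ser_n fails.
try:
    max(111.0)
    float_call_failed = False
except TypeError:
    float_call_failed = True

print(float_call_failed)  # True
```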
Look closely at what happens inside the calls:
def func(s):
    print('---call:---')
    print(s, type(s))
    return max(s)

print('---result:---\n', ser_n.agg(func), '\n')
print('---result:---\n', ser_s.agg(func), '\n')
---call:---
111.0 <class 'float'>
---call:---
0 111.0
1 222.0
2 333.0
dtype: float64 <class 'pandas.core.series.Series'>
---result:---
333.0
---call:---
abc <class 'str'>
---call:---
qrt <class 'str'>
---call:---
xyz <class 'str'>
---result:---
0 c
1 t
2 z
dtype: object
It looks as if the first call tests the type of the data. If the data is numerical, the second call receives the whole Series. If it is a string, processing continues element-wise.
Very strange and undocumented behavior!
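The pattern in that trace can be modelled as a try-then-fall-back dispatch. The sketch below is a simplified model of the observed behavior, not pandas' actual source: agg_like is a hypothetical helper, and the set of exception types it catches is an assumption. Behavior may also differ between pandas versions.

```python
import pandas as pd

def agg_like(series, func):
    """Hypothetical model: try func on each element, else on the whole Series."""
    try:
        # Series.map calls func once per element.
        return series.map(func)
    except (TypeError, ValueError, AttributeError):  # assumed exception set
        # Element-wise application failed, so pass the whole Series instead.
        return func(series)

ser_n = pd.Series([111.0, 222.0, 333.0])
ser_s = pd.Series(['abc', 'qrt', 'xyz'])

reduced = agg_like(ser_n, lambda s: max(s))      # max(111.0) raises, falls back
per_element = agg_like(ser_s, lambda s: max(s))  # max('abc') works per element

print(reduced)
print(per_element.tolist())
```

In this model the outcome is decided by whether the function happens to accept a single element, which is the dependence on runtime data described above.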
In pandas, at least in 0.25.3, the aggregate() method behaves element-wise or on the whole Series depending on whether the function can be applied element-wise to the actual runtime data without raising an exception!
There also seems to be no explanation of this surprising behaviour in the module documentation.
The code snippet below bypasses that behavior. As an example, I show the case where we need to count all elements including nulls (and other N/A values), i.e. get the length of the series. Instead of the obvious, but not properly working, lambda series: len(series)
we may use
def agg_len(series: pd.Series) -> int:
    # Raising on a non-Series argument makes aggregate() pass the whole Series.
    if not isinstance(series, pd.Series):
        raise TypeError('Element-wise behavior is not supported')
    return len(series)

series.aggregate(agg_len)
Btw, a simple lambda series: len(series) may also work for you, but only if the series does not hold strings or other types that have a length 🙂
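The same guard can be factored into a reusable wrapper. This is a sketch under the assumption, taken from the answer above, that raising TypeError on a non-Series argument makes aggregate() hand over the whole Series. series_only is a hypothetical helper name, not part of pandas.

```python
import functools
import pandas as pd

def series_only(func):
    """Hypothetical decorator: refuse anything that is not a whole Series."""
    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        if not isinstance(obj, pd.Series):
            # Rejecting single elements prevents element-wise application.
            raise TypeError('expected a pandas Series')
        return func(obj, *args, **kwargs)
    return wrapper

ser_s = pd.Series(['abc', None, 'xyz'])

# len() would otherwise succeed on each string; the guard forces whole-Series use.
n_items = ser_s.aggregate(series_only(len))
print(n_items)
```

Wrapping keeps the reducer itself (here len) unchanged, so the same function can still be called directly on a Series elsewhere.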