Count by unique pair of columns in pandas
Question:
I'm trying to figure out how to count the number of rows for each unique pair of values in two columns (ip, useragent), e.g.
d = pd.DataFrame({'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'], 'useragent': ['a', 'a', 'b', 'b']})
            ip useragent
0  192.168.0.1         a
1  192.168.0.1         a
2  192.168.0.1         b
3  192.168.0.2         b
To produce:
ip           useragent
192.168.0.1  a            2
192.168.0.1  b            1
192.168.0.2  b            1
Ideas?
Answers:
Group by both columns with groupby and call size(), which counts the rows in each group:
d.groupby(['ip', 'useragent']).size()
produces:
ip           useragent
192.168.0.1  a            2
             b            1
192.168.0.2  b            1
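The result above is a Series with a two-level index. To get a flat DataFrame with ip repeated on every row, reset the index:
print(d.groupby(['ip', 'useragent']).size().reset_index().rename(columns={0:''}))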
gives:
            ip useragent
0  192.168.0.1         a  2
1  192.168.0.1         b  1
2  192.168.0.2         b  1
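The rename step above leaves the count column with an empty name. As a minimal sketch of the same groupby approach, the snippet below names the column at the point where the index is reset. The column name count is my own choice, not something from the question. It also shows DataFrame.value_counts as a shorthand, assuming pandas 1.1 or later.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# size() counts the rows in each (ip, useragent) group. Passing name= to
# reset_index flattens the index and labels the count column in one step.
# 'count' is an illustrative name.
counts = d.groupby(['ip', 'useragent']).size().reset_index(name='count')
print(counts)

# Shorthand, assuming pandas >= 1.1: value_counts on a list of columns
# returns a Series indexed by (ip, useragent), ordered by count.
vc = d.value_counts(['ip', 'useragent'])
print(vc)
```

Both forms only list pairs that actually occur in the data; a pair such as (192.168.0.2, a) does not appear at all.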
Another nice option might be pandas.crosstab:
print(pd.crosstab(d.ip, d.useragent))
print('\nsome cosmetics:')
print(pd.crosstab(d.ip, d.useragent).reset_index().rename_axis('', axis='columns'))
gives:
useragent    a  b
ip
192.168.0.1  2  1
192.168.0.2  0  1
some cosmetics:
            ip  a  b
0  192.168.0.1  2  1
1  192.168.0.2  0  1
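Note that crosstab returns a wide table and fills in a 0 for pairs that never occur, which is not quite the layout asked for in the question. A rough sketch of one way to get back to one row per pair is to melt the wide table and drop the zero rows. The names wide, long, observed and the column name count are illustrative.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Wide table: one row per ip, one column per useragent, 0 for unseen pairs.
wide = pd.crosstab(d.ip, d.useragent)

# Melt back to one row per (ip, useragent) pair; 'count' is an
# illustrative column name.
long = wide.reset_index().melt(
    id_vars='ip', var_name='useragent', value_name='count'
)

# Keep only pairs that actually occur, then sort for readability.
observed = (
    long[long['count'] > 0]
    .sort_values(['ip', 'useragent'])
    .reset_index(drop=True)
)
print(observed)
```

If the zero counts are useful (for example, to see which user agents an IP never sent), skip the filter and keep long as it is.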