Turning a collections.Counter into a dictionary
Question:
I have a Counter object produced by the following call:
Counter(df.email_address)
It returns each individual email address with the number of times it occurs:
Counter({nan: 1618, '[email protected]': 265, '[email protected]': 1})
What I want to do is use it as if it were a dictionary and create a pandas DataFrame out of it with two columns: one for the email addresses and one for the associated counts.
I tried with:
dfr = repeaters.from_dict(repeaters, orient='index')
but I got the following error:
AttributeError: 'Counter' object has no attribute 'from_dict'
It makes me think that Counter is not a dictionary, even though it looks like one. Any idea on how to turn it into a df?
Answers:
d = {}
cnt = Counter(df.email_address)
for key, value in cnt.items():
d[key] = value
EDIT
Or, as @Trif Nefzger suggested:
d = dict(Counter(df.email_address))
Alternatively, you could use pd.Series.value_counts, which returns a Series object.
df.email_address.value_counts(dropna=False)
Sample output:
[email protected] 2
[email protected] 1
NaN 1
dtype: int64
This is not exactly what you asked for but looks like what you’d like to achieve.
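To get from that Series to the two-column frame the question asks for, one option is to name the index and then reset it. This is a minimal sketch, assuming a made-up df with placeholder example.com addresses; the column name count is my own choice, not something from the question.

```python
import pandas as pd

# Hypothetical data standing in for the asker's df.
df = pd.DataFrame(
    {"email_address": ["a@example.com", "b@example.com", "a@example.com", None]}
)

counts = (
    df["email_address"]
    .value_counts(dropna=False)      # count each address, keeping missing values
    .rename_axis("email_address")    # name the index so it becomes a named column
    .reset_index(name="count")       # move the index into a column; counts go in "count"
)
```

The result has one row per distinct address (missing values included) and is sorted by count, largest first.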
As ajcr wrote in the comments, from_dict is a method that belongs to DataFrame, not to Counter, so you can write the following to achieve your goal:
from collections import Counter
import pandas as pd
repeaters = Counter({"nan": 1618, '[email protected]': 265, '[email protected]': 1})
dfr = pd.DataFrame.from_dict(repeaters, orient='index')
print(dfr)
Output:
[email protected] 1
nan 1618
[email protected] 265
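With orient='index' the addresses end up in the index rather than in a column of their own. If you want two regular columns, a possible follow-up looks like this. It is a sketch with placeholder addresses, and the column names email_address and count are assumptions of mine.

```python
from collections import Counter

import pandas as pd

# Placeholder counts standing in for the real Counter.
repeaters = Counter({"a@example.com": 265, "b@example.com": 1})

dfr = (
    pd.DataFrame.from_dict(repeaters, orient="index", columns=["count"])
    .rename_axis("email_address")  # give the index a name
    .reset_index()                 # turn the index into an ordinary column
)
```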
Not sure why there are so many convoluted ways. Counter is a dict subclass, so you can pass it to anything that expects a parameter of type dict.
class Counter(dict):
    '''Dict subclass for counting hashable items...
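A quick way to check this for yourself, as a small sketch using only the standard library (the sample letters are made up for illustration). It also shows why the call in the question failed: from_dict is not an attribute of Counter.

```python
from collections import Counter

# Hypothetical sample data, just to have something to count.
cnt = Counter(["a", "b", "a"])

# Counter really is a dict subclass.
is_dict = isinstance(cnt, dict)

# dict() gives back a plain dict with the same keys and counts.
plain = dict(cnt)

# from_dict lives on pandas.DataFrame, not on Counter,
# which is why repeaters.from_dict(...) raises AttributeError.
has_from_dict = hasattr(cnt, "from_dict")

# One behavioural difference from a plain dict:
# a missing key returns 0 instead of raising KeyError.
missing = cnt["z"]
```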
- If you really want to convert a Counter to a dict:
>>> d1 = dict(cntr)
>>> d1
{nan: 1618, '[email protected]': 265, '[email protected]': 1}
>>>
>>>
>>> d2 = {k: v for k, v in cntr.items()}
>>> d2
{nan: 1618, '[email protected]': 265, '[email protected]': 1}
>>>
- To create a Pandas DataFrame from a Counter, use pandas.DataFrame.from_dict(). It takes a dict, but a dict of either:
{'col_name1': [r1c1, r2c1, ...], 'col_name2': [r1c2, r2c2, ...], ...} (the default, orient='columns')
OR
{'row_id1': [r1c1, r1c2, ...], 'row_id2': [r2c1, r2c2, ...], ...} (with orient='index')
where rNcM is the value at the Nth row and Mth column.
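To make the two layouts concrete, here is a small sketch that builds the same 2x2 table both ways. The names x, y, r1 and r2 are placeholders I made up; the columns argument is only accepted together with orient='index'.

```python
import pandas as pd

# Layout 1: keys are column names, values are the column contents.
by_column = pd.DataFrame.from_dict({"x": [1, 2], "y": [3, 4]})

# Layout 2: keys are row labels, values are the row contents.
# orient="index" tells pandas to read the dict row by row.
by_row = pd.DataFrame.from_dict(
    {"r1": [1, 3], "r2": [2, 4]},
    orient="index",
    columns=["x", "y"],
)
```

Both frames hold the same values; only the row labels differ (0 and 1 versus r1 and r2).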
>>> from collections import Counter
>>> cntr = Counter({float('nan'): 1618, '[email protected]': 265, '[email protected]': 1})
>>> cntr
Counter({nan: 1618, '[email protected]': 265, '[email protected]': 1})
>>>
>>> import pandas as pd
>>> pdf = pd.DataFrame.from_dict({'emails': list(cntr.keys()), 'repetition_count': list(cntr.values())})
>>> print(pdf.to_string())
emails repetition_count
0 NaN 1618
1 [email protected] 265
2 [email protected] 1
>>>
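If you would also like the rows sorted by count, Counter.most_common() returns (key, count) pairs that can go straight into the DataFrame constructor. A short sketch with placeholder addresses and column names of my own choosing:

```python
from collections import Counter

import pandas as pd

# Placeholder counts; the addresses are made up.
cntr = Counter({"a@example.com": 265, "b@example.com": 1, "c@example.com": 40})

# most_common() yields a list of (key, count) tuples, highest count first,
# so each tuple becomes one row of the frame.
pdf = pd.DataFrame(cntr.most_common(), columns=["emails", "repetition_count"])
```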