Pandas groupby throws: TypeError: unhashable type: 'numpy.ndarray'
Question:
I have a dataframe as shown in the picture:
I would like to group the data by Source class and Destination class, count the number of rows in each group and sum up Attention values.
While trying to achieve that, I am unable to get past this type error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-100-6f2c8b3de8f2> in <module>()
----> 1 attdf.groupby(['Source Class', 'Destination Class']).count()
8 frames
pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
458 table = hash_klass(size_hint or len(values))
459 uniques, labels = table.factorize(values, na_sentinel=na_sentinel,
--> 460 na_value=na_value)
461
462 labels = ensure_platform_int(labels)
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'numpy.ndarray'
attdf.groupby(['Source Class', 'Destination Class'])
gives me a <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1e720f2080>
which I’m not sure how to use to get what I want.
Dataframe attdf can be imported from : https://drive.google.com/open?id=1t_h4b8FQd9soVgYeiXQasY-EbnhfOEYi
Please advise.
Answers:
try using .agg
as follows:
import pandas as pd
attdf = pd.read_csv("attdf.csv")
print(attdf.groupby(['Source Class', 'Destination Class']).agg({"Attention": ['sum', 'count']}))
Output:
Attention
sum count
Source Class Destination Class
0 0 282.368908 1419
1 7.251101 32
2 3.361009 23
3 22.482438 161
4 14.020189 88
5 10.138409 75
6 11.377947 80
1 0 6.172269 32
1 181.582437 1035
2 9.440956 62
3 12.007303 67
4 3.025752 20
5 4.491725 28
6 0.279559 2
2 0 3.349921 23
1 8.521828 62
2 391.116034 2072
3 9.937170 53
4 0.412747 2
5 4.441985 30
6 0.220316 2
3 0 33.156251 161
1 11.944373 67
2 9.176584 53
3 722.685180 3168
4 29.776050 137
5 8.827215 54
6 2.434347 16
4 0 17.431855 88
1 4.195519 20
2 0.457089 2
3 20.401789 137
4 378.802604 1746
5 3.616083 19
6 1.095061 6
5 0 13.525333 75
1 4.289306 28
2 6.424412 30
3 10.911705 54
4 3.896328 19
5 250.309764 1132
6 8.643153 46
6 0 15.249959 80
1 0.150240 2
2 0.413639 2
3 3.108417 16
4 0.850280 6
5 8.655959 46
6 151.571505 686
@Adam.Er8 and @jezarael helped me with their inputs. The unhashable type error in my case was because of the datatypes of the columns in my dataframe.
Original df and df imported from csv
It turned out that the original dataframe had two object columns which i was trying to use up in the groupby. Hence the unhashable type error. But on importing the data into a new dataframe right out of a csv fixed the datatypes. Consequently, no type errors faced anymore.
I had luck with the first answer. I was combining latitude and longitude into a tuple instead of using them independently which allowed me to arrive at the same thing if I did that instead of using the tuple-based approach.
I have a dataframe as shown in the picture:
I would like to group the data by Source class and Destination class, count the number of rows in each group and sum up Attention values.
While trying to achieve that, I am unable to get past this type error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-100-6f2c8b3de8f2> in <module>()
----> 1 attdf.groupby(['Source Class', 'Destination Class']).count()
8 frames
pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
458 table = hash_klass(size_hint or len(values))
459 uniques, labels = table.factorize(values, na_sentinel=na_sentinel,
--> 460 na_value=na_value)
461
462 labels = ensure_platform_int(labels)
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'numpy.ndarray'
attdf.groupby(['Source Class', 'Destination Class'])
gives me a <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1e720f2080>
which I’m not sure how to use to get what I want.
Dataframe attdf can be imported from : https://drive.google.com/open?id=1t_h4b8FQd9soVgYeiXQasY-EbnhfOEYi
Please advise.
try using .agg
as follows:
import pandas as pd
attdf = pd.read_csv("attdf.csv")
print(attdf.groupby(['Source Class', 'Destination Class']).agg({"Attention": ['sum', 'count']}))
Output:
Attention
sum count
Source Class Destination Class
0 0 282.368908 1419
1 7.251101 32
2 3.361009 23
3 22.482438 161
4 14.020189 88
5 10.138409 75
6 11.377947 80
1 0 6.172269 32
1 181.582437 1035
2 9.440956 62
3 12.007303 67
4 3.025752 20
5 4.491725 28
6 0.279559 2
2 0 3.349921 23
1 8.521828 62
2 391.116034 2072
3 9.937170 53
4 0.412747 2
5 4.441985 30
6 0.220316 2
3 0 33.156251 161
1 11.944373 67
2 9.176584 53
3 722.685180 3168
4 29.776050 137
5 8.827215 54
6 2.434347 16
4 0 17.431855 88
1 4.195519 20
2 0.457089 2
3 20.401789 137
4 378.802604 1746
5 3.616083 19
6 1.095061 6
5 0 13.525333 75
1 4.289306 28
2 6.424412 30
3 10.911705 54
4 3.896328 19
5 250.309764 1132
6 8.643153 46
6 0 15.249959 80
1 0.150240 2
2 0.413639 2
3 3.108417 16
4 0.850280 6
5 8.655959 46
6 151.571505 686
@Adam.Er8 and @jezarael helped me with their inputs. The unhashable type error in my case was because of the datatypes of the columns in my dataframe.
Original df and df imported from csv
It turned out that the original dataframe had two object columns which i was trying to use up in the groupby. Hence the unhashable type error. But on importing the data into a new dataframe right out of a csv fixed the datatypes. Consequently, no type errors faced anymore.
I had luck with the first answer. I was combining latitude and longitude into a tuple instead of using them independently which allowed me to arrive at the same thing if I did that instead of using the tuple-based approach.