Pandas Crosstab does not support Float (with capital F) number formats

Question:

I am working on a sample transaction dataframe. It contains the client ID, transaction gross value (GMV), and revenue. Take this example as df:

from datetime import datetime
import numpy as np
import pandas as pd

num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
    'id' :  np.random.randint(1,999999999,num_variables),
    'date' : [np.random.choice(pd.date_range(datetime(2022,6,1),datetime(2022,12,31))) for i in range(num_variables)],
    'gmv' : rng.random(num_variables) * 100,
    'revenue' : rng.random(num_variables) * 100})

I am grouping this data by client ID, crossing it with the transaction month, and showing the revenue values.

clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)

Now I create a crosstab:

CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)

The code above works normally because the revenue column of my sample dataframe has the "float64" dtype.
But if I change the dtype to "Float64", it does not work anymore.

num_variables = 100
rng = np.random.default_rng()
df = pd.DataFrame({
    'id' :  np.random.randint(1,999999999,num_variables),
    'date' : [np.random.choice(pd.date_range(datetime(2022,6,1),datetime(2022,12,31))) for i in range(num_variables)],
    'gmv' : rng.random(num_variables) * 100,
    'revenue' : rng.random(num_variables) * 100})
df = df.astype({'revenue':'Float64'})

clients = df[['id', 'date','revenue']].groupby(['id', df.date.dt.to_period("M")], dropna=False).aggregate({'revenue': 'sum'})
clients.reset_index(inplace=True)

CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)

The output

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 CrossTab = pd.crosstab(clients['id'], clients['date'], values=clients['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\reshape\pivot.py:691, in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    688     df["__dummy__"] = values
    689     kwargs = {"aggfunc": aggfunc}
--> 691 table = df.pivot_table(
    692     "__dummy__",
    693     index=unique_rownames,
    694     columns=unique_colnames,
    695     margins=margins,
    696     margins_name=margins_name,
    697     dropna=dropna,
    698     **kwargs,
    699 )
    701 # Post-process
    702 if normalize is not False:

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:8728, in DataFrame.pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed, sort)
   8711 @Substitution("")
   8712 @Appender(_shared_docs["pivot_table"])
   8713 def pivot_table(
   (...)
...
--> 292     raise TypeError(dtype)  # pragma: no cover
    294 converted = maybe_downcast_numeric(result, dtype, do_round)
    295 if converted is not result:

TypeError: Float64
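
For context, the dtype named in the error is the pandas nullable extension dtype, which is distinct from the NumPy-backed "float64". The following is only a minimal sketch of that difference; the series names `s_numpy` and `s_nullable` are illustrative and not part of the code above.

```python
import numpy as np
import pandas as pd

# Lowercase "float64" is the NumPy-backed dtype; a missing value is stored as NaN.
s_numpy = pd.Series([1.5, None], dtype="float64")

# Capitalized "Float64" is the pandas nullable extension dtype; a missing value is pd.NA.
s_nullable = pd.Series([1.5, None], dtype="Float64")

print(s_numpy.dtype)     # float64
print(s_nullable.dtype)  # Float64
print(np.isnan(s_numpy.iloc[1]))    # True
print(s_nullable.iloc[1] is pd.NA)  # True
```
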
Asked By: FábioRB

Answers:

I've reported this issue on the pandas GitHub, and it is still being analyzed.

https://github.com/pandas-dev/pandas/issues/50313
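
In the meantime, a possible workaround (an assumption here, not something confirmed in the issue thread) is to cast the values back to the NumPy "float64" dtype just before calling crosstab, since the question reports that crosstab works with that dtype. The sketch below rebuilds a smaller version of the sample data; the fixed seed and the names `dates`, `revenue` and `cross_tab` are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

num_variables = 100
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
dates = pd.date_range(datetime(2022, 6, 1), datetime(2022, 12, 31))

df = pd.DataFrame({
    "id": rng.integers(1, 999_999_999, num_variables),
    "date": dates[rng.integers(0, len(dates), num_variables)],
    "revenue": rng.random(num_variables) * 100,
}).astype({"revenue": "Float64"})

# Same grouping as in the question: revenue per client and month.
clients = (
    df.groupby(["id", df["date"].dt.to_period("M")])["revenue"]
    .sum()
    .reset_index()
)

# Assumed workaround: hand crosstab a NumPy float64 copy of the values.
# Any pd.NA in the column would become NaN in this copy.
revenue = clients["revenue"].astype("float64")

cross_tab = pd.crosstab(
    clients["id"],
    clients["date"],
    values=revenue,
    aggfunc="sum",
    margins=True,
    margins_name="All",
    dropna=False,
)
print(cross_tab.tail())
```

The "Float64" column in `clients` is left untouched, so the rest of the pipeline can keep the nullable dtype; only the values passed to crosstab are converted.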

Answered By: FábioRB