Solving incompatible dtype warning for pandas DataFrame when setting new column iteratively

Question:

Setting the value of a new dataframe column:

df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

now (as of pandas version 2.1.0) gives a warning:

FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '       metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.

The pandas documentation discusses how the problem can be solved for a Series, but it is not clear how to do this iteratively when assigning a new DataFrame column (the line above is called in a loop over metrics, and it is the final metric that gives the warning). How can this be done?

Asked By: Tom

Answers:

I had the same problem. My intuition is that when you set a value in the column source_data_url for the first time, the column does not yet exist, so pandas creates it and fills every element with NaN. This makes pandas treat the column's dtype as float64, and assigning a string to it then raises this warning.
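To illustrate the dtype part of that intuition, here is a minimal sketch showing that a column holding only NaN is stored as float64. The DataFrame contents are made up for the example, and the sketch does not reproduce how df.loc creates the missing column internally; it only shows why a NaN-filled column is incompatible with string values.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({"Measure": ["metric_1", "metric_2"]})

# A column that contains nothing but NaN is stored as float64.
df["source_data_url"] = np.nan

nan_column_dtype = str(df["source_data_url"].dtype)
print(nan_column_dtype)
```

A string such as a URL cannot be stored in a float64 column without changing the column's dtype, which is the silent upcast the warning is about.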

My solution was to create the column with some default value, e.g. empty string, before adding values to it:

df["source_data_url"] = ""

Using None as the default also seems to work:

df["source_data_url"] = None
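Put together with the loop from the question, the approach might look like the sketch below. The Metric class, the example URLs, and the DataFrame contents are hypothetical stand-ins, since the question does not show them; only the column names and the df.loc line come from the original post.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Metric:
    # Hypothetical stand-in for the metric objects in the question.
    label: str
    source_data_url: str


metrics = [
    Metric("metric_1", "https://example.com/1"),
    Metric("metric_2", "https://example.com/2"),
    Metric("metric_3", "https://example.com/3"),  # matches no rows below
]

df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_2", "other"]})

# Create the column up front with a string default so it never
# starts out as a float64 column full of NaN.
df["source_data_url"] = ""

for metric in metrics:
    mask = df["Measure"] == metric.label
    df.loc[mask, "source_data_url"] = metric.source_data_url

print(df)
```

Rows whose Measure matches no metric keep the empty-string default instead of NaN, which is worth bearing in mind if later code checks for missing values with isna().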

Answered By: lutrarutra

Only the empty-string default ("") works for me.

Answered By: Rulo Sal

Since pandas 2.1.0, setitem-like operations on Series (or DataFrame columns) that silently upcast the dtype are deprecated and show a warning.

In a future version, these will raise an error and you should cast to a common dtype first.

Previous behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

New behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'
FutureWarning:
  Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
  Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

To retain the current behaviour, you could cast ser to object dtype first:

In [21]: ser = pd.Series([1, 2, 3])

In [22]: ser = ser.astype('object')

In [23]: ser[0] = 'not an int64'

In [24]: ser
Out[24]: 
0    not an int64
1               2
2               3
dtype: object

Source: https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html#deprecated-silent-upcasting-in-setitem-like-series-operations
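The same cast can be applied to a DataFrame column that already exists with a float64 dtype. The following is a small sketch, not taken from the release notes; the data and URL are made up, and the column is deliberately created as NaN to mimic the situation in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column starts out as float64 because it holds only NaN.
df = pd.DataFrame({"Measure": ["metric_1", "metric_2"]})
df["source_data_url"] = np.nan

# Cast explicitly to object so the column can hold strings.
df["source_data_url"] = df["source_data_url"].astype("object")

# The assignment no longer needs to upcast the column.
df.loc[df["Measure"] == "metric_2", "source_data_url"] = "https://example.com/2"

print(df.dtypes)
```

Unlike the empty-string default, this keeps NaN in the rows that were never assigned.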

Answered By: den-kar