Solving incompatible dtype warning for pandas DataFrame when setting new column iteratively
Question:
Setting the value of a new dataframe column:
df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url
now (as of Pandas version 2.1.0) gives a warning,
FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value ' metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
The Pandas documentation discusses how the problem can be solved for a Series but it is not clear how to do this iteratively (the line above is called in a loop over metrics and it’s the final metric that gives the warning) when assigning a new DataFrame column. How can this be done?
Answers:
I had the same problem. My understanding is that when you set a value in the column source_data_url for the first time, the column does not yet exist, so pandas creates the column source_data_url and assigns NaN to all of its elements. This makes pandas treat the column's dtype as float64, and it then raises this warning when a string is assigned.
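To illustrate the dtype part of that intuition, here is a minimal sketch. It does not reproduce the exact internal path taken by .loc (which may differ between pandas versions); it only shows that a column holding nothing but NaN is stored as float64. The DataFrame contents are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, standing in for the question's DataFrame.
df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3"]})

# A column filled only with NaN is inferred as float64,
# which is why a later string assignment counts as "incompatible".
df["source_data_url"] = np.nan

print(df["source_data_url"].dtype)
```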
My solution was to create the column with some default value, e.g. empty string, before adding values to it:
df["source_data_url"] = ""
Using None also seems to work:
df["source_data_url"] = None
For me, only the "" default works.
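Putting that together with the loop from the question, a sketch could look like the following. The Metric namedtuple, the labels, and the example.com URLs are hypothetical stand-ins for whatever the real metric objects contain; only the column-initialisation line and the .loc assignment come from the post.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the question's metric objects.
Metric = namedtuple("Metric", ["label", "source_data_url"])

metrics = [
    Metric("metric_1", "https://example.com/metric_1"),
    Metric("metric_2", "https://example.com/metric_2"),
    Metric("metric_3", "https://example.com/metric_3"),
]

df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3", "metric_1"]})

# Create the column up front with a string default, so it never starts
# life as an all-NaN float64 column.
df["source_data_url"] = ""

# Fill in the rows that match each metric, one metric per iteration.
for metric in metrics:
    df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

print(df)
```

Rows whose Measure matches no metric keep the "" default, so pick a default that you can tell apart from real values.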
Since pandas 2.1.0, setitem-like operations on a Series (or a DataFrame column) that silently upcast the dtype are deprecated and show a warning.
In a future version, these will raise an error and you should cast to a common dtype first.
Previous behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
New behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
To retain the current behaviour, you could cast ser to object dtype first:
In [21]: ser = pd.Series([1, 2, 3])
In [22]: ser = ser.astype('object')
In [23]: ser[0] = 'not an int64'
In [24]: ser
Out[24]:
0    not an int64
1               2
2               3
dtype: object
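The same cast can be applied to a DataFrame column that already exists with a float64 dtype, which is closer to the situation in the question. This is only a sketch: the data and the 'metric_2_url' value are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the target column already exists as all-NaN float64.
df = pd.DataFrame(
    {
        "Measure": ["metric_1", "metric_2"],
        "source_data_url": [np.nan, np.nan],
    }
)

# Explicitly cast the column to object before writing strings into it.
df["source_data_url"] = df["source_data_url"].astype(object)

# The string assignment is now compatible with the column's dtype.
df.loc[df["Measure"] == "metric_2", "source_data_url"] = "metric_2_url"

print(df)
print(df["source_data_url"].dtype)
```

Unmatched rows stay NaN here, unlike the "" default approach, which may or may not be what you want downstream.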