Pandas, timedeltas, and dividing by zero
Question:
Update: Apologies for the confusion this question caused. The issue was that the timedelta fields I was looking at had been cast into pandas as (non-timedelta) objects. Recasting them to timedeltas resolved the issue.
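Here is a minimal sketch of the fix described in the update. It assumes the columns arrived as object dtype holding Python `datetime.timedelta` values. The `dtype=object` construction below only simulates that, since the original data source isn't shown. The division fails on the object columns and works after recasting them with `pd.to_timedelta`:

```python
import datetime

import pandas as pd

# Simulate the problem: timedelta values stored in object-dtype columns
df = pd.DataFrame(
    {"proactive_time": [datetime.timedelta(seconds=30), datetime.timedelta(0)],
     "passive_time": [datetime.timedelta(seconds=20), datetime.timedelta(0)]},
    index=["foo", "bar"],
    dtype=object,
)
print(df.dtypes)

# With object dtype, pandas divides element by element in plain Python,
# so the all-zero row raises ZeroDivisionError
raised = False
try:
    df["proactive_time"] / (df["proactive_time"] + df["passive_time"])
except ZeroDivisionError:
    raised = True
print("ZeroDivisionError raised:", raised)

# Recast both columns to timedelta64 so pandas handles 0/0 as NaN
for col in ["proactive_time", "passive_time"]:
    df[col] = pd.to_timedelta(df[col])

df["proactive_ratio"] = df["proactive_time"] / (df["proactive_time"] + df["passive_time"])
print(df)
```

Checking `df.dtypes` is a quick way to spot this situation: object columns need converting before pandas can treat them as durations.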
I have the following pandas dataframe:
index | proactive_time | passive_time |
---|---|---|
foo | timedelta(seconds=30) | timedelta(seconds=20) |
bar | timedelta(0) | timedelta(0) |
And I'm looking to calculate the fraction of total (proactive + passive) time that is proactive via:

```python
dataframe['proactive_ratio'] = dataframe['proactive_time'] / (dataframe['proactive_time'] + dataframe['passive_time'])
```
The row `foo` works just fine, but `bar` raises a divide-by-zero error. Fair enough, but I'd like those values to return NaN instead of throwing the error.

This is where my lack of pandas knowledge presents problems: I can solve this with Python conditionals in a `for` loop, but for performance reasons I'd really prefer to keep this in pandas.

The strategies I've tried (e.g. `.apply`) seem to only have knowledge of one column at a time, which makes combining `proactive_time` and `passive_time` infeasible.
Any thoughts on how to resolve?
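For reference, `DataFrame.apply` with `axis=1` passes each row to the function as a Series, so both columns are visible at once. The sketch below handles the zero total explicitly. The helper name `row_ratio` is hypothetical. Because this still calls a Python function once per row, plain vectorized division on properly typed columns is usually faster:

```python
import pandas as pd

df = pd.DataFrame({"proactive_time": [pd.Timedelta("30s"), pd.Timedelta("0s")],
                   "passive_time": [pd.Timedelta("20s"), pd.Timedelta("0s")]},
                  index=["foo", "bar"])


def row_ratio(row):
    """Hypothetical helper: proactive share of total time, NaN when the total is zero."""
    total = row["proactive_time"] + row["passive_time"]
    if total.total_seconds() == 0:
        return float("nan")
    return row["proactive_time"] / total


# axis=1 calls row_ratio once per row, with every column of that row available
df["proactive_ratio"] = df.apply(row_ratio, axis=1)
print(df)
```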
Answers:
Pandas should already be doing what you expect, as long as both columns really have `timedelta64` dtype (you can check with `df.dtypes`). Make sure to use a recent stable version:
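```python
import pandas as pd

print(pd.__version__)  # 1.5.3 when this answer was written

df = pd.DataFrame({'proactive_time': [pd.Timedelta('30s'), pd.Timedelta('0s')],
                   'passive_time': [pd.Timedelta('20s'), pd.Timedelta('0s')]},
                  index=['foo', 'bar'])

# Both columns are timedelta64, so 0s / 0s gives NaN rather than an error
df['proactive_ratio'] = df['proactive_time'] / (df['proactive_time'] + df['passive_time'])
print(df)
```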
Output:

```
      proactive_time     passive_time  proactive_ratio
foo  0 days 00:00:30  0 days 00:00:20              0.6
bar  0 days 00:00:00  0 days 00:00:00              NaN
```
Pandas does not raise on division by zero for numeric or `timedelta64` columns, in older or newer versions. Zero divided by zero gives NaN (Not a Number, the usual way of representing missing or undefined values), and a nonzero value divided by zero gives inf. The error in the question comes from columns with `object` dtype, for example Python `datetime.timedelta` values that were never converted. Pandas then divides element by element in plain Python, and Python raises `ZeroDivisionError`.
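Here is a quick check of how pandas handles division by zero on `timedelta64` data. It is a minimal sketch with made-up values, and the labels `x` and `y` are arbitrary. A nonzero duration divided by zero should come out as `inf`, and zero divided by zero as `NaN`, with no exception:

```python
import pandas as pd

numerator = pd.Series([pd.Timedelta("30s"), pd.Timedelta("0s")], index=["x", "y"])
denominator = pd.Series([pd.Timedelta("0s"), pd.Timedelta("0s")], index=["x", "y"])

# timedelta64 / timedelta64 yields float64 values; division by a zero duration
# produces inf or NaN instead of raising
ratio = numerator / denominator
print(ratio.dtype)
print(ratio)
```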
If you also want NumPy to stay silent about floating-point division by zero, you don't need to reinstall an older version of pandas. NumPy's `seterr()` sets how floating-point errors are handled: `divide='ignore'` covers x/0 and `invalid='ignore'` covers 0/0. It does not change the object-dtype case, where Python itself raises `ZeroDivisionError`, so such columns still need converting (e.g. with `pd.to_timedelta`) first. For example:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'proactive_time': [pd.Timedelta('30s'), pd.Timedelta('0s')],
                   'passive_time': [pd.Timedelta('20s'), pd.Timedelta('0s')]},
                  index=['foo', 'bar'])

# Tell NumPy to ignore floating-point divide-by-zero and invalid (0/0) errors
# instead of warning (note: this changes a global NumPy setting)
np.seterr(divide='ignore', invalid='ignore')

# Divide proactive time by total time; np.divide dispatches to pandas' Series division
df['proactive_ratio'] = np.divide(df['proactive_time'], df['proactive_time'] + df['passive_time'])

# Print the dataframe
print(df)
```
Output:

```
      proactive_time     passive_time  proactive_ratio
foo  0 days 00:00:30  0 days 00:00:20              0.6
bar  0 days 00:00:00  0 days 00:00:00              NaN
```
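`np.seterr()` changes NumPy's error handling for the rest of the session. A scoped alternative is the `np.errstate` context manager, which applies the same settings only inside a `with` block and restores the previous ones afterwards. Here is a minimal sketch using the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"proactive_time": [pd.Timedelta("30s"), pd.Timedelta("0s")],
                   "passive_time": [pd.Timedelta("20s"), pd.Timedelta("0s")]},
                  index=["foo", "bar"])

settings_before = np.geterr()

# Ignore divide-by-zero and invalid (0/0) floating-point errors only within this block
with np.errstate(divide="ignore", invalid="ignore"):
    df["proactive_ratio"] = np.divide(
        df["proactive_time"], df["proactive_time"] + df["passive_time"]
    )

# The global NumPy settings are back to what they were before the block
settings_after = np.geterr()
print(settings_before == settings_after)
print(df)
```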