Pandas, timedeltas, and dividing by zero

Question

Update: Apologies for the confusion this question caused. The issue was that the timedelta fields I was looking at had been loaded into pandas as (non-timedelta) objects. Recasting them to timedeltas resolved the issue.

I have the following pandas dataframe:

index	proactive_time	passive_time
foo	timedelta(seconds=30)	timedelta(seconds=20)
bar	timedelta(0)	timedelta(0)

And I’m looking to calculate the percentage of proactive time over proactive+passive time via:

dataframe['proactive_ratio'] = dataframe['proactive_time'] / (dataframe['proactive_time'] + dataframe['passive_time'])

The row foo works just fine; however, bar raises a ZeroDivisionError. Fair enough, but I’d like those values to return NaN instead of throwing the error.

This is where my lack of pandas knowledge presents problems: I can solve this via python conditionals in a for loop, but for performance reasons, I’d really prefer to keep this in pandas.

The strategies I’ve tried (e.g. .apply) seem to only have knowledge of one column at a time, which makes combining proactive_time and passive_time infeasible.

Any thoughts on how to resolve?

Asked By: PlankTon

Answer 1

Pandas should already be doing what you expect. Make sure to use a recent stable version:

import pandas as pd

pd.__version__
# 1.5.3

df = pd.DataFrame({'proactive_time': [pd.Timedelta('30s'), pd.Timedelta('0s')],
                   'passive_time': [pd.Timedelta('20s'), pd.Timedelta('0s')]},
                   index=['foo', 'bar'])

df['proactive_ratio'] = df['proactive_time'] / (df['proactive_time'] + df['passive_time'])

Output:


      proactive_time     passive_time  proactive_ratio
foo  0 days 00:00:30  0 days 00:00:20              0.6
bar  0 days 00:00:00  0 days 00:00:00              NaN
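
For reference, the situation described in the question's update can be sketched as follows. This is a minimal reconstruction, assuming the columns held Python datetime.timedelta objects with object dtype; the name `fixed` and the explicit dtype=object are illustrative assumptions, not taken from the original data.

```python
import datetime as dt

import pandas as pd

# Assumed reconstruction of the question's data: Python timedelta objects
# stored in object-dtype columns rather than timedelta64 columns.
df = pd.DataFrame(
    {
        "proactive_time": [dt.timedelta(seconds=30), dt.timedelta(0)],
        "passive_time": [dt.timedelta(seconds=20), dt.timedelta(0)],
    },
    index=["foo", "bar"],
    dtype=object,
)

try:
    # With object dtype, each division is done by Python itself.
    df["proactive_time"] / (df["proactive_time"] + df["passive_time"])
except ZeroDivisionError as exc:
    print("object dtype:", exc)

# Recast the columns to a real timedelta dtype, then divide as usual.
fixed = df.copy()
for col in ["proactive_time", "passive_time"]:
    fixed[col] = pd.to_timedelta(fixed[col])

total = fixed["proactive_time"] + fixed["passive_time"]
fixed["proactive_ratio"] = fixed["proactive_time"] / total
print(fixed)
```

Checking df.dtypes before dividing is a quick way to tell which of the two cases applies.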

Answered By: mozway

Answer 2

When the columns have a proper timedelta dtype, dividing zero by zero in pandas returns NaN (Not a Number), a common way of representing missing or undefined values. An exception is raised when the columns hold plain Python objects instead, because each division is then carried out by Python itself, which raises on division by zero.

Once the columns are timedeltas, you don't need a different version of pandas to get NaN. If you do the division with NumPy, its seterr() function controls how floating-point conditions such as division by zero and invalid operations are reported, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'proactive_time': [pd.Timedelta('30s'), pd.Timedelta('0s')],
                   'passive_time': [pd.Timedelta('20s'), pd.Timedelta('0s')]},
                  index=['foo', 'bar'])

# Tell NumPy to ignore divide-by-zero and invalid-operation conditions
np.seterr(divide='ignore', invalid='ignore')

# Divide proactive_time by the total time using NumPy
df['proactive_ratio'] = np.divide(df['proactive_time'], df['proactive_time'] + df['passive_time'])

# Print the dataframe
print(df)

Output:

      proactive_time     passive_time  proactive_ratio
foo  0 days 00:00:30  0 days 00:00:20              0.6
bar  0 days 00:00:00  0 days 00:00:00              NaN
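
Because np.seterr changes NumPy's error handling until it is changed again, a scoped variant may be preferable. The following is a minimal sketch, assuming the columns already have a timedelta dtype; the intermediate names `proactive` and `total` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "proactive_time": [pd.Timedelta("30s"), pd.Timedelta("0s")],
        "passive_time": [pd.Timedelta("20s"), pd.Timedelta("0s")],
    },
    index=["foo", "bar"],
)

# Convert to plain float seconds so NumPy's floating-point rules apply directly.
proactive = df["proactive_time"].dt.total_seconds().to_numpy()
total = (df["proactive_time"] + df["passive_time"]).dt.total_seconds().to_numpy()

# np.errstate restores the previous error settings when the block exits.
with np.errstate(divide="ignore", invalid="ignore"):
    df["proactive_ratio"] = np.divide(proactive, total)

print(df)
```

Here 0 / 0 evaluates to NaN under floating-point rules, and the warning is suppressed only inside the with block.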

Answered By: Nitiz
