Pandas ValueError: cannot convert float NaN to integer error when doing a 'Run All' but works fine when running the specific cell

Question:

When I do a Run All in a Jupyter notebook with the following code, I receive a ValueError: cannot convert float NaN to integer error. But when I run this specific cell a second time, it works fine. Is there anything specific that could cause the error during a Run All but not when running just that cell?

    # New birthdate calculations
    def calculate_age(born):
        born = datetime.strptime(born, "%m/%d/%Y").date()
        today = date.today()
        return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    
    def generate_birthdate(age):   
        today = date.today() 
        if int(age) < 18:
            new_birthdate = str(random.randrange(1,12))+'/'+str(random.randrange(1,28))+'/'+str(random.randrange(today.year-18,today.year))
        else:
            new_birthdate = str(random.randrange(1,12))+'/'+str(random.randrange(1,28))+'/'+str(random.randrange(today.year-90,today.year-18))
        return new_birthdate
    
    filt = (df_good_ssn['BIRTHDATE'] == '--/--/----')
    df_good_ssn.loc[filt,'BIRTHDATE'] = '01/01/2000' # '--/--/----' is invalid. Assign any valid date for type casting. Will be overwritten by generate_birthdate
    
    df_good_ssn.loc[~filt,'AGE'] = df_good_ssn['BIRTHDATE'].apply(calculate_age)
    df_good_ssn.loc[~filt,'NEW_BIRTHDATE'] = df_good_ssn['AGE'].apply(generate_birthdate)   

ValueError                                Traceback (most recent call last)
<ipython-input-5-9e2113163ab5> in <module>
     18 
     19 df_good_ssn.loc[~filt,'AGE'] = df_good_ssn['BIRTHDATE'].apply(calculate_age)
---> 20 df_good_ssn.loc[~filt,'NEW_BIRTHDATE'] = df_good_ssn['AGE'].apply(generate_birthdate)

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849 
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-5-9e2113163ab5> in generate_birthdate(age)
      7 def generate_birthdate(age):
      8     today = date.today()
----> 9     if int(age) < 18:
     10         new_birthdate = str(random.randrange(1,12))+'/'+str(random.randrange(1,28))+'/'+str(random.randrange(today.year-18,today.year))
     11     else:

ValueError: cannot convert float NaN to integer
Asked By: JoBaxter

||

Answers:

I would try something like

df.fillna(0, inplace=True)

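or a similar call applied only to the column in question. That replaces the NaN values, so int(age) no longer raises. Note that fillna alone leaves the column as float; cast it with astype(int) if you need an integer dtype from that point on.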
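A minimal sketch of that idea, limited to the one column. The small frame is a made-up stand-in for df_good_ssn, the computed ages are invented, and the fill value 0 is an assumption (generate_birthdate would treat it as an age under 18). It also shows where the NaN values likely come from: assigning through .loc[~filt, ...] creates the AGE column but leaves NaN in the rows where filt is True.

```python
import pandas as pd

# Hypothetical stand-in for df_good_ssn; the values are made up.
df = pd.DataFrame({"BIRTHDATE": ["--/--/----", "05/17/1980", "11/30/2010"]})

filt = df["BIRTHDATE"] == "--/--/----"

# Pretend these ages were computed for every row (illustrative numbers only).
computed = pd.Series([23, 43, 12], index=df.index)

# Assigning through .loc[~filt, ...] creates AGE but leaves NaN where filt is True.
df.loc[~filt, "AGE"] = computed
nan_count = int(df["AGE"].isna().sum())

# Fill only the affected column, then cast so a later int(age) cannot fail.
df["AGE"] = df["AGE"].fillna(0).astype(int)
ages = df["AGE"].tolist()
```

This would also be consistent with the rerun succeeding: on a second run the placeholder dates have already been replaced, so filt is False everywhere and AGE is assigned for every row.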

As for your comment, if the filt predicate is the culprit, I would look into pattern matching with regular expressions to build the filter.

That may be more cumbersome, but it should serve the purpose well.
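As a rough sketch of the regular-expression idea, the filter could flag every value that does not look like MM/DD/YYYY instead of comparing against the single literal '--/--/----'. The sample values and the names birthdates and looks_valid are hypothetical, and the pattern only checks the shape of the string, not whether the date exists.

```python
import pandas as pd

# Hypothetical sample values; only the MM/DD/YYYY shape is checked,
# not whether the date is a real calendar date.
birthdates = pd.Series(["--/--/----", "05/17/1980", "", "11/30/2010"])

# True where the string is two digits / two digits / four digits.
looks_valid = birthdates.str.match(r"^\d{2}/\d{2}/\d{4}$")

# Rows to treat as invalid are the complement of the match.
filt = ~looks_valid
valid_flags = looks_valid.tolist()
```

With a predicate like this, empty or otherwise malformed strings end up in filt as well, so they never reach datetime.strptime.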

Answered By: alphamu