prevent pandas.combine from converting dtypes

Question

Undesired behavior: DataFrame.combine turns ints into floats.

Description:
My DataFrame contains a list of filenames (index) and some metadata about each:

            pags  rating  tms  glk
name                              
file1  original0       1    1    1
file2  original1       2    2    2
file3  original2       3    3    3
file4  original3       4    4    4
file5  original4       5    5    5

Sometimes I need to update some of the columns for some of the files, leaving all other cells unchanged.
Furthermore, the update can contain new files that I need to add as new rows (which will probably have some missing values).
The update comes in the form of another DataFrame upd:

       pags  rating
name               
file4  new0      11
file5  new1      12
file6  new2      13
file7  new3      14

Here, I want to change pags and rating for file4 and file5, and append new rows for file6 and file7.
I found I can do this with DataFrame.combine:

df = df.combine(upd, lambda old,new: new.fillna(old), overwrite=False)[df.columns]

            pags  rating  tms  glk
name                              
file1  original0     1.0  1.0  1.0
file2  original1     2.0  2.0  2.0
file3  original2     3.0  3.0  3.0
file4       new0    11.0  4.0  4.0
file5       new1    12.0  5.0  5.0
file6       new2    13.0  NaN  NaN
file7       new3    14.0  NaN  NaN

The only problem is that all integer columns were turned into floats.
How do I keep the original dtypes?
I would strongly prefer to avoid calling .astype() manually for every column.
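The float conversion comes from alignment: combine reindexes every column to the union of both indexes, and a NumPy int64 column cannot represent a missing value, so pandas upcasts the whole column to float64 to hold NaN (which is why even file1's rating becomes 1.0). A minimal illustration on a plain Series (the names s, widened and nullable are hypothetical), next to pandas' nullable Int64 dtype, which marks gaps as <NA> instead:

```python
import pandas as pd

# An integer Series indexed like the 'rating' column.
s = pd.Series([1, 2, 3], index=['file1', 'file2', 'file3'])

# Reindexing to a larger index (as the alignment inside combine does)
# introduces NaN, which int64 cannot hold, so the column becomes float64.
widened = s.reindex(['file1', 'file2', 'file3', 'file4'])
print(widened.dtype)  # float64

# The nullable extension dtype 'Int64' keeps integers and stores gaps as <NA>.
nullable = s.astype('Int64').reindex(['file1', 'file2', 'file3', 'file4'])
print(nullable.dtype)  # Int64
```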

Code to create this example:

import pandas as pd

df = pd.DataFrame({
    'name': ['file1','file2','file3','file4','file5'],
    'pags': ["original"+str(i) for i in range(5)],
    'rating': [1, 2, 3, 4, 5],
    'tms': [1, 2, 3, 4, 5],
    'glk': [1, 2, 3, 4, 5],
}).set_index('name')

upd = pd.DataFrame({
    'name': ['file4','file5','file6','file7'],
    'pags': ["new"+str(i) for i in range(4)],
    'rating': [11, 12, 13, 14],
}).set_index('name')

df = df.combine(upd, lambda old,new: new.fillna(old), overwrite=False)[df.columns]

Asked By: Michael Pruglo


Answer 1

df.astype() can restore all dtypes at once: it accepts a mapping from column name to dtype, including the Series returned by df.dtypes.
So what ended up working in my case was:

self.df = ...  # read from disk
upd = ...  # get updates
original_dtypes = self.df.dtypes
self.df = self.df.combine(upd, lambda old,new: new.fillna(old), overwrite=False)[self.df.columns]
self.df = self.df.apply(...)  # fill in the missing data (no NaN may remain in integer columns)
self.df = self.df.astype(original_dtypes)  # restore every column's dtype in one call
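A self-contained sketch of this approach on the example data from the question. The answer elides the fill step; filling the new rows' tms and glk with 0 is a hypothetical choice made here only so that the cast back to int64 succeeds (astype to int64 raises if any NaN is left).

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['file1', 'file2', 'file3', 'file4', 'file5'],
    'pags': ["original" + str(i) for i in range(5)],
    'rating': [1, 2, 3, 4, 5],
    'tms': [1, 2, 3, 4, 5],
    'glk': [1, 2, 3, 4, 5],
}).set_index('name')

upd = pd.DataFrame({
    'name': ['file4', 'file5', 'file6', 'file7'],
    'pags': ["new" + str(i) for i in range(4)],
    'rating': [11, 12, 13, 14],
}).set_index('name')

# Remember the dtypes before combine upcasts the int columns to float64.
original_dtypes = df.dtypes

combined = df.combine(upd, lambda old, new: new.fillna(old), overwrite=False)[df.columns]

# Hypothetical fill step: 0 for the missing tms/glk cells of the new rows.
combined = combined.fillna({'tms': 0, 'glk': 0})

# One astype call restores every column's original dtype.
combined = combined.astype(original_dtypes)
print(combined.dtypes)
```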

Answered By: Michael Pruglo
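An alternative that is not from the original thread: convert both frames to pandas' nullable dtypes with DataFrame.convert_dtypes() before combining. The integer columns then become Int64, whose missing values are <NA>, so the alignment inside combine no longer forces float64 and no fill or per-column .astype() is needed afterwards. This sketch assumes a recent pandas version and that the Int64 extension dtype is acceptable downstream; convert_string=False leaves the text column's dtype as it was.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['file1', 'file2', 'file3', 'file4', 'file5'],
    'pags': ["original" + str(i) for i in range(5)],
    'rating': [1, 2, 3, 4, 5],
    'tms': [1, 2, 3, 4, 5],
    'glk': [1, 2, 3, 4, 5],
}).set_index('name')

upd = pd.DataFrame({
    'name': ['file4', 'file5', 'file6', 'file7'],
    'pags': ["new" + str(i) for i in range(4)],
    'rating': [11, 12, 13, 14],
}).set_index('name')

# Switch int64 columns to nullable Int64; keep the string column unchanged.
df = df.convert_dtypes(convert_string=False)
upd = upd.convert_dtypes(convert_string=False)

# Same combine call as in the question; new rows get <NA>, not NaN.
combined = df.combine(upd, lambda old, new: new.fillna(old), overwrite=False)[df.columns]
print(combined.dtypes)
```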
