Pandas grouping and return result in single line

Question:

I have a DataFrame like the one given below.

import pandas as pd
df = pd.DataFrame([
        ['server1', 'NA', 'NA', '2011-03-31'],
        ['server1', '2011-02-22', 'NA', 'NA'],
        ['server1', 'NA', '2011-06-22', 'NA'],
        ['server2', 'NA', 'NA', '2011-12-31'],
        ['server2', 'NA', '2011-02-21', 'NA'],
        ['server3', 'NA', 'NA', '2011-08-29'],
    ], columns=['hostname', 'patch_date1', 'patch_date2', 'patch_date3'])

df

I want to group the data by hostname and show the result as one row per server, like below.

server1 | 2011-02-22 | 2011-06-22 | 2011-03-31
server2 | NA         | 2011-02-21 | 2011-12-31
server3 | NA         | NA         | 2011-08-29

Answers:

You can do this with the .replace() and .groupby() methods, like this:

import pandas as pd

df = pd.DataFrame([
        ['server1', 'NA', 'NA', '2011-03-31'],
        ['server1', '2011-02-22', 'NA', 'NA'],
        ['server1', 'NA', '2011-06-22', 'NA'],
        ['server2', 'NA', 'NA', '2011-12-31'],
        ['server2', 'NA', '2011-02-21', 'NA'],
        ['server3', 'NA', 'NA', '2011-08-29'],
    ], columns=['hostname', 'patch_date1', 'patch_date2', 'patch_date3'])

# Swap 'NA' for '' so max() prefers a date string, then restore 'NA' for empty cells
df = df.replace('NA', '').groupby('hostname').max().replace('', 'NA')

print(df)

Output:

         patch_date1 patch_date2 patch_date3
hostname                                    
server1   2011-02-22  2011-06-22  2011-03-31
server2           NA  2011-02-21  2011-12-31
server3           NA          NA  2011-08-29
Answered By: mrCopiCat
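
The replace step in the answer above matters because of how strings compare: the literal 'NA' sorts after any string that starts with a digit, while '' sorts before every non-empty string. A minimal sketch of that effect, using a hypothetical two-row frame in the same shape as the question's data and assuming the dates are ISO-formatted strings:

```python
import pandas as pd

# Hypothetical two-row sample for one host, in the same shape as the question.
df = pd.DataFrame(
    [
        ["server2", "NA", "2011-12-31"],
        ["server2", "2011-02-21", "NA"],
    ],
    columns=["hostname", "patch_date2", "patch_date3"],
)

# Plain string comparison: 'N' sorts after '2', '' sorts before everything.
assert max("NA", "2011-02-21") == "NA"
assert max("", "2011-02-21") == "2011-02-21"

# Without the replace step, max() keeps the 'NA' placeholder.
naive = df.groupby("hostname").max()

# With 'NA' swapped for '', max() keeps the date; empty cells become 'NA' again.
fixed = df.replace("NA", "").groupby("hostname").max().replace("", "NA")

print(naive)
print(fixed)
```

Because ISO date strings also sort chronologically, max() returns the latest date if a host has more than one date in the same column.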

You can combine pandas.DataFrame.groupby with the groupby method first(), which returns the first non-null value of each column per group:

import numpy as np

df.replace('NA', np.nan, inplace=True)

out = df.groupby('hostname', as_index=False).first()

out.fillna('NA', inplace=True)

>>> print(out)


Answered By: L'Artiste
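
The same idea can be written as one chained expression (the outer parentheses let the chain span several lines):

import numpy as np

out = (
    df
    .replace("NA", np.nan)   # treat 'NA' as missing
    .groupby("hostname")
    .first()                 # first non-null value per column
    .reset_index()
    .fillna("NA")            # restore the 'NA' placeholder
)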
Answered By: der Fotik
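
The first()-based and max()-based answers give the same result on the question's data, where each host has at most one date per column. A rough sketch of where they could differ, using a hypothetical frame in which server1 has two dates in the same column (the frame and the names first_out and max_out are illustrative assumptions, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: server1 has two dates in patch_date1, earlier one first.
df = pd.DataFrame(
    [
        ["server1", "NA", "2011-03-31"],
        ["server1", "2011-02-22", "NA"],
        ["server1", "2011-06-22", "NA"],
        ["server2", "NA", "2011-12-31"],
    ],
    columns=["hostname", "patch_date1", "patch_date2"],
)

# first() skips missing values and keeps the first non-null entry in row order.
first_out = (
    df.replace("NA", np.nan)
    .groupby("hostname")
    .first()
    .fillna("NA")
)

# max() on the ''-substituted strings keeps the latest ISO date string instead.
max_out = df.replace("NA", "").groupby("hostname").max().replace("", "NA")

print(first_out)
print(max_out)
```

Which one is appropriate depends on whether a repeated column should yield the first recorded date or the most recent one.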