Pandas groupby & transpose dataframe while keeping original columns

Question

I have a dataframe:

df = 

      ID       WorkAddress   City   Lat   Long   Department
1     0001     123_lane      City1  17.4  78.3   Audit        
2     0002     123_lane      City1  17.4  78.3   Lending        
3     0003     111_lane      City2  19.6  64.2   Finance       
4     0004     112_lane      City3  18.4  89.9   Legal       
5     0005     112_lane      City3  18.4  89.9   Legal

I transformed it to get a count of each ID by distinct WorkAddress, for each Department:

dfDeptCounts = (df.assign(flag=df.groupby('WorkAddress').Department.cumcount())
    .pivot_table(index='WorkAddress', columns=['Department'], values='ID', aggfunc='count')
    .reset_index())

dfDeptCounts =

      WorkAddress   Audit   Lending   Finance   Legal
1     123_lane      1       1         0         0
2     111_lane      0       0         1         0     
3     112_lane      0       0         0         2

Any attempt I make to include City, Lat, and Long results in an error, whether by adding them as additional groupby keys or by trying to reset the index. Is there a multi-indexing level that I'm missing, or is there a better way to transform the df so that it includes all columns?

Edit

I apologize, I might not have been clear in my question. This is the end goal:

dfDeptCounts =

      WorkAddress   City   Lat   Long  Audit   Lending   Finance  Legal
1     123_lane      City1  17.4  78.3  1       1         0        0
2     111_lane      City2  19.6  64.2  0       0         1        0   
3     112_lane      City3  18.4  89.9  0       0         0        2

Asked By: Geordi Alm

Answer 1

To go a bit beyond the answer @Psidom gave in a comment, you can use pandas.crosstab in combination with categorical data:

df['Department'] = pd.Categorical(df['Department'],
                                  categories=['Audit', 'Lending', 'Finance',
                                              'HR', 'Legal']
                                  )
df2 = pd.crosstab(df.WorkAddress, df.Department, dropna=False)

The use of categorical data will ensure that even missing or empty categories (here "HR") will be represented in the final crosstab. For this you need to add the dropna=False parameter.

output:

>>> df2
Department   Audit  Lending  Finance  HR  Legal
WorkAddress                                    
111_lane         0        0        1   0      0
112_lane         0        0        0   0      2
123_lane         1        1        0   0      0

Now, if you want to add the other information, you first need to choose which rows to drop (here it does not matter, as the information is the same for every row of a given WorkAddress, so we keep the first one), and then merge the result with the previous output:

(df.drop_duplicates(subset=['WorkAddress'])
   .drop('ID', axis=1)
   .merge(df2,
          left_on='WorkAddress',
          right_index=True)
)

output:

  WorkAddress   City   Lat  Long Department  Audit  Lending  Finance  HR  Legal
1    123_lane  City1  17.4  78.3      Audit      1        1        0   0      0
3    111_lane  City2  19.6  64.2    Finance      0        0        1   0      0
4    112_lane  City3  18.4  89.9      Legal      0        0        0   0      2
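
For anyone who wants to run this end to end, here is a minimal sketch of the same crosstab-then-merge idea. The DataFrame is rebuilt by hand from the table in the question (storing ID as strings is an assumption), the categorical step is left out for brevity, and Department is dropped as well so the result matches the layout asked for in the edit. The names counts and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (ID as strings is an assumption).
df = pd.DataFrame({
    "ID": ["0001", "0002", "0003", "0004", "0005"],
    "WorkAddress": ["123_lane", "123_lane", "111_lane", "112_lane", "112_lane"],
    "City": ["City1", "City1", "City2", "City3", "City3"],
    "Lat": [17.4, 17.4, 19.6, 18.4, 18.4],
    "Long": [78.3, 78.3, 64.2, 89.9, 89.9],
    "Department": ["Audit", "Lending", "Finance", "Legal", "Legal"],
})

# Count rows per (WorkAddress, Department); WorkAddress becomes the index.
counts = pd.crosstab(df["WorkAddress"], df["Department"])

# Keep one row of address-level information per WorkAddress,
# then attach the department counts by matching on the crosstab index.
result = (
    df.drop_duplicates(subset=["WorkAddress"])
      .drop(columns=["ID", "Department"])
      .merge(counts, left_on="WorkAddress", right_index=True)
      .reset_index(drop=True)
)

print(result)
```

Without the categorical conversion, the department columns come out in alphabetical order and no empty "HR" column is created.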

Answered By: mozway

Answer 2

Use pivot_table with an aggfunc:

(df.assign(col1=1).pivot_table(index=['WorkAddress','City','Lat','Long'], columns='Department', values='col1', aggfunc='sum', fill_value=0)
   .reset_index().rename_axis(None, axis=1))

output:

  WorkAddress   City   Lat  Long  Audit  Finance  Legal  Lending
0    111_lane  City2  19.6  64.2      0        1      0        0
1    112_lane  City3  18.4  89.9      0        0      2        0
2    123_lane  City1  17.4  78.3      1        0      0        1
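
A self-contained version of the call above, as a sketch: the sample frame is rebuilt by hand from the question (ID as strings is an assumption), and col1 is just a helper column of ones so that summing it counts rows. Listing WorkAddress, City, Lat and Long together in index is what carries the extra columns through the pivot.

```python
import pandas as pd

# Sample data rebuilt from the question's table (ID as strings is an assumption).
df = pd.DataFrame({
    "ID": ["0001", "0002", "0003", "0004", "0005"],
    "WorkAddress": ["123_lane", "123_lane", "111_lane", "112_lane", "112_lane"],
    "City": ["City1", "City1", "City2", "City3", "City3"],
    "Lat": [17.4, 17.4, 19.6, 18.4, 18.4],
    "Long": [78.3, 78.3, 64.2, 89.9, 89.9],
    "Department": ["Audit", "Lending", "Finance", "Legal", "Legal"],
})

result = (
    df.assign(col1=1)  # helper column: each row contributes 1 to the sum
      .pivot_table(
          index=["WorkAddress", "City", "Lat", "Long"],  # columns to keep
          columns="Department",
          values="col1",
          aggfunc="sum",
          fill_value=0,  # departments absent at an address show 0, not NaN
      )
      .reset_index()               # turn the index levels back into columns
      .rename_axis(None, axis=1)   # drop the leftover "Department" axis label
)

print(result)
```

This relies on City, Lat and Long being constant within each WorkAddress; if they varied, each distinct combination would get its own row.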

Answered By: G.G
