pandas/python : Get each distinct values of each column as columns and their counts as rows

Question:

I have a data frame like this with below code,

df=pd.DataFrame(columns=['col1', 'col2', 'col3'])
df.col1=['q1', 'q2', 'q2', 'q3', 'q4', 'q4']
df.col2=['b', 'a', 'a', 'c', 'b', 'b']
df.col3=['p', 'q', 'r', 'p', 'q', 'q']   

df

    col1    col2    col3
0    q1        b    p
1    q2        a    q
2    q2        a    r
3    q3        c    p
4    q4        b    q
5    q4        b    q

Now I want to col1 wise count of each col2 and col3 variables as rows and unique values of col2 and col3 as columns, so that the final dataframe would look like,

col1    a    b    c    p    q    r
 q1     0    1    0    1    0    0
 q2     2    0    0    0    1    1
 q3     0    0    1    1    0    0
 q4     0    2    0    0    2    0

I could do this using a for loop iterating through the dataframe and storing the results, but this is computationally expensive and not a right way to do it.

Looking for alternatives pythonic/pandas way to do it efficiently.

Asked By: Kallol

||

Answers:

melt the data by col1, then crosstab:

m = df.melt('col1')
pd.crosstab(m['col1'], m['value']).reset_index()

Output:

value col1  a  b  c  p  q  r
0       q1  0  1  0  1  0  0
1       q2  2  0  0  0  1  1
2       q3  0  0  1  1  0  0
3       q4  0  2  0  0  2  0
Answered By: Quang Hoang