Sort DataFrame by occurrence in one column, while preserving order in other columns

Question:

I would like to sort a DataFrame in a similar fashion to this SO question:
Sorting entire csv by frequency of occurence in one column

However, the counts are not guaranteed to be unique, and when two values have the same count their rows end up interleaved. (I'm using the method suggested by EdChum in the above question.)

Given the following DataFrame:

cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com

After sorting, I would like it to be:

cluster_id,distance,url
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com

Note that the cluster_id and distance columns are still in order after sorting by the number of occurrences of cluster_id.

Asked By: clwen


Answers:

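We can add a helper column 'G' holding the number of rows in each cluster, sort by 'G' (descending) and cluster_id (ascending), and then drop 'G':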

df.assign(G=df.groupby('cluster_id').cluster_id.transform('count')).sort_values(['G', 'cluster_id'], ascending=[False, True]).drop(columns='G')
Out[248]: 
   cluster_id  distance      url
4           7      0.10  abc.com
5           7      0.20  def.com
6           7      0.30  xyz.com
0           1      0.15  aaa.com
1           1      0.25  bbb.com
2           2      0.05  ccc.com
3           2      0.10  ccc.com
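
For readers who want to run this end to end, here is a minimal, self-contained sketch of the same approach on the question's sample data. The helper column name G follows the answer; the variable names counts and result are only illustrative. It uses drop(columns='G') because recent pandas versions no longer accept the axis as a positional argument, and it assumes the multi-column sort leaves tied rows in their original order, which is worth checking on your pandas version.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2, 7, 7, 7],
    "distance": [0.15, 0.25, 0.05, 0.10, 0.1, 0.2, 0.3],
    "url": ["aaa.com", "bbb.com", "ccc.com", "ccc.com",
            "abc.com", "def.com", "xyz.com"],
})

# For every row, the number of rows that share its cluster_id
counts = df.groupby("cluster_id")["cluster_id"].transform("count")

result = (
    df.assign(G=counts)                      # temporary helper column
      .sort_values(["G", "cluster_id"],      # biggest clusters first,
                   ascending=[False, True])  # ties broken by cluster_id
      .drop(columns="G")                     # remove the helper again
)

print(result)
```

Using cluster_id as the second sort key is what keeps clusters with equal counts from being interleaved: rows with the same count are grouped by their cluster_id before anything else.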
Answered By: BENY

Given a DataFrame g:

   pno  dn
0    A  AA
1    B  BB
2    A  AA

To sort by frequency in ascending order instead:

g.assign(G=g.groupby('dn').dn.transform('count')).sort_values(['G', 'dn'], ascending=[True, False]).drop(columns='G')

   pno  dn
1    B  BB
0    A  AA
2    A  AA
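
A runnable sketch of this ascending variant, assuming g is built from the three rows shown above (the names freq and ascending_result are illustrative, not part of the answer):

```python
import pandas as pd

# The small example frame from this answer
g = pd.DataFrame({"pno": ["A", "B", "A"], "dn": ["AA", "BB", "AA"]})

# Number of rows sharing each dn value
freq = g.groupby("dn")["dn"].transform("count")

ascending_result = (
    g.assign(G=freq)
     .sort_values(["G", "dn"], ascending=[True, False])  # rarest values first
     .drop(columns="G")
)

print(ascending_result)
```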

Answered By: Amir