Reorder dataframe groupby medians following custom order

Question:

I have a DataFrame with the columns params and value. I’d like to count how many values each entry in params has (to use as labels in a boxplot), so I use sizes_bp_log_df['params'].value_counts(), which gives:

slidingwindow_250     11574
hotspots_1k_100        8454
slidingwindow_500      5793
slidingwindow_100      5366
hotspots_5k_500        3118
slidingwindow_1000     2898
hotspots_10k_1k        1772
slidingwindow_2500     1160
slidingwindow_5000      580
Name: params, dtype: int64

I have a list of all the entries in params, in the order I want to display them in the boxplot. I tried sort_index(level=myorder) to get the counts into that custom order, but the function ignores myorder and just sorts the index alphabetically.

myorder = ["slidingwindow_100",
          "slidingwindow_250",
          "slidingwindow_500",
          "slidingwindow_1000",
          "slidingwindow_2500",
          "slidingwindow_5000",
          "hotspots_1k_100",
          "hotspots_5k_500",
          "hotspots_10k_1k"]

sizes_bp_log_df['params'].value_counts().sort_index(level=myorder)

hotspots_10k_1k        1772
hotspots_1k_100        8454
hotspots_5k_500        3118
slidingwindow_100      5366
slidingwindow_1000     2898
slidingwindow_250     11574
slidingwindow_2500     1160
slidingwindow_500      5793
slidingwindow_5000      580
Name: params, dtype: int64

How can I get the index of my value counts in the order I want them to be in?

In addition, I’ll be using the median of each distribution as the coordinates for the boxplot labels, which I retrieve with sizes_bp_log_df.groupby(['params']).median(). Ideally, the suggested approach would work for that result as well.

Asked By: Whitehot


Answers:

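Use reindex instead of sort_index. The level argument of sort_index selects which index level to sort by; it does not take a custom ordering, so the index still comes back sorted alphabetically. reindex(myorder) returns the entries in exactly the order given by the list.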
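As a minimal sketch of the reindex approach, assuming a small made-up DataFrame df in place of sizes_bp_log_df and a shortened myorder (both are illustrative, not the asker's data), the same call works for the value counts and for the groupby medians:

```python
import pandas as pd

# Hypothetical stand-in for sizes_bp_log_df; the values are made up.
df = pd.DataFrame({
    "params": ["hotspots_1k_100", "slidingwindow_250", "slidingwindow_100",
               "slidingwindow_250", "hotspots_1k_100", "slidingwindow_250"],
    "value": [3.0, 1.0, 2.0, 5.0, 7.0, 9.0],
})

# Shortened version of the custom order from the question.
myorder = ["slidingwindow_100", "slidingwindow_250", "hotspots_1k_100"]

# reindex arranges the result to follow the labels in myorder.
counts = df["params"].value_counts().reindex(myorder)

# The same call reorders the per-group medians.
medians = df.groupby("params")["value"].median().reindex(myorder)

print(counts)
print(medians)
```

Note that reindex fills in NaN for any label in myorder that does not occur in the data, so it is worth checking that the list matches the actual entries in params.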

Answered By: Ashyam