Return indexes of all samples in Python

Question:

I am a beginner in Python and have this data frame, which contains samples, values, and a cluster number for each sample:

import pandas as pd

df = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                   'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                   'cluster': [1, 0, 2, 0, 1]})
df

output:

  samples    values  cluster
0       A  0.336663        1
1       B  0.447101        0
2       C  0.402529        2
3       D  0.373014        0
4       E  0.456226        1

The following code returns the index of the max-value sample in each cluster. For example, in cluster 0, B has the max value among the samples (here B and D), so the code returns the index of B, which is 1. Likewise, cluster 1 contains A and E, and E has the max value, so the index of E is returned (here 4), and so on.

max_value = []  # max value found in each cluster
clust_max = []  # row index of the max-value sample in each cluster

tmp = df['values']
clust_labels = df['cluster']
clusters = len(set(clust_labels))  # assumes labels are 0, 1, ..., clusters - 1

# loop over the cluster labels
for j in range(clusters):
    # row positions of the samples that belong to cluster j
    elems = [i for i, x in enumerate(clust_labels) if x == j]
    values = [tmp[elem] for elem in elems]
    max_value_temp = max(values)
    max_value.append(max_value_temp)
    max_ind = values.index(max_value_temp)
    clust_max.append(elems[max_ind])

print(clust_max)

output:

[1, 4, 2]

I want to update this code to return the indexes of all samples, not only the index of the max value in each cluster.

The expected output:

[0, 1, 2, 3, 4]
Asked By: aam

||

Answers:

I don't really get why you are using Java-style logic to work with Python; probably, as you mentioned, you are still new to it. I didn't quite get what you expect from the output, so I did something according to what I understood.

import pandas as pd

dfc = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                    'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                    'cluster': [1, 0, 2, 0, 1]})

# get the max value of each cluster using groupby
# (only the numeric 'values' column is selected, so 'samples' is not compared as strings)
dfmax = dfc.groupby('cluster')[['values']].max()

# insert the row index of each cluster's max value as a column, using groupby and idxmax
dfmax['idx'] = dfc.groupby('cluster')['values'].idxmax()

# you can also sort by two columns, in this case values and cluster (or vice versa if you prefer), which is kind of like a groupby
# you are using Java-style logic and you don't need it in Python; there is a Pythonic way to write this
dfsorted = dfc.sort_values(['values', 'cluster'], ascending=False)
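
If the goal is every sample index rather than only the max per cluster, something along these lines might work. This is only a sketch of one reading of the question: the names indexes_by_cluster, all_indexes and ranked_indexes are made up for illustration, and ordering by cluster and then by descending value is an assumption, not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                   'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                   'cluster': [1, 0, 2, 0, 1]})

# Row indexes of every sample, keyed by cluster label
# (indexes_by_cluster is a hypothetical name for this sketch).
indexes_by_cluster = {
    cluster: list(group.index)
    for cluster, group in df.groupby('cluster')
}

# Flat list of every row index, in the DataFrame's own order.
all_indexes = df.index.tolist()

# The same indexes, ordered by cluster and then by descending value
# (an assumed ordering; the first index of each cluster is its max).
ranked_indexes = df.sort_values(['cluster', 'values'],
                                ascending=[True, False]).index.tolist()

print(indexes_by_cluster)
print(all_indexes)
print(ranked_indexes)
```

With the sample data, all_indexes matches the expected output [0, 1, 2, 3, 4], while ranked_indexes keeps the max-value sample first within each cluster.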
Answered By: ReinholdN