Check if 2 or more items in array are the same and delete one of them according to sorting number – Python

Question:

I have a dataframe that has sorted codes according to volume and trade.

Code Volume Trade
ApplA 500 1000
Amazon 1000 500
Facebook 250 750
ApplE 100 1500
df_samecompanies['Codes'] = see below
df['Volume Order'] = df['Volume'].rank(ascending=False)
df['Trade Order']  = df['Trade'].rank(ascending=False)
df['Trade Order2']  = df['Trade'].rank(ascending=True)
df['Max Ordered Number']  = df[['Volume Order', 'Trade Order']].max(axis=1)
df['Final Sorted Number'] = df[['Max Ordered Number', 'Trade Order2']].apply(tuple, axis=1).rank(ascending=False)
df.drop('Trade Order2', axis=1)

This code outputs:

       Code  Volume  Trade  Value  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplA     500   1000   <NA>             2            2                   2                    4
1    Amazon    1000    500   <NA>             1            4                   4                    2
2  Facebook     250    750   <NA>             3            3                   3                    3
3     ApplE     100   1500   <NA>             4            1                   4                    1

Two same code were in the final sorted range (APPLA and APPLE). APPLA has 4 as "final sorted number" and APPLE has 1 as "final sorted number". Since, they are same code (APPL), I want to omit from the list the one with the smallest "final sorted number". In this case ApplE.

df_samecompanies[‘Codes’] consist of companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB.

I have another dataframe which consist of same companies.(like amazA,amazonB and apple,appla). This df include only ‘E’ , ‘D’, ‘A’, ‘B’ as a last characters.


if same companies DF in DF(main df): 
   if last character == 'D' or 'A': 

 check final sorted number and who has smallest final sort number, delete another code from list

df_samecompanies =

Codes Code Shares
ApplA Apple A
Amazon Amazon Empty
Facebook Facebook Empty
ApplE Apple E
AmazA Amazon A
AmazonB Amazon B
Asked By: Bianconera

||

Answers:

Use df_companies dataframe to group same rows and keep only the best ranking for each group.

Input data:

>>> df
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplA     500   1000           2.0          2.0                 2.0                  4.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0
3     ApplE     100   1500           4.0          1.0                 4.0                  1.0

>>> df_companies
      Codes      Code Shares
0     ApplA     Apple      A
1    Amazon    Amazon  Empty
2  Facebook  Facebook  Empty
3     ApplE     Apple      E
4     AmazA    Amazon      A
5   AmazonB    Amazon      B
out = df.sort_values('Final Sorted Number') 
        .merge(df_companies[['Code', 'Codes']], how='left', 
               left_on='Code', right_on='Codes', suffixes=('', '2')) 
        .drop_duplicates('Code2') 
        .drop(columns=['Code2', 'Codes'])

Output result:

>>> out
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplE     100   1500           4.0          1.0                 4.0                  1.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0

Step by step:

# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
       Code  Final Sorted Number
0     ApplA                  4.0
1    Amazon                  2.0
2  Facebook                  3.0
3     ApplE                  1.0


# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
       Code  Final Sorted Number
3     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
0     ApplA                  4.0

# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left', 
                  left_on='Code', right_on='Codes', suffixes=('', '2'))
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook
3     ApplA                  4.0     Apple     ApplA

Now, we have 2 new columns from df_companies: Code (renamed into Code2 by suffix) and Codes. Let’s continue:

# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook

# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
       Code  Final Sorted Number
0     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
Answered By: Corralien
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.