Check if 2 or more items in array are the same and delete one of them according to sorting number – Python
Question:
I have a dataframe that has sorted codes according to volume and trade.
Code
Volume
Trade
ApplA
500
1000
Amazon
1000
500
Facebook
250
750
ApplE
100
1500
df_samecompanies['Codes'] = see below
df['Volume Order'] = df['Volume'].rank(ascending=False)
df['Trade Order'] = df['Trade'].rank(ascending=False)
df['Trade Order2'] = df['Trade'].rank(ascending=True)
df['Max Ordered Number'] = df[['Volume Order', 'Trade Order']].max(axis=1)
df['Final Sorted Number'] = df[['Max Ordered Number', 'Trade Order2']].apply(tuple, axis=1).rank(ascending=False)
df.drop('Trade Order2', axis=1)
This code outputs:
Code Volume Trade Value Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 <NA> 2 2 2 4
1 Amazon 1000 500 <NA> 1 4 4 2
2 Facebook 250 750 <NA> 3 3 3 3
3 ApplE 100 1500 <NA> 4 1 4 1
Two same code were in the final sorted range (APPLA and APPLE). APPLA has 4 as "final sorted number" and APPLE has 1 as "final sorted number". Since, they are same code (APPL), I want to omit from the list the one with the smallest "final sorted number". In this case ApplE.
df_samecompanies[‘Codes’] consist of companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB.
I have another dataframe which consist of same companies.(like amazA,amazonB and apple,appla). This df include only ‘E’ , ‘D’, ‘A’, ‘B’ as a last characters.
if same companies DF in DF(main df):
if last character == 'D' or 'A':
check final sorted number and who has smallest final sort number, delete another code from list
df_samecompanies =
Codes
Code
Shares
ApplA
Apple
A
Amazon
Amazon
Empty
Facebook
Facebook
Empty
ApplE
Apple
E
AmazA
Amazon
A
AmazonB
Amazon
B
Answers:
Use df_companies
dataframe to group same rows and keep only the best ranking for each group.
Input data:
>>> df
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 2.0 2.0 2.0 4.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
3 ApplE 100 1500 4.0 1.0 4.0 1.0
>>> df_companies
Codes Code Shares
0 ApplA Apple A
1 Amazon Amazon Empty
2 Facebook Facebook Empty
3 ApplE Apple E
4 AmazA Amazon A
5 AmazonB Amazon B
out = df.sort_values('Final Sorted Number')
.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
.drop_duplicates('Code2')
.drop(columns=['Code2', 'Codes'])
Output result:
>>> out
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplE 100 1500 4.0 1.0 4.0 1.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
Step by step:
# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
Code Final Sorted Number
0 ApplA 4.0
1 Amazon 2.0
2 Facebook 3.0
3 ApplE 1.0
# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
Code Final Sorted Number
3 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0
0 ApplA 4.0
# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
3 ApplA 4.0 Apple ApplA
Now, we have 2 new columns from df_companies
: Code (renamed into Code2 by suffix) and Codes. Let’s continue:
# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
Code Final Sorted Number
0 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0
I have a dataframe that has sorted codes according to volume and trade.
Code | Volume | Trade |
---|---|---|
ApplA | 500 | 1000 |
Amazon | 1000 | 500 |
250 | 750 | |
ApplE | 100 | 1500 |
df_samecompanies['Codes'] = see below
df['Volume Order'] = df['Volume'].rank(ascending=False)
df['Trade Order'] = df['Trade'].rank(ascending=False)
df['Trade Order2'] = df['Trade'].rank(ascending=True)
df['Max Ordered Number'] = df[['Volume Order', 'Trade Order']].max(axis=1)
df['Final Sorted Number'] = df[['Max Ordered Number', 'Trade Order2']].apply(tuple, axis=1).rank(ascending=False)
df.drop('Trade Order2', axis=1)
This code outputs:
Code Volume Trade Value Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 <NA> 2 2 2 4
1 Amazon 1000 500 <NA> 1 4 4 2
2 Facebook 250 750 <NA> 3 3 3 3
3 ApplE 100 1500 <NA> 4 1 4 1
Two same code were in the final sorted range (APPLA and APPLE). APPLA has 4 as "final sorted number" and APPLE has 1 as "final sorted number". Since, they are same code (APPL), I want to omit from the list the one with the smallest "final sorted number". In this case ApplE.
df_samecompanies[‘Codes’] consist of companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB.
I have another dataframe which consist of same companies.(like amazA,amazonB and apple,appla). This df include only ‘E’ , ‘D’, ‘A’, ‘B’ as a last characters.
if same companies DF in DF(main df):
if last character == 'D' or 'A':
check final sorted number and who has smallest final sort number, delete another code from list
df_samecompanies =
Codes | Code | Shares |
---|---|---|
ApplA | Apple | A |
Amazon | Amazon | Empty |
Empty | ||
ApplE | Apple | E |
AmazA | Amazon | A |
AmazonB | Amazon | B |
Use df_companies
dataframe to group same rows and keep only the best ranking for each group.
Input data:
>>> df
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 2.0 2.0 2.0 4.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
3 ApplE 100 1500 4.0 1.0 4.0 1.0
>>> df_companies
Codes Code Shares
0 ApplA Apple A
1 Amazon Amazon Empty
2 Facebook Facebook Empty
3 ApplE Apple E
4 AmazA Amazon A
5 AmazonB Amazon B
out = df.sort_values('Final Sorted Number')
.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
.drop_duplicates('Code2')
.drop(columns=['Code2', 'Codes'])
Output result:
>>> out
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplE 100 1500 4.0 1.0 4.0 1.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
Step by step:
# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
Code Final Sorted Number
0 ApplA 4.0
1 Amazon 2.0
2 Facebook 3.0
3 ApplE 1.0
# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
Code Final Sorted Number
3 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0
0 ApplA 4.0
# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
3 ApplA 4.0 Apple ApplA
Now, we have 2 new columns from df_companies
: Code (renamed into Code2 by suffix) and Codes. Let’s continue:
# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
Code Final Sorted Number
0 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0