How to score different grouped pandas row values as 1 and 2 in a separate column through conditions?

Question:

Example dataframe:


import pandas as pd

df_sample = pd.DataFrame({'query': {0: 'keyword_1',
  1: 'keyword_1',
  2: 'keyword_2',
  3: 'keyword_2',
  4: 'keyword_3',
  5: 'keyword_3',
  6: 'keyword_4',
  7: 'keyword_4'},
 'page': {0: 'google.com',
  1: 'apple.com',
  2: 'google.com',
  3: 'apple.com',
  4: 'google.com',
  5: 'apple.com',
  6: 'papaya.com',
  7: 'foobaar.com'},
 'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10, 6: 11, 7: 11}})

df_sample

Suppose each keyword in ‘query’ returns two URLs (‘page’) from a search engine, ranked at different positions (‘rank’).

Each keyword has two URLs, either at different ranks or both at rank 11 (11 means the URL wasn’t found on the first page).

I want to score the pages using the following rules:

The better the rank (1 is best), the higher the score. Since each keyword has only two URLs, they can be scored 1 and 2. A rank of 11 always receives a score of 1, since it means the URL ranked beyond the top 10.

The exception is when both URLs of a keyword are ranked 11; in that case, both rows containing that keyword are dropped.

The scores should go in a separate column, ‘score’.

Note that a keyword never appears more than twice (e.g. 42 rows contain only 21 keywords), but different keywords can share the same URLs.
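The rules above can also be encoded literally, without `rank`. This is a minimal sketch, assuming every query has exactly two rows as stated: it drops queries whose ranks are all 11, then gives 2 to the row holding the group's best (smallest) rank and 1 to the other. The helper name `score_pages` is hypothetical, not part of pandas.

```python
import pandas as pd


def score_pages(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the question's scoring rules.

    Assumes each query appears exactly twice.
    """
    # True on every row whose query has all of its URLs at rank 11
    both_missing = df['rank'].eq(11).groupby(df['query']).transform('all')
    out = df[~both_missing].copy()

    # Best (numerically smallest) rank within each remaining query
    best = out.groupby('query')['rank'].transform('min')

    # The better-ranked URL scores 2; the other one (including rank 11) scores 1
    out['score'] = (out['rank'] == best).astype(int) + 1
    return out
```

Calling `score_pages(df_sample)` should drop the `keyword_4` rows and produce the scores in the expected output. Note that if two URLs ever tied at a rank other than 11, this version would give both a score of 2.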

Expected output:


df_sample_2 = pd.DataFrame({'query': {0: 'keyword_1',
  1: 'keyword_1',
  2: 'keyword_2',
  3: 'keyword_2',
  4: 'keyword_3',
  5: 'keyword_3'},
 'page': {0: 'google.com',
  1: 'apple.com',
  2: 'google.com',
  3: 'apple.com',
  4: 'google.com',
  5: 'apple.com'},
 'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10}, 'score': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2, 5: 1}})

df_sample_2
Asked By: Mystic


Answers:

You need two lines of code (the filtering and ranking steps can be done in either order):

grouped_df = df_sample.groupby('query')

# Drop queries whose two URLs share the same rank (i.e. both ranked 11)
new_df = grouped_df.filter(lambda x: x['rank'].nunique() > 1)

       query        page  rank
0  keyword_1  google.com     3
1  keyword_1   apple.com     2
2  keyword_2  google.com     1
3  keyword_2   apple.com    11
4  keyword_3  google.com     5
5  keyword_3   apple.com    10
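`filter` calls the lambda once per group. An equivalent that avoids a Python lambda is a boolean mask built with `transform('nunique')`, which broadcasts each group's count of distinct ranks back onto its rows. A sketch, rebuilding the question's `df_sample` so it runs on its own; the `distinct_ranks` name is only illustrative:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

# Number of distinct ranks in each row's query group, aligned to df_sample's index
distinct_ranks = df_sample.groupby('query')['rank'].transform('nunique')

# Keep only queries whose URLs are ranked differently (drops keyword_4);
# .copy() so that adding a 'score' column later modifies an independent frame
new_df = df_sample[distinct_ranks > 1].copy()
```

The kept rows and their order match what `filter` returns above.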

# Rank within each query in descending order: the larger (worse) rank gets score 1, the smaller gets 2
new_df['score'] = grouped_df['rank'].rank(ascending=False)

       query        page  rank  score
0  keyword_1  google.com     3    1.0
1  keyword_1   apple.com     2    2.0
2  keyword_2  google.com     1    2.0
3  keyword_2   apple.com    11    1.0
4  keyword_3  google.com     5    2.0
5  keyword_3   apple.com    10    1.0
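`rank` returns floats (its default `method='average'` would also turn a tie such as 11/11 into 1.5 each, which is another reason the tied queries must be dropped), so the scores print as `1.0` and `2.0`. If integer scores like those in the expected output are wanted, a cast after ranking should be enough. A sketch that rebuilds the data and compares the result with the question's `df_sample_2` (reconstructed here as `expected`):

```python
import pandas as pd

df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

grouped_df = df_sample.groupby('query')
new_df = grouped_df.filter(lambda x: x['rank'].nunique() > 1).copy()

# The ranks cover all 8 original rows; assignment aligns on the index,
# so the keyword_4 values (1.5 each before the cast) are simply discarded
new_df['score'] = grouped_df['rank'].rank(ascending=False).astype('int64')

# Expected output from the question
expected = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com'],
    'rank': [3, 2, 1, 11, 5, 10],
    'score': [1, 2, 2, 1, 2, 1],
})
```

`pd.testing.assert_frame_equal(new_df, expected)` can then confirm the two frames match.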