I am getting this error: TypeError: '<' not supported between instances of 'str' and 'float'

Question:

I have this table, in which I am comparing a list of articles (Article_body) with 4 baseline articles using cosine similarity:

Article_body  articleScores1  articleScores2  articleScores3  articleScores4  articleScores5
a*****        0.6             0.2             0.7             0.9             0.2
a*****        0.3             0.8             0.1             0.5             0.1

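I want to add a column that names the column with the largest cosine similarity out of the 5, on the condition that the value is at least 0.5. If none of the CosineSim(i) values is at least 0.5, the new column should be False, as in this second table: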

Article_body  articleScores1  articleScores2  articleScores3  articleScores4  Most_similar_to
a*****        0.6             0.2             0.7             0.9             CosineSim4
a*****        0.3             0.8             0.1             0.5             CosineSim2
a******       0.1             0.2             0.3             0.4             False

I am using this code to achieve this:

cos_cols = [f"articleScores{i}" for i in range(1, 6)]    
def n_lar(text):
    if (df[cos_cols].idxmax(axis=1)) <0.5:
        return False
    
    else:
        df['Max'] = (df[cos_cols].idxmax(axis=1))

  
df['Most_similar_to'] = df.apply(n_lar)

However, I am getting this error:

TypeError: '<' not supported between instances of 'str' and 'float'

How can I resolve this?

edit:

I want to add a column that names the column with the largest cosine similarity out of the 5, on the condition that the value is at least 0.5. If none of the CosineSim(i) values is at least 0.5, it should return False, as in table 2.

Asked By: Python-data


Answers:

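# Take the columns between Article_body and the last column as the scores,
# then label each row with its best-scoring column if that score is at least 0.5.
(df.iloc[:, 1:-1]
 .astype('float')
 .apply(
     lambda x: ('CosineSim' + x.idxmax()[-1]) if x.max() >= 0.5 else False,
     axis=1)
)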

output:

0    CosineSim4
1    CosineSim2
2         False

Assign the result to the Most_similar_to column.
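
For reference, the error in the question comes from comparing the result of idxmax with 0.5: idxmax returns column labels (strings), not scores, so '<' ends up comparing 'str' with 'float'. The following is a minimal self-contained sketch of the same idea, not the answer's exact code. It assumes the four score columns from the second table, selects them by name instead of by position, and uses made-up helper names (scores, best_col, best_val).

```python
import pandas as pd

# Sample data mirroring the second table in the question.
df = pd.DataFrame({
    "Article_body": ["a*****", "a*****", "a******"],
    "articleScores1": [0.6, 0.3, 0.1],
    "articleScores2": [0.2, 0.8, 0.2],
    "articleScores3": [0.7, 0.1, 0.3],
    "articleScores4": [0.9, 0.5, 0.4],
})

# Assumption: four score columns, selected by name.
cos_cols = [f"articleScores{i}" for i in range(1, 5)]
scores = df[cos_cols].astype(float)

# idxmax gives the label of the best column per row (a string) ...
best_col = scores.idxmax(axis=1)
# ... while max gives the numeric score that the threshold applies to.
best_val = scores.max(axis=1)

# Keep the label only where the best score reaches 0.5, otherwise False.
df["Most_similar_to"] = [
    f"CosineSim{col[-1]}" if val >= 0.5 else False
    for col, val in zip(best_col, best_val)
]

print(df["Most_similar_to"])
```

Selecting the columns by name avoids depending on where Article_body or Most_similar_to sit in the frame, which the positional slice iloc[:, 1:-1] does rely on.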

Answered By: Panda Kim