Calculating Cosine Similarity with Large 2d Vector Py

Question:

I am trying to calculate the pairwise cosine similarity of the embeddings stored in a pandas DataFrame column. It works fine on a small dataset (e.g., 100 samples), but fails once the dataset grows to 190k+ rows. Is there an alternative way to calculate this?

No error message comes up, but my kernel keeps dying.

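import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stack the per-row embeddings into one (n_rows, n_dims) float32 array
sentence_embeddings = np.array(df['summary_tokens'].tolist(), dtype='float32')

# Pairwise similarity between all rows: an (n_rows, n_rows) matrix
similarity = cosine_similarity(sentence_embeddings)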

[Image: sentence embeddings]

Asked By: Brian Phelps


Answers:

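The solution was to calculate the similarity on smaller NumPy arrays instead of on the full array at once. For 190k rows the full result is a 190,000 x 190,000 matrix, which in float32 takes roughly 144 GB, so running out of memory is the likely reason the kernel died without an error message.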
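One way the "smaller arrays" idea might look in practice is sketched below. It is not the exact code used above: the function name top_k_cosine and the parameters k and chunk_size are made up for illustration, and it assumes you only need each row's k most similar rows rather than the whole matrix. It also swaps sklearn's cosine_similarity for row normalization followed by a plain NumPy dot product, which gives the same values for non-zero rows. Only one (chunk_size, n_rows) block of similarities is held in memory at a time.

```python
import numpy as np


def top_k_cosine(embeddings, k=5, chunk_size=1000):
    """Return (indices, scores) of the k most similar other rows for each row.

    Hypothetical helper: processes the similarity matrix block by block
    so the full (n_rows, n_rows) result is never materialized.
    """
    emb = np.asarray(embeddings, dtype=np.float32)
    n_rows = emb.shape[0]
    k = max(0, min(k, n_rows - 1))  # a row cannot have more than n_rows - 1 neighbours

    # Normalize rows to unit length; all-zero rows are left as zeros.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = emb / norms

    top_idx = np.empty((n_rows, k), dtype=np.int64)
    top_sim = np.empty((n_rows, k), dtype=np.float32)

    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Cosine similarity of this chunk against every row: (stop - start, n_rows)
        block = unit[start:stop] @ unit.T
        # Mask each row's similarity with itself so it is never selected.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        # Pick the k largest per row (unordered), then sort just those k.
        part = np.argpartition(-block, k - 1, axis=1)[:, :k]
        part_sim = np.take_along_axis(block, part, axis=1)
        order = np.argsort(-part_sim, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sim, order, axis=1)

    return top_idx, top_sim
```

With the array from the question this would be called as top_idx, top_sim = top_k_cosine(sentence_embeddings, k=5). At chunk_size=1000 and 190k rows, each float32 block is about 0.76 GB, so chunk_size can be lowered if memory is still tight. If the full matrix really is needed, each block could instead be written to disk before moving on to the next one.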

Answered By: Brian Phelps