Calculating Cosine Similarity with a Large 2D Vector (Python)
Question:
I'm trying to calculate the cosine similarity of a pandas DataFrame column. It works fine on a small dataset (e.g., 100 samples), but it fails when the dataset grows to 190k+ rows. Is there an alternative way to calculate this?
No error message appears, but my kernel keeps dying.
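import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Stack the per-row embeddings into one (n_rows, n_dims) float32 array
sentence_embeddings = np.array(df['summary_tokens'].tolist(), dtype='float32')
# Builds the full (n_rows, n_rows) similarity matrix in memory
similarity = cosine_similarity(sentence_embeddings)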
Answers:
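The solution was to calculate the similarity on smaller NumPy arrays instead of on the whole array at once. The kernel most likely dies because it runs out of memory: for 190k rows, cosine_similarity returns a dense 190,000 x 190,000 matrix, which is roughly 144 GB in float32 (190,000^2 x 4 bytes).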
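The answer does not show its code, so the following is only a sketch of one way to work on smaller arrays. It assumes you need just the k most similar rows for each row, since the full matrix would not fit in memory anyway. The function name top_k_similar and the parameters k and chunk_size are hypothetical, and the similarity is computed with plain NumPy (normalize the rows, then take dot products) rather than with sklearn's cosine_similarity; the math is the same.

```python
import numpy as np


def top_k_similar(embeddings, k=5, chunk_size=1000):
    """Return (indices, scores) of the k most similar other rows for each row.

    Hypothetical helper: only a (chunk_size, n_rows) block of similarities
    is held in memory at any time. Assumes at least two rows.
    """
    x = np.asarray(embeddings, dtype=np.float32)
    n = x.shape[0]
    k = min(k, n - 1)

    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    normed = x / norms

    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=np.float32)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = normed[start:stop] @ normed.T  # shape (stop - start, n)
        # Exclude each row's similarity with itself.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        # Unordered top-k per row, then sort those k by descending score.
        part = np.argpartition(block, -k, axis=1)[:, -k:]
        part_sim = np.take_along_axis(block, part, axis=1)
        order = np.argsort(-part_sim, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sim, order, axis=1)
    return top_idx, top_sim
```

With the array from the question, this would be called as top_idx, top_sim = top_k_similar(sentence_embeddings, k=5). At chunk_size=1000 and 190k rows, each float32 block is about 760 MB, so lower chunk_size if that is still too much for your machine.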