cosine-similarity

What is the most efficient way to identify text similarity between items in large lists of strings in Python?

Question: The following piece of code achieves the results I’m trying to achieve. There is a list of strings called ‘lemmas’ that contains the accepted forms of a specific class of words. The other list, called …

Total answers: 2

Row wise cosine similarity calculation in pandas

Question: I have a dataframe that looks like this:
     api_spec_id  label  Paths_modified  Tags_modified  Endpoints_added
933        803.0  minor             8.0            3.0                6
934        803.0  patch             0.0            4.0                2
935        803.0  patch             3.0            1.0                0
938        803.0  patch            10.0            0.0                4
939        803.0  patch             3.0            5.0                1
940        803.0  patch             6.0  …

Total answers: 4

Can stop phrases be removed while doing text processing in python?

Question: The task that I’m working on involves finding the cosine similarity, using TF-IDF, between a base transcript and other sample transcripts. I am removing stop words for this, but I would also like to remove certain stop phrases that are unique to …

Total answers: 1

SparseTermSimilarityMatrix().inner_product() throws "cannot unpack non-iterable bool object"

Question: While working with cosine similarity, I am facing an issue calculating the inner product of two vectors. Code:
    from gensim.similarities import (
        WordEmbeddingSimilarityIndex,
        SparseTermSimilarityMatrix
    )
    w2v_model = api.load("glove-wiki-gigaword-50")
    similarity_index = WordEmbeddingSimilarityIndex(w2v_model)
    similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
    score = similarity_matrix.inner_product(
        X = [ (0, 1), (1, 1), (2, 1), (3, …

Total answers: 1

Cosine similarity of two columns in a DataFrame

Question: I have a dataframe with 2 columns and I am trying to get a cosine similarity score for each pair of sentences. Dataframe (df) A B 0 Lorem ipsum ta lorem ipsum 1 Excepteur sint occaecat excepteur 2 Duis aute irure aute irure some of the code …

Total answers: 1

Calculate Distance Metric between Homomorphic Encrypted Vectors

Question: Is there a way to calculate a distance metric (Euclidean, Manhattan, or cosine similarity) between two homomorphically encrypted vectors? Specifically, I’m looking to generate embeddings of documents (using a transformer), homomorphically encrypt those embeddings, and calculate a distance metric between the embeddings to obtain document …

Total answers: 1

How to check each column for value greater than and accept if 3 out of 5 columns has that value

Question: I am trying to make an article similarity checker by comparing 6 articles with a list of articles that I obtained from an API. I have used cosine similarity to compare each article one by …

Total answers: 1

How to find the cosine similarity between 2 dataframe in pandas?

Total answers: 1

Find cosine similarity between different pandas dataframe

Question: I have three pandas dataframes, say group_1, group_2, and group_3:
    import pandas as pd
    group_1 = pd.DataFrame({'A':[1,0,1,1,1], 'B':[1,1,1,1,1]})
    group_2 = pd.DataFrame({'A':[1,1,1,1,1], 'B':[1,1,0,0,0]})
    group_3 = pd.DataFrame({'A':[1,1,1,1,1], 'B':[0,0,0,0,0]})
These are filled with dummy values; all values will be binary for the groups above. Now there is another, new dataframe:
    new_data_frame = pd.DataFrame({'A':[1,1,1,1,1], …

Total answers: 1

Calculating Cosine Similarity with Large 2d Vector Py

Question: I am trying to calculate the cosine similarity of a pandas dataframe column. There are no problems with a small dataset (e.g., 100 samples). Errors occur when the dataset grows to 190k+ rows. Is there an alternative way to calculate this? No error message comes up, but my kernel …

Total answers: 1