Find the number of mutual friends in Python
Question:
I have a dataframe of users and their friends that looks like:
user_id | friend_id
1 3
1 4
2 3
2 5
3 4
I want to write a function in python
to compute the number of mutual friends for each pair:
user_id | friend_id | num_mutual
1 3 1
1 4 1
2 3 0
2 5 0
3 4 1
Currently I have:
def find_mutual(df):
num_mutual = []
for i in range(len(df)):
user, friend = df.loc[i, 'user_id'], df.loc[i, 'friend_id']
user_list = df[df.user_id == user].friend_id.tolist() + df[df.friend_id == user].user_id.tolist()
friend_list = df[df.user_id == friend].friend_id.tolist() + df[df.friend_id == friend].user_id.tolist()
mutual = len(list(set(user_list) & set(friend_list)))
num_mutual.append(mutual)
return num_mutual
It works fine for small datasets, but I’m running it on a dataset with millions of rows. It takes forever to run everything. I know it’s not the ideal way to find the count. Is there a better algorithm in Python? Thanks in advance!
Answers:
The [ugly] idea is to construct a 4 point path that starts with a user_id
and ends with the same user_id
. If such a path exists, then 2 starting points have mutual friends.
We start with:
df
user_id friend_id
0 1 3
1 1 4
2 2 3
3 2 5
4 3 4
Then you can do:
dff = df.append(df.rename(columns={"user_id":"friend_id","friend_id":"user_id"}))
df_new = dff.merge(dff, on="friend_id", how="outer")
df_new = df_new[df_new["user_id_x"]!= df_new["user_id_y"]]
df_new = df_new.merge(dff, left_on= "user_id_y", right_on="user_id")
df_new = df_new[df_new["user_id_x"]==df_new["friend_id_y"]]
df_out = df.merge(df_new, left_on=["user_id","friend_id"], right_on=["user_id_x","friend_id_x"], how="left",suffixes=("__","_"))
df_out["count"] = (~df_out["user_id_x"].isnull()).astype(int)
df_out[["user_id__","friend_id","count"]]
user_id__ friend_id count
0 1 3 1
1 1 4 1
2 2 3 0
3 2 5 0
4 3 4 1
A more elegant and straightforward way to use a graph approach
import networkx as nx
g = nx.from_pandas_edgelist(df, "user_id","friend_id")
nx.draw_networkx(g)
Then you can identify number of mutual friends as number of paths for 2 adjacent nodes (2 friends) for which a 3 node path exists:
from networkx.algorithms.simple_paths import all_simple_paths
for row in df.itertuples():
df.at[row[0],"count"] = sum([len(l)==3 for l in list(all_simple_paths(g, row[1], row[2]))])
print(df)
user_id friend_id count
0 1 3 1.0
1 1 4 1.0
2 2 3 0.0
3 2 5 0.0
4 3 4 1.0
First create an adjacency list in the form of dictionary to contain the data
db = dict() # adjacency list
num = int(input("Enter number of friends = "))
for i in range(num):
friend = input("Enter name = ")
db[friend] = input("Enter his/her friends name separated by space = ").split()
To find the number of mutual friends between two people, compare their lists of friends and counts the number of friends that they have in common. Here is an example of how you could do this:
def num_mutual_friends(friend1, friend2):
set1 = set(friend1)
set2 = set(friend2)
mutual_friends = set1 & set2 # intersection(common friends)
return len(mutual_friends)
Test the function as follows:
friend1, friend2 = input("Enter two names separated by space = ").split()
if friend1 in db and friend2 in db:
print("Number of mutual friends = ",
num_mutual_friends(db[friend1], db[friend2]))
else:
print("Person not found")
I have a dataframe of users and their friends that looks like:
user_id | friend_id
1 3
1 4
2 3
2 5
3 4
I want to write a function in python
to compute the number of mutual friends for each pair:
user_id | friend_id | num_mutual
1 3 1
1 4 1
2 3 0
2 5 0
3 4 1
Currently I have:
def find_mutual(df):
num_mutual = []
for i in range(len(df)):
user, friend = df.loc[i, 'user_id'], df.loc[i, 'friend_id']
user_list = df[df.user_id == user].friend_id.tolist() + df[df.friend_id == user].user_id.tolist()
friend_list = df[df.user_id == friend].friend_id.tolist() + df[df.friend_id == friend].user_id.tolist()
mutual = len(list(set(user_list) & set(friend_list)))
num_mutual.append(mutual)
return num_mutual
It works fine for small datasets, but I’m running it on a dataset with millions of rows. It takes forever to run everything. I know it’s not the ideal way to find the count. Is there a better algorithm in Python? Thanks in advance!
The [ugly] idea is to construct a 4 point path that starts with a user_id
and ends with the same user_id
. If such a path exists, then 2 starting points have mutual friends.
We start with:
df
user_id friend_id
0 1 3
1 1 4
2 2 3
3 2 5
4 3 4
Then you can do:
dff = df.append(df.rename(columns={"user_id":"friend_id","friend_id":"user_id"}))
df_new = dff.merge(dff, on="friend_id", how="outer")
df_new = df_new[df_new["user_id_x"]!= df_new["user_id_y"]]
df_new = df_new.merge(dff, left_on= "user_id_y", right_on="user_id")
df_new = df_new[df_new["user_id_x"]==df_new["friend_id_y"]]
df_out = df.merge(df_new, left_on=["user_id","friend_id"], right_on=["user_id_x","friend_id_x"], how="left",suffixes=("__","_"))
df_out["count"] = (~df_out["user_id_x"].isnull()).astype(int)
df_out[["user_id__","friend_id","count"]]
user_id__ friend_id count
0 1 3 1
1 1 4 1
2 2 3 0
3 2 5 0
4 3 4 1
A more elegant and straightforward way to use a graph approach
import networkx as nx
g = nx.from_pandas_edgelist(df, "user_id","friend_id")
nx.draw_networkx(g)
Then you can identify number of mutual friends as number of paths for 2 adjacent nodes (2 friends) for which a 3 node path exists:
from networkx.algorithms.simple_paths import all_simple_paths
for row in df.itertuples():
df.at[row[0],"count"] = sum([len(l)==3 for l in list(all_simple_paths(g, row[1], row[2]))])
print(df)
user_id friend_id count
0 1 3 1.0
1 1 4 1.0
2 2 3 0.0
3 2 5 0.0
4 3 4 1.0
First create an adjacency list in the form of dictionary to contain the data
db = dict() # adjacency list
num = int(input("Enter number of friends = "))
for i in range(num):
friend = input("Enter name = ")
db[friend] = input("Enter his/her friends name separated by space = ").split()
To find the number of mutual friends between two people, compare their lists of friends and counts the number of friends that they have in common. Here is an example of how you could do this:
def num_mutual_friends(friend1, friend2):
set1 = set(friend1)
set2 = set(friend2)
mutual_friends = set1 & set2 # intersection(common friends)
return len(mutual_friends)
Test the function as follows:
friend1, friend2 = input("Enter two names separated by space = ").split()
if friend1 in db and friend2 in db:
print("Number of mutual friends = ",
num_mutual_friends(db[friend1], db[friend2]))
else:
print("Person not found")