Find the number of mutual friends in Python

Question:

I have a dataframe of users and their friends that looks like:

user_id | friend_id
1         3
1         4
2         3
2         5
3         4

I want to write a function in Python to compute the number of mutual friends for each pair:

user_id | friend_id | num_mutual
1         3           1
1         4           1
2         3           0
2         5           0
3         4           1

Currently I have:

def find_mutual(df):
    num_mutual = []
    for i in range(len(df)):
        user, friend = df.loc[i, 'user_id'], df.loc[i, 'friend_id']
        user_list = df[df.user_id == user].friend_id.tolist() + df[df.friend_id == user].user_id.tolist()
        friend_list = df[df.user_id == friend].friend_id.tolist() + df[df.friend_id == friend].user_id.tolist()
        mutual = len(list(set(user_list) & set(friend_list)))
        num_mutual.append(mutual)
    return num_mutual

It works fine for small datasets, but I’m running it on a dataset with millions of rows. It takes forever to run everything. I know it’s not the ideal way to find the count. Is there a better algorithm in Python? Thanks in advance!

Asked By: CWuu

||

Answers:

The (admittedly ugly) idea is to construct a 4-point path that starts and ends at the same user_id, i.e. user -> friend -> third person -> user. If such a path exists, the first two points on it have a mutual friend.

We start with:

df
          user_id  friend_id
0        1          3
1        1          4
2        2          3
3        2          5
4        3          4

Then you can do:

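import pandas as pd
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same symmetric edge list
dff = pd.concat([df, df.rename(columns={"user_id":"friend_id","friend_id":"user_id"})])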
df_new = dff.merge(dff, on="friend_id", how="outer")
df_new = df_new[df_new["user_id_x"]!= df_new["user_id_y"]]
df_new = df_new.merge(dff, left_on= "user_id_y", right_on="user_id")
df_new = df_new[df_new["user_id_x"]==df_new["friend_id_y"]]
df_out = df.merge(df_new, left_on=["user_id","friend_id"], right_on=["user_id_x","friend_id_x"], how="left",suffixes=("__","_"))
df_out["count"] = (~df_out["user_id_x"].isnull()).astype(int)
df_out[["user_id__","friend_id","count"]]

   user_id__  friend_id  count
0          1          3      1
1          1          4      1
2          2          3      0
3          2          5      0
4          3          4      1
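
Note that the `count` column above is a 0/1 flag rather than a true count. If the actual number of mutual friends is needed, the same merge idea can be sketched as below. This is only a sketch under assumptions: the names `sym`, `w` and `num_mutual` are mine, and it assumes `df` holds no duplicate rows. The intermediate merge can get large on dense graphs.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 2, 2, 3], "friend_id": [3, 4, 3, 5, 4]})

# Symmetric edge list: every friendship listed in both directions.
swapped = df.rename(columns={"user_id": "friend_id", "friend_id": "user_id"})
sym = pd.concat([df, swapped], ignore_index=True).drop_duplicates()

# Step 1: for each original pair, attach every neighbour w of user_id.
step1 = df.merge(sym.rename(columns={"friend_id": "w"}), on="user_id")

# Step 2: keep only those w that are also neighbours of friend_id.
nbrs_of_friend = sym.rename(columns={"user_id": "friend_id", "friend_id": "w"})
step2 = step1.merge(nbrs_of_friend, on=["friend_id", "w"])

# Step 3: count surviving w per pair; pairs with no match get 0.
counts = step2.groupby(["user_id", "friend_id"]).size().reset_index(name="num_mutual")
result = (
    df.merge(counts, on=["user_id", "friend_id"], how="left")
      .fillna({"num_mutual": 0})
      .astype({"num_mutual": int})
)
print(result)
```

Everything here is done with merges and a groupby, so there is no Python-level loop over the rows.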

A more elegant and straightforward way is to use a graph approach:

import networkx as nx
g = nx.from_pandas_edgelist(df, "user_id","friend_id")
nx.draw_networkx(g)


The number of mutual friends of two adjacent nodes (two friends) is then the number of 3-node simple paths between them, since each such path passes through exactly one shared neighbor:

from networkx.algorithms.simple_paths import all_simple_paths
for row in df.itertuples():
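    # cutoff=2 stops the search at paths of at most 2 edges (3 nodes); without it every simple path is enumerated
    df.at[row[0],"count"] = sum(len(path)==3 for path in all_simple_paths(g, row[1], row[2], cutoff=2))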
print(df)
   user_id  friend_id  count
0        1          3    1.0
1        1          4    1.0
2        2          3    0.0
3        2          5    0.0
4        3          4    1.0
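
As a possibly simpler alternative to enumerating paths, networkx also provides `nx.common_neighbors`, which yields the nodes adjacent to both endpoints. A minimal sketch on the same sample data (the column name `num_mutual` is my choice, not part of the original code):

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 2, 2, 3], "friend_id": [3, 4, 3, 5, 4]})
g = nx.from_pandas_edgelist(df, "user_id", "friend_id")

# Count the shared neighbours of each (user, friend) pair.
# sum(1 for ...) works whether common_neighbors returns an iterator or a set.
df["num_mutual"] = [
    sum(1 for _ in nx.common_neighbors(g, u, v))
    for u, v in zip(df["user_id"], df["friend_id"])
]
print(df)
```

This only inspects the direct neighbours of the two nodes, so it avoids the path search entirely.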
Answered By: Sergey Bushmanov

First, create an adjacency list in the form of a dictionary to hold the data:

db = dict()  # adjacency list
num = int(input("Enter number of friends = "))
for i in range(num):
    friend = input("Enter name = ")
    db[friend] = input("Enter his/her friends name separated by space = ").split()

To find the number of mutual friends between two people, compare their lists of friends and count the friends they have in common. Here is an example of how you could do this:

def num_mutual_friends(friend1, friend2):
    # friend1 and friend2 are the two people's lists of friends
    set1 = set(friend1)
    set2 = set(friend2)
    mutual_friends = set1 & set2  # intersection (common friends)
    return len(mutual_friends)

Test the function as follows:

friend1, friend2 = input("Enter two names separated by space = ").split()
if friend1 in db and friend2 in db:
    print("Number of mutual friends = ", 
    num_mutual_friends(db[friend1], db[friend2]))
else:
    print("Person not found")
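
To apply the same set-intersection idea to the edge list from the question instead of typed input, the adjacency dictionary can be built once and reused for every pair. This is a sketch under assumptions: `count_mutual` and `pairs` are names I made up, and friendships are treated as undirected, as in the question's own code.

```python
from collections import defaultdict

def count_mutual(pairs):
    """Return the number of mutual friends for each (user, friend) pair."""
    # Build the adjacency sets once: one pass over the edge list.
    friends = defaultdict(set)
    for user, friend in pairs:
        friends[user].add(friend)
        friends[friend].add(user)
    # Each lookup is then a single set intersection.
    return [len(friends[user] & friends[friend]) for user, friend in pairs]

pairs = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4)]
print(count_mutual(pairs))  # [1, 1, 0, 0, 1]
```

With a dataframe, the pairs could be produced with something like zip(df.user_id, df.friend_id). Building the dictionary once avoids re-filtering the whole dataframe for every row, which is what makes the original loop slow.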
Answered By: Ankit Paul