Filter and merge a dataframe in Python using Pandas

Question:

I have a dataframe and I need to work out who owns which books so we can send them notifications. I am having trouble merging the data into the format I need.

Existing dataframe

Book Owner
The Alchemist marry
To Kill a Mockingbird john
Lord of the Flies abel
Catcher in the Ry marry
Alabama julia;marry
Invisible Man john

I need to create a new dataframe that lists the owners in column A and all the books they own in column B.
Desired output

Owners Books
marry The Alchemist, Catcher in the Ry, Alabama
john To Kill a Mockingbird, Invisible Man
abel Lord of the Flies
julia Alabama

I tried creating two dataframes from the same file and then merging them, but the results are never accurate. Does anyone know a more efficient way to do this?

Current code not working:

from pathlib import Path
import pandas as pd 

file1 = Path.cwd() / "./bookgrid.xlsx"


df1 = pd.read_excel(file1)
df2 = pd.read_excel(file1)

## Perform the VLOOKUP-style merge
merge = pd.merge(df1, df2, how="left")

merge.to_excel("./results.xlsx")
Asked By: Dinerz

Answers:

You need to split, explode, groupby.agg:

(df.assign(Owner=lambda d: d['Owner'].str.split(';'))
   .explode('Owner')
   .groupby('Owner', as_index=False, sort=False).agg(', '.join)
)

NB. if you need the plural form in the column headers, add .add_suffix('s') or .rename(columns={'Book': 'Books', 'Owner': 'Owners'}).

Output:

   Owner                                       Book
0  marry  The Alchemist, Catcher in the Ry, Alabama
1   john       To Kill a Mockingbird, Invisible Man
2   abel                          Lord of the Flies
3  julia                                    Alabama
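
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand rather than read from bookgrid.xlsx, and the variable name `out` is only illustrative. It also applies the optional rename to the plural headers.

```python
import pandas as pd

# Sample data copied from the question (typed in by hand, not read from Excel)
df = pd.DataFrame({
    "Book": ["The Alchemist", "To Kill a Mockingbird", "Lord of the Flies",
             "Catcher in the Ry", "Alabama", "Invisible Man"],
    "Owner": ["marry", "john", "abel", "marry", "julia;marry", "john"],
})

out = (
    df.assign(Owner=df["Owner"].str.split(";"))     # "julia;marry" -> ["julia", "marry"]
      .explode("Owner")                             # one row per (book, owner) pair
      .groupby("Owner", as_index=False, sort=False) # keep owners in order of first appearance
      .agg(", ".join)                               # join each owner's books into one string
      .rename(columns={"Owner": "Owners", "Book": "Books"})
)
print(out)
```

If the result should go back to Excel as in the question, something like out.to_excel("./results.xlsx", index=False) should do it.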
Answered By: mozway

Let's try something new

s = df['Owner'].str.get_dummies(';')
(s.T @ df['Book'].add(', ')).str.rstrip(', ')

Result

abel                             Lord of the Flies
john          To Kill a Mockingbird, Invisible Man
julia                                      Alabama
marry    The Alchemist, Catcher in the Ry, Alabama
dtype: object
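
To see what the indicator matrix is doing, here is a rough sketch that builds the same kind of result with an explicit loop instead of the matrix product. The sample data is typed in from the question, and the names `books` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Book": ["The Alchemist", "To Kill a Mockingbird", "Lord of the Flies",
             "Catcher in the Ry", "Alabama", "Invisible Man"],
    "Owner": ["marry", "john", "abel", "marry", "julia;marry", "john"],
})

# One 0/1 column per owner; a row has a 1 for every owner listed on that book
s = df["Owner"].str.get_dummies(";")

# For each owner column, select the books flagged for that owner and join them.
# This spells out, step by step, what the transposed product computes in one go.
books = {
    owner: ", ".join(df.loc[s[owner].astype(bool), "Book"])
    for owner in s.columns
}
result = pd.Series(books)
print(result)
```

The loop is slower than the one-liner on large frames, but it makes it easier to check which rows feed into each owner's string.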
Answered By: Shubham Sharma

Not the fastest way, but here’s an easy-to-follow way.

import pandas as pd 

# Set up the example dataframe
data = {'Book':['The Alchemist','To Kill a Mockingbird','Lord of the Flies','Catcher in the Ry','Alabama','Invisible Man'],'Owner':['marry','john','abel','marry','julia;marry','john']}

df = pd.DataFrame(data)

# Work on a copy, then turn each string of names into a list of names
df2 = df.copy()
df2['Owner'] = df2['Owner'].apply(lambda x: x.split(";"))

# get the unique set of owners
unique_owners = {single_owner for owners_list in df2['Owner'] for single_owner in owners_list}
# Gives a set -> {'abel', 'john', 'julia', 'marry'}

# slice the dataframe down to one owner's rows (shown here for 'marry')
df2[['marry' in row for row in df2['Owner']]]

# select only the books, not the names
df2[['marry' in row for row in df2['Owner']]]['Book']

# convert the books to a list. Alternative - ",".join(df2[['marry' in row for row in df2['Owner']]]['Book']) turns all the books into a single piece of text.
df2[['marry' in row for row in df2['Owner']]]['Book'].to_list()

# set up data storage
names = []
books = []

# iterate through the unique owners set, collecting each owner's books
for single_owner in unique_owners:
    names.append(single_owner)
    books.append(df2[[single_owner in row for row in df2['Owner']]]['Book'].to_list())

new_df2 = pd.DataFrame({'Owner':names,'Books':books})
new_df2
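
If you want the steps above in one reusable piece, here is a possible sketch. The function name `books_by_owner` and its `sep` parameter are made up for illustration, and the owners are sorted so the row order is predictable (a plain set has no guaranteed order).

```python
import pandas as pd

def books_by_owner(df, sep=";"):
    """Hypothetical helper: one row per owner, with a list of that owner's books."""
    # Split each owner string into a list of names
    owners = df["Owner"].str.split(sep)
    # Unique owners, sorted for a stable row order
    unique_owners = sorted({name for names in owners for name in names})
    rows = []
    for name in unique_owners:
        # Boolean mask: True where this owner appears in the row's list
        mask = [name in names for names in owners]
        rows.append({"Owner": name, "Books": df.loc[mask, "Book"].to_list()})
    return pd.DataFrame(rows)

# Sample data copied from the question
example = pd.DataFrame({
    "Book": ["The Alchemist", "To Kill a Mockingbird", "Lord of the Flies",
             "Catcher in the Ry", "Alabama", "Invisible Man"],
    "Owner": ["marry", "john", "abel", "marry", "julia;marry", "john"],
})
summary = books_by_owner(example)
print(summary)
```

Swap the list for ", ".join(...) inside the loop if you would rather have a single string per owner.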

Answered By: ciaran haines