Python – Link 2 columns

Question:

I want to create a data frame that links two columns together (each customer ID to every order ID that customer placed). The row index + 1 corresponds to the customer ID. Is there a way to do this through mapping?

Data: invoice_df

Order Id,Date,Meal Id,Company Id,Date of Meal,Participants,Meal Price,Type of Meal
839FKFW2LLX4LMBB,27-05-2016,INBUX904GIHI8YBD,LJKS5NK6788CYMUU,2016-05-31 07:00:00+02:00,['David Bishop'],469,Breakfast
97OX39BGVMHODLJM,27-09-2018,J0MMOOPP709DIDIE,LJKS5NK6788CYMUU,2018-10-01 20:00:00+02:00,['David Bishop'],22,Dinner
041ORQM5OIHTIU6L,24-08-2014,E4UJLQNCI16UX5CS,LJKS5NK6788CYMUU,2014-08-23 14:00:00+02:00,['Karen Stansell'],314,Lunch
YT796QI18WNGZ7ZJ,12-04-2014,C9SDFHF7553BE247,LJKS5NK6788CYMUU,2014-04-07 21:00:00+02:00,['Addie Patino'],438,Dinner
6YLROQT27B6HRF4E,28-07-2015,48EQXS6IHYNZDDZ5,LJKS5NK6788CYMUU,2015-07-27 14:00:00+02:00,['Addie Patino' 'Susan Guerrero'],690,Lunch
AT0R4DFYYAFOC88Q,21-07-2014,W48JPR1UYWJ18NC6,LJKS5NK6788CYMUU,2014-07-17 20:00:00+02:00,['David Bishop' 'Susan Guerrero' 'Karen Stansell'],181,Dinner
2DDN2LHS7G85GKPQ,29-04-2014,1MKLAKBOE3SP7YUL,LJKS5NK6788CYMUU,2014-04-30 21:00:00+02:00,['Susan Guerrero' 'David Bishop'],14,Dinner
FM608JK1N01BPUQN,08-05-2014,E8WJZ1FOSKZD2MJN,36MFTZOYMTAJP1RK,2014-05-07 09:00:00+02:00,['Amanda Knowles' 'Cheryl Feaster' 'Ginger Hoagland' 'Michael White'],320,Breakfast
CK331XXNIBQT81QL,23-05-2015,CTZSFFKQTY7SBZ4J,36MFTZOYMTAJP1RK,2015-05-18 13:00:00+02:00,['Cheryl Feaster' 'Amanda Knowles' 'Ginger Hoagland'],697,Lunch
FESGKOQN2OZZWXY3,10-01-2016,US0NQYNNHS1SQJ4S,36MFTZOYMTAJP1RK,2016-01-14 22:00:00+01:00,['Glenn Gould' 'Amanda Knowles' 'Ginger Hoagland' 'Michael White'],451,Dinner
YITOTLOF0MWZ0VYX,03-10-2016,RGYX8772307H78ON,36MFTZOYMTAJP1RK,2016-10-01 22:00:00+02:00,['Ginger Hoagland' 'Amanda Knowles' 'Michael White'],263,Dinner
8RIGCF74GUEQHQEE,23-07-2018,5XK0KTFTD6OAP9ZP,36MFTZOYMTAJP1RK,2018-07-27 08:00:00+02:00,['Amanda Knowles'],210,Breakfast
TH60C9D8TPYS7DGG,15-12-2016,KDSMP2VJ22HNEPYF,36MFTZOYMTAJP1RK,2016-12-13 08:00:00+01:00,['Cheryl Feaster' 'Bret Adams' 'Ginger Hoagland'],755,Breakfast
W1Y086SRAVUZU1AL,17-09-2017,8IUOYVS031QPROUG,36MFTZOYMTAJP1RK,2017-09-14 13:00:00+02:00,['Bret Adams'],469,Lunch
WKB58Q8BHLOFQAB5,31-08-2016,E2K2TQUMENXSI9RP,36MFTZOYMTAJP1RK,2016-09-03 14:00:00+02:00,['Michael White' 'Ginger Hoagland' 'Bret Adams'],502,Lunch
N8DOG58MW238BHA9,25-12-2018,KFR2TAYXZSVCHAA2,36MFTZOYMTAJP1RK,2018-12-20 12:00:00+01:00,['Ginger Hoagland' 'Cheryl Feaster' 'Glenn Gould' 'Bret Adams'],829,Lunch
DPDV9UGF0SUCYTGW,25-05-2017,6YV61SH7W9ECUZP0,36MFTZOYMTAJP1RK,2017-05-24 22:00:00+02:00,['Michael White'],708,Dinner
KNF3E3QTOQ22J269,20-06-2018,737T2U7604ABDFDF,36MFTZOYMTAJP1RK,2018-06-15 07:00:00+02:00,['Glenn Gould' 'Cheryl Feaster' 'Ginger Hoagland' 'Amanda Knowles'],475,Breakfast
LEED1HY47M8BR5VL,22-10-2017,I22P10IQQD06MO45,36MFTZOYMTAJP1RK,2017-10-22 14:00:00+02:00,['Glenn Gould'],27,Lunch
LSJPNJQLDTIRNWAL,27-01-2017,247IIVNN6CXGWINB,36MFTZOYMTAJP1RK,2017-01-23 13:00:00+01:00,['Amanda Knowles' 'Bret Adams'],672,Lunch
6UX5RMHJ1GK1F9YQ,24-08-2014,LL4AOPXDM8V5KP5S,H3JRC7XX7WJAD4ZO,2014-08-27 12:00:00+02:00,['Anthony Emerson' 'Irvin Gentry' 'Melba Inlow'],552,Lunch
5SYB15QEFWD1E4Q4,09-07-2017,KZI0VRU30GLSDYHA,H3JRC7XX7WJAD4ZO,2017-07-13 08:00:00+02:00,"['Anthony Emerson' 'Emma Steitz' 'Melba Inlow' 'Irvin Gentry'
 'Kelly Killebrew']",191,Breakfast
W5S8VZ61WJONS4EE,25-03-2017,XPSPBQF1YLIG26N1,H3JRC7XX7WJAD4ZO,2017-03-25 07:00:00+01:00,['Irvin Gentry' 'Kelly Killebrew'],471,Breakfast
795SVIJKO8KS3ZEL,05-01-2015,HHTLB8M9U0TGC7Z4,H3JRC7XX7WJAD4ZO,2015-01-06 22:00:00+01:00,['Emma Steitz'],588,Dinner
8070KEFYSSPWPCD0,05-08-2014,VZ2OL0LREO8V9RKF,H3JRC7XX7WJAD4ZO,2014-08-09 12:00:00+02:00,['Lewis Eyre'],98,Lunch
RUQOHROBGBOSNUO4,10-06-2016,R3LFUK1WFDODC1YF,H3JRC7XX7WJAD4ZO,2016-06-09 08:00:00+02:00,['Anthony Emerson' 'Kelly Killebrew' 'Lewis Eyre'],516,Breakfast
6P91QRADC2O9WOVT,25-09-2016,L2F2HEGB6Q141080,H3JRC7XX7WJAD4ZO,2016-09-26 07:00:00+02:00,"['Kelly Killebrew' 'Lewis Eyre' 'Irvin Gentry' 'Emma Steitz'
 'Anthony Emerson']",664,Breakfast
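The question doesn't name the source file. To experiment without it, a few rows can be read from an in-memory string. This is a minimal sketch using `pandas.read_csv` with `io.StringIO`. The three rows are copied from the data above and include one whose quoted `Participants` value wraps onto a second line. The CSV parser keeps that wrapped value as a single field.

```python
import io

import pandas as pd

# Three rows copied from the sample data; the last Participants value is
# wrapped in double quotes and continues on the next line
csv_text = """Order Id,Date,Meal Id,Company Id,Date of Meal,Participants,Meal Price,Type of Meal
839FKFW2LLX4LMBB,27-05-2016,INBUX904GIHI8YBD,LJKS5NK6788CYMUU,2016-05-31 07:00:00+02:00,['David Bishop'],469,Breakfast
6YLROQT27B6HRF4E,28-07-2015,48EQXS6IHYNZDDZ5,LJKS5NK6788CYMUU,2015-07-27 14:00:00+02:00,['Addie Patino' 'Susan Guerrero'],690,Lunch
5SYB15QEFWD1E4Q4,09-07-2017,KZI0VRU30GLSDYHA,H3JRC7XX7WJAD4ZO,2017-07-13 08:00:00+02:00,"['Anthony Emerson' 'Emma Steitz' 'Melba Inlow' 'Irvin Gentry'
 'Kelly Killebrew']",191,Breakfast
"""

# read_csv honours the double quotes, so the wrapped value stays one field
invoice_df = pd.read_csv(io.StringIO(csv_text))
print(invoice_df[["Order Id", "Participants"]])
```

The `Participants` column is still a plain string at this point, such as `"['David Bishop']"`, and not a Python list.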

Code:

import re

import pandas as pd

# Function to convert the string "['name' 'name2']" to the list ['name', 'name2']
# Returns a list of participant names
def string_to_list(participant_string):
    return re.findall(r"'(.*?)'", participant_string)

invoice_df["Participants"] = invoice_df["Participants"].apply(string_to_list)

# Obtain an array of all unique customer names
customers = invoice_df["Participants"].explode().unique()

# Create a new customer dataframe
customers_df = pd.DataFrame(customers, columns=["CustomerName"])

# Add a customer id (row index + 1)
customers_df["customer_id"] = customers_df.index + 1

# Create a first_name and last_name column
customers_df["first_name"] = customers_df["CustomerName"].apply(lambda x: x.split(" ")[0])

# Slice the list from index 1 on, in case the person has multiple last names
customers_df["last_name"] = customers_df["CustomerName"].apply(lambda x: " ".join(x.split(" ")[1:]))
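The two `apply` calls for first and last names can also be written with the vectorized `Series.str.split`. This version splits only on the first space, so everything after that space lands in `last_name`.

This is a minimal sketch. "Anna de la Cruz" is a hypothetical name standing in for a multi-word last name. The sketch also assumes every name contains at least one space. Otherwise column `1` holds `None` for those rows, or is missing entirely if no name has a space.

```python
import pandas as pd

# Two names from the data plus a hypothetical multi-word last name
customers_df = pd.DataFrame(
    {"CustomerName": ["David Bishop", "Karen Stansell", "Anna de la Cruz"]}
)

# Split at most once, on the first space:
# column 0 is the first name, column 1 is everything after it
parts = customers_df["CustomerName"].str.split(" ", n=1, expand=True)
customers_df["first_name"] = parts[0]
customers_df["last_name"] = parts[1]
print(customers_df)
```

Like the `[1:]` slice in the question, this puts any middle name into `last_name`.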
Asked By: c200402

Answers:

Solution

# Find all the occurrences of customer names
# (this expects 'Participants' to still hold the raw strings, e.g. "['A' 'B']")
# then explode to convert values in lists to rows
cust = invoice_df['Participants'].str.findall(r"'(.*?)'").explode()

# Join with Order Id (rows are aligned on the index)
customers_df = invoice_df[['Order Id']].join(cust)

# factorize to encode the unique values in Participants;
# + 1 makes the IDs start at 1 instead of 0
customers_df['Customer Id'] = customers_df['Participants'].factorize()[0] + 1

Result

            Order Id     Participants  Customer Id
0   839FKFW2LLX4LMBB     David Bishop            1
1   97OX39BGVMHODLJM     David Bishop            1
2   041ORQM5OIHTIU6L   Karen Stansell            2
3   YT796QI18WNGZ7ZJ     Addie Patino            3
4   6YLROQT27B6HRF4E     Addie Patino            3
4   6YLROQT27B6HRF4E   Susan Guerrero            4
5   AT0R4DFYYAFOC88Q     David Bishop            1
5   AT0R4DFYYAFOC88Q   Susan Guerrero            4
5   AT0R4DFYYAFOC88Q   Karen Stansell            2
6   2DDN2LHS7G85GKPQ   Susan Guerrero            4
6   2DDN2LHS7G85GKPQ     David Bishop            1
7   FM608JK1N01BPUQN   Amanda Knowles            5
7   FM608JK1N01BPUQN   Cheryl Feaster            6
7   FM608JK1N01BPUQN  Ginger Hoagland            7
7   FM608JK1N01BPUQN    Michael White            8
8   CK331XXNIBQT81QL   Cheryl Feaster            6
8   CK331XXNIBQT81QL   Amanda Knowles            5
8   CK331XXNIBQT81QL  Ginger Hoagland            7
9   FESGKOQN2OZZWXY3      Glenn Gould            9
9   FESGKOQN2OZZWXY3   Amanda Knowles            5
9   FESGKOQN2OZZWXY3  Ginger Hoagland            7
9   FESGKOQN2OZZWXY3    Michael White            8
10  YITOTLOF0MWZ0VYX  Ginger Hoagland            7
10  YITOTLOF0MWZ0VYX   Amanda Knowles            5
10  YITOTLOF0MWZ0VYX    Michael White            8
11  8RIGCF74GUEQHQEE   Amanda Knowles            5
12  TH60C9D8TPYS7DGG   Cheryl Feaster            6
12  TH60C9D8TPYS7DGG       Bret Adams           10
12  TH60C9D8TPYS7DGG  Ginger Hoagland            7
13  W1Y086SRAVUZU1AL       Bret Adams           10
14  WKB58Q8BHLOFQAB5    Michael White            8
14  WKB58Q8BHLOFQAB5  Ginger Hoagland            7
14  WKB58Q8BHLOFQAB5       Bret Adams           10
15  N8DOG58MW238BHA9  Ginger Hoagland            7
15  N8DOG58MW238BHA9   Cheryl Feaster            6
15  N8DOG58MW238BHA9      Glenn Gould            9
15  N8DOG58MW238BHA9       Bret Adams           10
16  DPDV9UGF0SUCYTGW    Michael White            8
17  KNF3E3QTOQ22J269      Glenn Gould            9
17  KNF3E3QTOQ22J269   Cheryl Feaster            6
17  KNF3E3QTOQ22J269  Ginger Hoagland            7
17  KNF3E3QTOQ22J269   Amanda Knowles            5
18  LEED1HY47M8BR5VL      Glenn Gould            9
19  LSJPNJQLDTIRNWAL   Amanda Knowles            5
19  LSJPNJQLDTIRNWAL       Bret Adams           10
20  6UX5RMHJ1GK1F9YQ  Anthony Emerson           11
20  6UX5RMHJ1GK1F9YQ     Irvin Gentry           12
20  6UX5RMHJ1GK1F9YQ      Melba Inlow           13
21  5SYB15QEFWD1E4Q4  Anthony Emerson           11
21  5SYB15QEFWD1E4Q4      Emma Steitz           14
21  5SYB15QEFWD1E4Q4      Melba Inlow           13
21  5SYB15QEFWD1E4Q4     Irvin Gentry           12
21  5SYB15QEFWD1E4Q4  Kelly Killebrew           15
22  W5S8VZ61WJONS4EE     Irvin Gentry           12
22  W5S8VZ61WJONS4EE  Kelly Killebrew           15
23  795SVIJKO8KS3ZEL      Emma Steitz           14
24  8070KEFYSSPWPCD0       Lewis Eyre           16
25  RUQOHROBGBOSNUO4  Anthony Emerson           11
25  RUQOHROBGBOSNUO4  Kelly Killebrew           15
25  RUQOHROBGBOSNUO4       Lewis Eyre           16
26  6P91QRADC2O9WOVT  Kelly Killebrew           15
26  6P91QRADC2O9WOVT       Lewis Eyre           16
26  6P91QRADC2O9WOVT     Irvin Gentry           12
26  6P91QRADC2O9WOVT      Emma Steitz           14
26  6P91QRADC2O9WOVT  Anthony Emerson           11
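The same three steps can be checked on a small excerpt without loading the whole file. In this sketch, the three orders are copied from the sample data. `Participants` is kept in its raw string form, which is what the `str.findall` call expects. The resulting IDs follow the order in which each name first appears.

```python
import pandas as pd

# Three orders from the sample data, Participants still as raw strings
invoice_df = pd.DataFrame({
    "Order Id": ["839FKFW2LLX4LMBB", "041ORQM5OIHTIU6L", "AT0R4DFYYAFOC88Q"],
    "Participants": [
        "['David Bishop']",
        "['Karen Stansell']",
        "['David Bishop' 'Susan Guerrero' 'Karen Stansell']",
    ],
})

# Extract each quoted name, then give every name its own row
# (explode repeats the original index for each name)
cust = invoice_df["Participants"].str.findall(r"'(.*?)'").explode()

# join matches rows on the index, pairing each name with its Order Id
customers_df = invoice_df[["Order Id"]].join(cust)

# factorize numbers names by first appearance (0, 1, 2, ...); + 1 starts at 1
customers_df["Customer Id"] = customers_df["Participants"].factorize()[0] + 1
print(customers_df)
```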
Answered By: Shubham Sharma
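The question asked specifically about mapping. Once names are exploded to one row per order and participant, the `customer_id` table built in the question can be applied with `Series.map`.

This is a minimal sketch on a hand-built excerpt; the names `links` and `name_to_id` are illustrative. Unlike `factorize`, this approach reuses whatever IDs `customers_df` has already assigned. The last step groups in the other direction to list each customer's orders.

```python
import pandas as pd

# One row per (order, participant), shaped like the exploded data above
links = pd.DataFrame({
    "Order Id": ["839FKFW2LLX4LMBB", "041ORQM5OIHTIU6L",
                 "AT0R4DFYYAFOC88Q", "AT0R4DFYYAFOC88Q"],
    "Participants": ["David Bishop", "Karen Stansell",
                     "David Bishop", "Karen Stansell"],
})

# Customer table built as in the question: row index + 1 is the ID
customers_df = pd.DataFrame({"CustomerName": links["Participants"].unique()})
customers_df["customer_id"] = customers_df.index + 1

# A name -> id Series; map looks up each participant's ID by index label
name_to_id = customers_df.set_index("CustomerName")["customer_id"]
links["Customer Id"] = links["Participants"].map(name_to_id)

# Reverse view: the list of orders each customer appears on
orders_per_customer = links.groupby("Customer Id")["Order Id"].agg(list)
print(orders_per_customer)
```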