Pandas DataFrame: How to join the value from column A with each element of the list in column B?
Question:
Let’s say I have a DataFrame:
| ColumnA | Column B |
|----------|----------|
| prefix_1 | [A, B] |
| prefix_2 | [C, D] |
and I want to get a new DataFrame like this:
| ColumnA | Column B | Column C |
|----------|----------|--------------------------|
| prefix_1 | [A, B] | [prefix_1-A, prefix_1-B] |
| prefix_2 | [C, D] | [prefix_2-C, prefix_2-D] |
How can I do this transformation? Thank you.
I tried the following, but it didn’t work:
```python
df['ColumnC'] = df['ColumnB'].str.split(',').apply(lambda x: [df['ColumnA'] + '-' + e.strip() for e in x]).tolist()
```
Answers:
Use a list comprehension.
If you have lists in Column B:
```python
df['Column C'] = [[f'{p}-{x}' for x in l]
                  for p, l in zip(df['ColumnA'], df['Column B'])]
```
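For list cells, a different technique (not the one used in this answer) is `DataFrame.explode` followed by regrouping on the original index. A minimal sketch, assuming the list-valued sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})

# explode() emits one row per list element and repeats the original index label,
# so each element stays on the same row as its ColumnA value.
exploded = df.explode('Column B')
joined = exploded['ColumnA'] + '-' + exploded['Column B']

# Group on the repeated index to rebuild one list per original row;
# the assignment aligns on the index.
df['Column C'] = joined.groupby(level=0).agg(list)
print(df['Column C'].tolist())
# [['prefix_1-A', 'prefix_1-B'], ['prefix_2-C', 'prefix_2-D']]
```

One caveat: `explode` turns an empty list into a single `NaN` row, so an empty list in `Column B` would come back as `[nan]` rather than `[]`. The list comprehension above does not have that issue.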
If you have strings in Column B:
```python
df['Column C'] = [[f'{p}-{x}' for x in l]
                  for p, l in zip(df['ColumnA'],
                                  df['Column B'].str[1:-1].str.split(r',\s*'))]
```
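To see what the split step does on its own, here is a small sketch on the bracketed sample strings. It relies on pandas treating a multi-character `pat` in `str.split` as a regular expression, which is the default behaviour:

```python
import pandas as pd

s = pd.Series(['[A, B]', '[C, D]'])
inner = s.str[1:-1]  # drop the surrounding brackets -> 'A, B', 'C, D'

# r',\s*' means: a comma followed by any amount of whitespace,
# so the space after the comma is consumed by the separator.
good = inner.str.split(r',\s*')
print(good.tolist())  # [['A', 'B'], ['C', 'D']]

# Without the backslash, ',s*' means "a comma followed by zero or more
# letters s", so the space stays attached to the second element.
bad = inner.str.split(',s*')
print(bad.tolist())  # [['A', ' B'], ['C', ' D']]
```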
And if you want a string as output:
```python
df['Column C'] = ['[' + ', '.join([f'{p}-{x}' for x in l]) + ']'
                  for p, l in zip(df['ColumnA'],
                                  df['Column B'].str[1:-1].str.split(r',\s*'))]
```
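The explicit join is used because `str()` on a Python list quotes each element. A plain-Python sketch of the difference:

```python
items = ['prefix_1-A', 'prefix_1-B']

# str() on a list uses repr() for each element, so the quotes are kept.
quoted = str(items)
print(quoted)    # ['prefix_1-A', 'prefix_1-B']

# Joining explicitly gives the unquoted form shown in the question.
unquoted = '[' + ', '.join(items) + ']'
print(unquoted)  # [prefix_1-A, prefix_1-B]
```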
Output:
```
    ColumnA Column B                  Column C
0  prefix_1   [A, B]  [prefix_1-A, prefix_1-B]
1  prefix_2   [C, D]  [prefix_2-C, prefix_2-D]
```
Reproducible inputs:
```python
import pandas as pd

# as strings
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': ['[A, B]', '[C, D]']})

# as lists
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})
```
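If you are not sure which of the two forms a column holds, one option is to normalize each cell before building the prefixed list. The sketch below uses a hypothetical helper, `as_list`, which is not part of pandas; it assumes string cells look exactly like `'[A, B]'` (brackets, comma-separated, unquoted items):

```python
import pandas as pd

def as_list(cell):
    """Hypothetical helper: accept a real list or a '[A, B]'-style string."""
    if isinstance(cell, list):
        return cell
    inner = cell.strip()[1:-1].strip()  # drop the brackets
    # '[]' becomes an empty list instead of [''].
    return [part.strip() for part in inner.split(',')] if inner else []

as_strings = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                           'Column B': ['[A, B]', '[C, D]']})
as_lists = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                         'Column B': [['A', 'B'], ['C', 'D']]})

# Same list comprehension as above, with every cell passed through as_list().
for frame in (as_strings, as_lists):
    frame['Column C'] = [[f'{p}-{x}' for x in as_list(cell)]
                         for p, cell in zip(frame['ColumnA'], frame['Column B'])]
    print(frame['Column C'].tolist())
# both print [['prefix_1-A', 'prefix_1-B'], ['prefix_2-C', 'prefix_2-D']]
```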
You are right to use a lambda function, but I would apply it to the whole DataFrame, like this:
```python
import pandas as pd

# Create the DataFrame from your question
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})

# Create Column C row by row
df['Column C'] = df.apply(
    lambda row: [row['ColumnA'] + '-' + elem for elem in row['Column B']],
    axis=1)
```
Make sure that you use axis=1 so that the lambda function is applied row-wise.
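The key difference from the attempt in the question is what the lambda receives. A sketch, assuming the list-valued sample data; the attempt’s `.str.split(',')` step is left out because the cells are already lists, and note that the attempt reads `df['ColumnB']` while the sample data names the column `Column B`:

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})

# Series.apply (as in the question): the lambda sees ONE cell, i.e. just a list.
# df['ColumnA'] inside it is therefore the whole column, so every "element"
# it builds is a Series rather than a 'prefix-X' string.
broken = df['Column B'].apply(lambda cell: [df['ColumnA'] + '-' + e for e in cell])
print(type(broken[0][0]).__name__)  # Series
print(broken[0][0].tolist())        # ['prefix_1-A', 'prefix_2-A']

# DataFrame.apply with axis=1: the lambda sees one ROW at a time, as a Series
# indexed by column name, so both values of the same row are available.
fixed = df.apply(lambda row: [row['ColumnA'] + '-' + e for e in row['Column B']], axis=1)
print(fixed.tolist())  # [['prefix_1-A', 'prefix_1-B'], ['prefix_2-C', 'prefix_2-D']]
```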