Pandas Dataframe: How to take the value from column A to join each value of a list in column B?

Question:

Let’s say I have a DataFrame:

| ColumnA  | Column B |
|----------|----------|
| prefix_1 | [A, B]   |
| prefix_2 | [C, D]   |

And I expect to get a new DataFrame like:

| ColumnA  | Column B | Column C                 |
|----------|----------|--------------------------|
| prefix_1 | [A, B]   | [prefix_1-A, prefix_1-B] |
| prefix_2 | [C, D]   | [prefix_2-C, prefix_2-D] |

How can I do this transformation? Thank you.

I tried the following, but it didn’t work:

df['ColumnC'] = df['ColumnB'].str.split(',').apply(lambda x: [df['ColumnA'] + '-' + e.strip() for e in x]).tolist()
Asked By: nick_alice1993


Answers:

Use a list comprehension.

If you have lists in Column B:

df['Column C'] = [[f'{p}-{x}' for x in l] for p, l in
                   zip(df['ColumnA'], df['Column B'])]

If you have strings in Column B:

df['Column C'] = [[f'{p}-{x}' for x in l] for p, l in
                   zip(df['ColumnA'],
                       df['Column B'].str[1:-1].str.split(r',\s*')
                       )]
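
To make the parsing step concrete, here is a minimal sketch of what the bracket stripping and split produce on their own. It assumes that Column B holds plain strings such as '[A, B]' and that the separator is a comma followed by optional whitespace; the variable name `parsed` is only illustrative.

```python
import pandas as pd

# Sample data assumed for illustration: Column B holds strings that look like lists.
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': ['[A, B]', '[C, D]']})

# .str[1:-1] drops the surrounding brackets; the split pattern is a regex
# matching a comma followed by any amount of whitespace.
parsed = df['Column B'].str[1:-1].str.split(r',\s*')

print(parsed.tolist())
# [['A', 'B'], ['C', 'D']]
```

Each row is now a real Python list, so the same list comprehension as in the list case can be applied to it.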

And if you want a string as output:

df['Column C'] = ['['+', '.join([f'{p}-{x}' for x in l])+']'
                  for p, l in
                   zip(df['ColumnA'],
                       df['Column B'].str[1:-1].str.split(r',\s*')
                       )]

Output:

    ColumnA Column B                  Column C
0  prefix_1   [A, B]  [prefix_1-A, prefix_1-B]
1  prefix_2   [C, D]  [prefix_2-C, prefix_2-D]

Reproducible inputs:

import pandas as pd

# as string
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': ['[A, B]', '[C, D]']})

# as lists
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})
Answered By: mozway

You are right in using a lambda function, but I would use it like this:

# Create the dataframe from your question
df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'], 'Column B': [['A', 'B'], ['C', 'D']]})

# Create Column C accordingly
df['Column C'] = df.apply(lambda row: [row['ColumnA'] + '-' + elem for elem in row['Column B']], axis=1)

Make sure that you use axis=1, so that the lambda function is applied row-wise.
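
To illustrate why axis=1 matters, here is a small sketch contrasting it with the default axis=0. The helper name `join_prefix` and the flag `axis0_failed` are made up for this example; the data is the same as in the question.

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': ['prefix_1', 'prefix_2'],
                   'Column B': [['A', 'B'], ['C', 'D']]})

def join_prefix(row):
    # With axis=1, `row` is a Series indexed by column name.
    return [row['ColumnA'] + '-' + elem for elem in row['Column B']]

# Row-wise: each call receives one row.
by_row = df.apply(join_prefix, axis=1)
print(by_row.tolist())
# [['prefix_1-A', 'prefix_1-B'], ['prefix_2-C', 'prefix_2-D']]

# Default axis=0: each call receives a whole column, indexed by row label (0, 1),
# so looking up 'ColumnA' inside it fails.
try:
    df.apply(join_prefix)
    axis0_failed = False
except KeyError:
    axis0_failed = True
print(axis0_failed)
# True
```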

Answered By: Marioanzas