Python: in dataframe, how to concatenate a column across rows (col. being list of lists)?

Question

I have a Dataframe as shown below as dataframe1:

dataframe1

ID	PATH
ABC	[[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach]]
ABC	[[apple, plum, peach], [apple, pear, peach]]
BCD	[[blueberry, plum, peach], [pear, apple, peach]]
BCD	[[plum, apple, peach], [banana, raspberry, peach]]

I would like to concatenate the value of column ‘PATH’ by column ‘ID’ and have the following results:

dataframe2 (ideal output, the output is the list of lists)

ID	PATH
ABC	[[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]
BCD	[[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]

I used the following code:

df2 = df1.groupby('id')['PATH'].apply(list)

but got the results with [[[ ]]] as shown which is list of list of lists…and is not what I want.

dataframe3 (wrong output)

ID	PATH
ABC	[[[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]]
BCD	[[[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]]

How can I get the results like in dataframe2?

Asked By: Mercury_Sun

||

Source

Answer 1

If im interpreting this properly you might want to look into the ‘merge’ function.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

Answered By: Chup91

Answer 2

Try:

df = df.groupby("ID")["PATH"].apply(lambda x: list(itertools.chain.from_iterable(x))).reset_index()

or

df = df.groupby("ID")["PATH"].apply(lambda x: [z for y in x for z in y]).reset_index()

Full working example:

data=[
  ["ABC", [["orange", "apple", "kiwi", "peach"], ["strawberry", "orange", "kiwi", "peach"]]],
  ["ABC", [["apple", "plum", "peach"], ["apple", "pear", "peach"]]],
  ["BCD", [["blueberry", "plum", "peach"], ["pear", "apple", "peach"]]],
  ["BCD", [["plum", "apple", "peach"], ["banana", "raspberry", "peach"]]],
]

columns = ["ID", "PATH"]

df = pd.DataFrame(data=data, columns=columns)

df = df.groupby("ID")["PATH"].apply(lambda x: list(itertools.chain.from_iterable(x))).reset_index()
#OR
df = df.groupby("ID")["PATH"].apply(lambda x: [z for y in x for z in y]).reset_index()

[Out]:
 ID                                                                                                           PATH
ABC   [[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]
BCD              [[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]

Answered By: Azhar Khan

Python: in dataframe, how to concatenate a column across rows (col. being list of lists)?

Question:

Answers: