Python: in dataframe, how to concatenate a column across rows (col. being list of lists)?

Question:

I have a Dataframe as shown below as dataframe1:

dataframe1

ID PATH
ABC [[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach]]
ABC [[apple, plum, peach], [apple, pear, peach]]
BCD [[blueberry, plum, peach], [pear, apple, peach]]
BCD [[plum, apple, peach], [banana, raspberry, peach]]

I would like to concatenate the value of column ‘PATH’ by column ‘ID’ and have the following results:

dataframe2 (ideal output, the output is the list of lists)

ID PATH
ABC [[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]
BCD [[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]

I used the following code:

df2 = df1.groupby('id')['PATH'].apply(list)

but got the results with [[[ ]]] as shown which is list of list of lists…and is not what I want.

dataframe3 (wrong output)

ID PATH
ABC [[[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]]
BCD [[[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]]

How can I get the results like in dataframe2?

Asked By: Mercury_Sun

||

Answers:

If im interpreting this properly you might want to look into the ‘merge’ function.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

Answered By: Chup91

Try:

df = df.groupby("ID")["PATH"].apply(lambda x: list(itertools.chain.from_iterable(x))).reset_index()

or

df = df.groupby("ID")["PATH"].apply(lambda x: [z for y in x for z in y]).reset_index()

Full working example:

data=[
  ["ABC", [["orange", "apple", "kiwi", "peach"], ["strawberry", "orange", "kiwi", "peach"]]],
  ["ABC", [["apple", "plum", "peach"], ["apple", "pear", "peach"]]],
  ["BCD", [["blueberry", "plum", "peach"], ["pear", "apple", "peach"]]],
  ["BCD", [["plum", "apple", "peach"], ["banana", "raspberry", "peach"]]],
]

columns = ["ID", "PATH"]

df = pd.DataFrame(data=data, columns=columns)

df = df.groupby("ID")["PATH"].apply(lambda x: list(itertools.chain.from_iterable(x))).reset_index()
#OR
df = df.groupby("ID")["PATH"].apply(lambda x: [z for y in x for z in y]).reset_index()

[Out]:
 ID                                                                                                           PATH
ABC   [[orange, apple, kiwi, peach], [strawberry, orange, kiwi, peach], [apple, plum, peach], [apple, pear, peach]]
BCD              [[blueberry, plum, peach], [pear, apple, peach], [plum, apple, peach], [banana, raspberry, peach]]
Answered By: Azhar Khan
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.