Pandas DataFrame: Merge rows and keep only unique values from multiple rows as a list

Question:

I'm looking to merge rows by some id, but for each column I want to keep the distinct values from the merged rows as a list, while values that are duplicated across those rows should collapse into a single element.

The raw data would look something like this:

| id | info1 | info2 | info3| info4|
| -- | ----  | ------| -----| -----|
| 1  | 'a'   |  xxx  |  yyy |      |
| 1  | 'b'   |  xxx  |  yyy |      |
| 2  | 'c'   |  mmm  |  nnn |      |
| 3  |  'd'  |  uuu  |      |      |
| 3  |  'e'  |  uuu  |  ooo |      |
| 4  |  'f'  |  xy   |      |      |
| 4  |  'g'  |  xy   |      |      |

(The blanks represent missing values)

The desired output data would look like:

| id | info1     | info2 | info3| info4 |
| -- | ----      | ------| -----| ------|
| 1  | ['a','b'] |  xxx  |  yyy |       |
| 2  | 'c'       |  mmm  |  nnn |       |
| 3  | ['d','e'] |  uuu  |  ooo |       |
| 4  | ['f','g'] |  xy   |      |       |

I'm quite new to this. Hopefully I expressed myself clearly here.

Asked By: Yieh Yan


Answers:

Try this:

df.groupby('id').agg(pd.unique).applymap(lambda x: x[0] if len(x)==1 else x)
         info1 info2 info3
id                        
1   ['a', 'b']   xxx   yyy
2          'c'   mmm   nnn
3   ['d', 'e']   uuu   ooo
Answered By: Rabinzel
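
As a hedged variation on the one-liner above, the sketch below rebuilds the question's sample table (assuming the blanks are NaN) and drops missing values before collecting the unique ones, since `pd.unique` treats NaN as a value of its own. The helper name `collapse` is hypothetical and not part of pandas. Returning a scalar directly when only one unique value remains also avoids the `applymap` step, which is deprecated since pandas 2.1 in favour of `DataFrame.map`.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's table; NaN stands in for the blanks (assumption).
df = pd.DataFrame({
    "id":    [1, 1, 2, 3, 3, 4, 4],
    "info1": ["a", "b", "c", "d", "e", "f", "g"],
    "info2": ["xxx", "xxx", "mmm", "uuu", "uuu", "xy", "xy"],
    "info3": ["yyy", "yyy", "nnn", np.nan, "ooo", np.nan, np.nan],
    "info4": [np.nan] * 7,
})

def collapse(values):
    """Hypothetical helper: reduce one column of one group to its unique non-missing values."""
    uniq = list(pd.unique(values.dropna()))
    if not uniq:
        return np.nan          # nothing but missing values in this group
    if len(uniq) == 1:
        return uniq[0]         # a single distinct value stays a scalar
    return uniq                # several distinct values are kept as a list

# One row per id; each cell is a scalar, a list, or NaN.
result = df.groupby("id").agg(collapse)
print(result)
```

Columns that mix scalars and lists end up with object dtype, so any downstream code has to handle both shapes in the same column.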