ValueError: cannot reindex from a duplicate axis in explode

Question:

I am trying to expand the dataframe from one row to multiple rows based on two columns.

Following is the snapshot of the actual data.

initial_row_index        Date        Product_ID               No_of_items
1                        2021-07-11  ['A13N', 'A4BE','5GH$']  [3,5,1]
2                        2021-07-12  ['A13N', 'X9HE','7H3T']  [7,2,4]
3                        2021-07-13  ['A4BE', 'X9HE']         [8,4]

I want the transferred output to be as follows

initial_row_index        Date        Product_ID       No_of_items
1                        2021-07-11  A13N             3
1                        2021-07-11  A4BE             5
1                        2021-07-11  5GH$             1
2                        2021-07-12  A13N             7
2                        2021-07-12  X9HE             2
2                        2021-07-12  7H3T             4
3                        2021-07-13  A4BE             8
3                        2021-07-13  X9HE             4

To get this result I tried the following

df.apply(pd.Series.explode)

I am getting the following error.

ValueError: cannot reindex from a duplicate axis

I have already verified that I don’t have any duplicate index in the dataframe. The length of the lists in both the column for each row have same no of elements. After looking into the error found few similar questions where the answers suggested to explode the df column by column as follows.

df.explode("Product_ID").reset_index().drop("index",1).explode("No_of_items").reset_index().drop("index",1) 

But the above solution is giving me 9 rows instead of 3 rows after exploding the 1st row.
How can I solve the value error or achieve the desired out put using any other method.

Asked By: Crystal

||

Answers:

Try this:

df.explode(['Product_ID', 'No_of_items'])
>   initial_row_index   Date    Product_ID  No_of_items
0   1   2021-07-11  A13N    3
0   1   2021-07-11  A4BE    5
0   1   2021-07-11  5GH$    1
1   2   2021-07-12  A13N    7
1   2   2021-07-12  X9HE    2
1   2   2021-07-12  7H3T    4
2   3   2021-07-13  A4BE    8
2   3   2021-07-13  X9HE    4
Answered By: bpfrd
import itertools
import pandas as pd
import numpy as np
from pandas import DataFrame as df




df = pd.DataFrame({'Date': ['2021-07-11', '2021-07-12', '2021-07-13'],
                   'Product_ID': [[
                       'A13N', 'A4BE', '5GH$'], ['A13N', 'X9HE', '7H3T'],
                       ['A4BE', 'X9HE']],
                   'No_of_items': [[3, 5, 1], [
                       7, 2, 4], [8, 4]]})

print(df.to_string())


"""
         Date          Product_ID No_of_items
0  2021-07-11  [A13N, A4BE, 5GH$]   [3, 5, 1]
1  2021-07-12  [A13N, X9HE, 7H3T]   [7, 2, 4]
2  2021-07-13        [A4BE, X9HE]      [8, 4]

"""

aa = df.set_index('Date').apply(pd.Series.explode).reset_index()
print(aa.to_string())

"""
        Date Product_ID No_of_items
0  2021-07-11       A13N           3
1  2021-07-11       A4BE           5
2  2021-07-11       5GH$           1
3  2021-07-12       A13N           7
4  2021-07-12       X9HE           2
5  2021-07-12       7H3T           4
6  2021-07-13       A4BE           8
7  2021-07-13       X9HE           4
"""
Answered By: Soudipta Dutta
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.