Explode multiple columns in CSV with varying/unmatching element counts using Pandas

Question:

I’m trying to use the explode function in pandas on 2 columns in a CSV that have varying element counts. I understand that one of the limitations of a multi-explode currently is that you can’t have nonmatching element counts in the target columns, so I’m wondering what you can do to get around this or if there’s something completely different besides explode?

Input:

Fruit Color Origin
Apple Red, Green USA; Canada
Plum Purple USA
Mango Red, Yellow Mexico; USA
Pepper Red, Green Mexico

Desired Output:

Fruit Color Origin
Apple Red USA
Apple Green Canada
Plum Purple USA
Mango Red Mexico
Mango Yellow USA
Pepper Red Mexico
Pepper Green Mexico

There is never more than 1 Origin value for rows with only 1 Color value.
Color values are always separated by ", " and Origin values are always separated by "; "

My code so far:

import pandas as pd
df = pd.read_csv('fruits.csv')
df['Color'] = df['Color'].str.split(', ')
df['Origin'] = df['Origin'].str.split('; ')
df = df.explode(['Color','Origin'])
df.to_csv('explode_fruit.csv', encoding='utf-8')

I get this error when running: "ValueError: columns must have matching element counts"

Asked By: KS1993


Answers:

The error comes from the last row (Pepper), which has two Color values but only one Origin value, so the element counts in the two columns do not match. Since you mentioned that there is never more than 1 Origin value for rows with only 1 Color value, you can repeat the single Origin to match the number of Colors before exploding:

import pandas as pd
df = pd.DataFrame( {'Fruit':['Apple', 'Plum','Mango','Pepper'], 
                    'Color': ['Red, Green', 'Purple', 'Red, Yellow','Red, Green'], 
                    'Origin':['USA; Canada', 'USA', 'Mexico; USA', 'Mexico']
                })
df['Color'] = df['Color'].str.split(', ')
df['Origin'] = df['Origin'].str.split('; ')
# ensure each row has as many Origin entries as Color entries by repeating the Origin list
df['Origin'] = df.apply(
    lambda x: x['Origin'] * len(x['Color']) if len(x['Color']) > len(x['Origin']) else x['Origin'],
    axis=1,
)
df = df.explode(['Color','Origin']).reset_index(drop=True)
Answered By: LazyClown
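
As a supplementary sketch (not part of the answer above), the following shows one way to locate the rows that trigger the ValueError and then pad them before exploding. It assumes pandas 1.3 or later, where explode accepts a list of columns, and it only repeats Origin when a row has exactly one Origin value. The names mismatched and result are illustrative, not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Plum", "Mango", "Pepper"],
    "Color": ["Red, Green", "Purple", "Red, Yellow", "Red, Green"],
    "Origin": ["USA; Canada", "USA", "Mexico; USA", "Mexico"],
})
df["Color"] = df["Color"].str.split(", ")
df["Origin"] = df["Origin"].str.split("; ")

# Rows where the two list lengths differ are the ones that make
# a multi-column explode raise "columns must have matching element counts".
mismatched = df[df["Color"].str.len() != df["Origin"].str.len()]

# Assumption: a lone Origin applies to every Color in that row,
# so repeat it once per Color; rows with several Origins are left alone.
df["Origin"] = pd.Series(
    [o * len(c) if len(o) == 1 else o for c, o in zip(df["Color"], df["Origin"])],
    index=df.index,
)

result = df.explode(["Color", "Origin"]).reset_index(drop=True)
```

Checking mismatched first makes it easy to confirm that only the expected rows (here, Pepper) need padding. A row with, say, three Colors and two Origins would still raise, which is arguably safer than silently guessing a pairing.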