Explode multiple uneven rows in Pandas
Question:
I have two columns, phones and emails, that need to be exploded into rows. I have figured out how to do this to either one, but not both simultaneously. The biggest problem is that I may have 0 to many phones and 0 to many emails. So, if a customer has three emails and no phones, then I need 3 rows. If they have four phones and three emails, then I need 4 rows. One for each phone, and the three emails in those four rows.
Example data:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A,B,C | A,B |
| row 2 | | D,E,F |
Example Results:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A | A |
| row 1 | B | B |
| row 1 | C | |
| row 2 | | D |
| row 2 | | E |
| row 2 | | F |
# Convert cell contents into lists rather than strings
df0['phones'] = df0['phones'].str.split(";", expand=False)
df0['emails'] = df0['emails'].str.split(",", expand=False)
df0 = df0.apply(pd.Series.explode) # DOES NOT WORK
When I try the above code, I get the error:
ValueError: cannot reindex on an axis with duplicate labels
Answers:
I assume the index on your original dataframe is unique. If not, run df = df.reset_index()
before the following snippet:
columns = ["phones", "emails"]
# Explode each column individually, but instead of using `explode`, we will
# use`stack` to give us a second index level
exploded = [
df[col].str.split(",", expand=True).stack().rename(col)
for col in columns
]
# Align the exploded columns
exploded = pd.concat(exploded, axis=1).droplevel(-1)
# Merge it with the original data frame
result = pd.concat([df.drop(columns=columns), exploded], axis=1)
I read this somewhere. If I find the link, I’ll upload it.
import itertools
import pandas as pd
import numpy as np
from pandas import DataFrame as df
df = pd.DataFrame({"x":[1,3,7],"y":["A","B","C"],
"z":["p1,p2,p3","p4","p5,p6"],"package_code":["111,222,333","444","555,666"]})
print(df)
"""
x y z package_code
0 1 A p1,p2,p3 111,222,333
1 3 B p4 444
2 7 C p5,p6 555,666
"""
aa = (
df.set_index(['x','y'])
.apply(lambda col : pd.Series(col).str.split(','))
.explode(['z','package_code'])
.reset_index()
.reindex(df.columns,axis=1)
)
print(aa)
"""
x y z package_code
0 1 A p1 111
1 1 A p2 222
2 1 A p3 333
3 3 B p4 444
4 7 C p5 555
5 7 C p6 666
"""
I have two columns, phones and emails, that need to be exploded into rows. I have figured out how to do this to either one, but not both simultaneously. The biggest problem is that I may have 0 to many phones and 0 to many emails. So, if a customer has three emails and no phones, then I need 3 rows. If they have four phones and three emails, then I need 4 rows. One for each phone, and the three emails in those four rows.
Example data:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A,B,C | A,B |
| row 2 | | D,E,F |
Example Results:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A | A |
| row 1 | B | B |
| row 1 | C | |
| row 2 | | D |
| row 2 | | E |
| row 2 | | F |
# Convert cell contents into lists rather than strings
df0['phones'] = df0['phones'].str.split(";", expand=False)
df0['emails'] = df0['emails'].str.split(",", expand=False)
df0 = df0.apply(pd.Series.explode) # DOES NOT WORK
When I try the above code, I get the error:
ValueError: cannot reindex on an axis with duplicate labels
I assume the index on your original dataframe is unique. If not, run df = df.reset_index()
before the following snippet:
columns = ["phones", "emails"]
# Explode each column individually, but instead of using `explode`, we will
# use`stack` to give us a second index level
exploded = [
df[col].str.split(",", expand=True).stack().rename(col)
for col in columns
]
# Align the exploded columns
exploded = pd.concat(exploded, axis=1).droplevel(-1)
# Merge it with the original data frame
result = pd.concat([df.drop(columns=columns), exploded], axis=1)
I read this somewhere. If I find the link, I’ll upload it.
import itertools
import pandas as pd
import numpy as np
from pandas import DataFrame as df
df = pd.DataFrame({"x":[1,3,7],"y":["A","B","C"],
"z":["p1,p2,p3","p4","p5,p6"],"package_code":["111,222,333","444","555,666"]})
print(df)
"""
x y z package_code
0 1 A p1,p2,p3 111,222,333
1 3 B p4 444
2 7 C p5,p6 555,666
"""
aa = (
df.set_index(['x','y'])
.apply(lambda col : pd.Series(col).str.split(','))
.explode(['z','package_code'])
.reset_index()
.reindex(df.columns,axis=1)
)
print(aa)
"""
x y z package_code
0 1 A p1 111
1 1 A p2 222
2 1 A p3 333
3 3 B p4 444
4 7 C p5 555
5 7 C p6 666
"""