How to fix "columns must have matching element counts" when using explode in Python
Question:
I’m attempting to split the string contents of certain columns into separate rows. The code I’m using raises an error stating that the columns do not have matching element counts. How can I fix this?
Code:
import glob
from pathlib import Path

import pandas as pd

review_path = r'data/base_data'
review_files = glob.glob(review_path + "/test_data.csv")
review_df_list = []
for review_file in review_files:
    df = pd.read_csv(review_file)
    print(df.head())
    # Collect every business entry of a row into one list per row
    df["business"] = (df["business"].str.extractall(r"(?:[\s,]*)(.*?(?:Unspecified|employees|Self-employed))").groupby(level=0).agg(list))
    # Split the comma-separated names into a list per row
    df["name"] = df["name"].str.split(r"\s*,\s*")
    print(df.explode(["name", "business"]))
    outPutPath = Path('data/base_data/test_data.csv')
    df.to_csv(outPutPath, index=False)
Error message:
Traceback (most recent call last):
  File "x", line 384, in <module>
    print(df.explode(["name", "business"]))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 8255, in explode
    raise ValueError("columns must have matching element counts")
ValueError: columns must have matching element counts
Answers:
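This happens because, in at least one row, the lists in name and business have different lengths. When several columns are exploded together, each row must hold the same number of elements in every one of those columns.
For instance, let’s look at business, and assume that the content is: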
[[1,2],
[1,2],
[1,2,3],
[1,2]]
The third row has one more value (3) than the others. If name holds only two elements in that row, the element counts for that row no longer match and explode raises the error.
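To find the offending rows, you can compare the per-row list lengths of the two columns. The snippet below is a minimal sketch on made-up data that mirrors the example above; the values and the pad_to helper are illustrative assumptions, not part of the original code. Padding the shorter list with None is only one possible workaround, and whether it is appropriate depends on what a missing name or business should mean in your data.

```python
import pandas as pd

# Hypothetical data: row 2 has two names but three business entries.
df = pd.DataFrame({
    "name": [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]],
    "business": [[1, 2], [1, 2], [1, 2, 3], [1, 2]],
})

# Flag the rows whose list lengths differ between the two columns.
mismatched = df[df["name"].str.len() != df["business"].str.len()]
print(mismatched)  # row 2 is the one that breaks explode


def pad_to(values, length):
    """Return a copy of values padded with None up to the given length."""
    return list(values) + [None] * (length - len(values))


# Possible workaround: pad the shorter list in each row so the counts match.
names, businesses = [], []
for n, b in zip(df["name"], df["business"]):
    width = max(len(n), len(b))
    names.append(pad_to(n, width))
    businesses.append(pad_to(b, width))

df["name"] = pd.Series(names, index=df.index)
df["business"] = pd.Series(businesses, index=df.index)

# Multi-column explode needs pandas 1.3 or later.
exploded = df.explode(["name", "business"])
print(exploded)
```

If the mismatch comes from the regular expression missing or over-matching entries, it is usually better to inspect the flagged rows and adjust the pattern than to pad.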