Pandas: split column of lists of unequal length into multiple columns
Question:
I have a Pandas dataframe that looks like the below:
codes
1 [71020]
2 [77085]
3 [36415]
4 [99213, 99287]
5 [99233, 99233, 99233]
I’m trying to split the lists in df['codes']
into columns, like the below:
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
where columns that don’t have a value (because the list was not that long) are filled with blanks or NaNs or something.
I’ve seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way do to this?
Answers:
Try:
pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')
code_0 code_1 code_2
0 71020 NaN NaN
1 77085 NaN NaN
2 36415 NaN NaN
3 99213 99287.0 NaN
4 99233 99233.0 99233.0
Include the index
pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')
code_0 code_1 code_2
1 71020 NaN NaN
2 77085 NaN NaN
3 36415 NaN NaN
4 99213 99287.0 NaN
5 99233 99233.0 99233.0
We can nail down all the formatting with this:
f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
df.codes.values.tolist(),
df.index, dtype=object
).fillna('').rename(columns=f)
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
Another solution:
In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
code_0 code_1 code_2
1 71020.0 NaN NaN
2 77085.0 NaN NaN
3 36415.0 NaN NaN
4 99213.0 99287.0 NaN
5 99233.0 99233.0 99233.0
I have created a function using @piRSquared solution that unrolls a dataframe from 3d (row x column x list of values) to 2d (row x column_{n})
def unroll_dataframe_columns_of_lists_to_columns(df):
new_df = pd.DataFrame()
for col in df.columns:
new_df = pd.concat([new_df, pd.DataFrame(df[col].values.tolist()).add_prefix(col + '_')], axis=1)
new_df.index = df.index
return new_df
I have a Pandas dataframe that looks like the below:
codes
1 [71020]
2 [77085]
3 [36415]
4 [99213, 99287]
5 [99233, 99233, 99233]
I’m trying to split the lists in df['codes']
into columns, like the below:
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
where columns that don’t have a value (because the list was not that long) are filled with blanks or NaNs or something.
I’ve seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way do to this?
Try:
pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')
code_0 code_1 code_2
0 71020 NaN NaN
1 77085 NaN NaN
2 36415 NaN NaN
3 99213 99287.0 NaN
4 99233 99233.0 99233.0
Include the index
pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')
code_0 code_1 code_2
1 71020 NaN NaN
2 77085 NaN NaN
3 36415 NaN NaN
4 99213 99287.0 NaN
5 99233 99233.0 99233.0
We can nail down all the formatting with this:
f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
df.codes.values.tolist(),
df.index, dtype=object
).fillna('').rename(columns=f)
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
Another solution:
In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
code_0 code_1 code_2
1 71020.0 NaN NaN
2 77085.0 NaN NaN
3 36415.0 NaN NaN
4 99213.0 99287.0 NaN
5 99233.0 99233.0 99233.0
I have created a function using @piRSquared solution that unrolls a dataframe from 3d (row x column x list of values) to 2d (row x column_{n})
def unroll_dataframe_columns_of_lists_to_columns(df):
new_df = pd.DataFrame()
for col in df.columns:
new_df = pd.concat([new_df, pd.DataFrame(df[col].values.tolist()).add_prefix(col + '_')], axis=1)
new_df.index = df.index
return new_df