Pandas: split column of lists of unequal length into multiple columns

Question

I have a Pandas dataframe that looks like the below:

                   codes
1                  [71020]
2                  [77085]
3                  [36415]
4                  [99213, 99287]
5                  [99233, 99233, 99233]

I’m trying to split the lists in df['codes'] into columns, like the below:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

where columns that don’t have a value (because the list was not that long) are filled with blanks or NaNs or something.

I’ve seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way to do this?

Asked By: user139014

Answer 1

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0

To keep the original index, pass it as the second argument:

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0

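To match the requested layout exactly (1-based column names, blanks instead of NaN), build the frame with dtype=object, fill the gaps with empty strings, and rename the columns: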

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

   code_1 code_2 code_3
1   71020              
2   77085              
3   36415              
4   99213  99287       
5   99233  99233  99233
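
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question (the 1..5 index is read off the printed output), and the names wide and wide_int are mine. The last step is an alternative that is not part of the answer above: casting to pandas' nullable Int64 dtype keeps the codes as whole numbers and shows <NA> for the gaps, rather than the float upcast (99287.0) that NaN padding causes.

```python
import pandas as pd

# Sample data rebuilt from the question; the 1..5 index is taken from the printed frame.
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

# Expand the lists into columns. Shorter rows are padded with NaN,
# which forces the padded columns to float64.
wide = pd.DataFrame(df["codes"].tolist(), index=df.index)

# Rename 0, 1, 2 to code_1, code_2, code_3 to match the requested layout.
wide.columns = [f"code_{i + 1}" for i in wide.columns]

# Alternative (my assumption, not from the answer): the nullable integer dtype
# keeps whole numbers and marks missing entries as <NA>.
wide_int = wide.astype("Int64")
print(wide_int)
```

This keeps the values numeric, which may be preferable to fillna('') if the codes are used in later calculations or joins.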

Answered By: piRSquared

Answer 2

Another solution:

In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
    code_0   code_1   code_2
1  71020.0      NaN      NaN
2  77085.0      NaN      NaN
3  36415.0      NaN      NaN
4  99213.0  99287.0      NaN
5  99233.0  99233.0  99233.0
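
As a rough sanity check, the sketch below compares this result with a frame built directly from the list of lists. The sample frame is rebuilt from the question, and the names via_apply, via_lists and same_values are mine. Both are cast to float first because the two approaches can produce different dtypes. Note that apply(pd.Series) builds one Series per row, so it is generally expected to be slower on large frames; that is not measured here.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

# Row-by-row expansion, as in this answer.
via_apply = df["codes"].apply(pd.Series).add_prefix("code_")

# Single constructor call on the list of lists.
via_lists = pd.DataFrame(df["codes"].tolist(), index=df.index).add_prefix("code_")

# Cast both to float so only the values, labels and NaN positions are compared.
same_values = via_apply.astype(float).equals(via_lists.astype(float))
print(same_values)
```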

Answered By: MaxU – stand with Ukraine

Answer 3

Building on @piRSquared's solution, I wrote a function that unrolls a whole dataframe in which every column holds lists, going from 3D (row x column x list of values) to 2D (row x column_{n}):

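import pandas as pd

def unroll_dataframe_columns_of_lists_to_columns(df):
    # Expand every list-valued column into its own block of <col>_<n> columns.
    expanded = [
        pd.DataFrame(df[col].tolist(), index=df.index).add_prefix(f'{col}_')
        for col in df.columns
    ]
    # Concatenate once, side by side, instead of growing a frame inside the loop.
    return pd.concat(expanded, axis=1)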
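
A quick usage sketch, assuming every column of the input holds lists. The function is repeated (in a slightly condensed form) so the snippet runs on its own, and the "mods" column is invented purely for illustration.

```python
import pandas as pd

def unroll_dataframe_columns_of_lists_to_columns(df):
    # Expand each list column into <col>_<n> columns, then join them side by side.
    parts = [
        pd.DataFrame(df[col].tolist(), index=df.index).add_prefix(f"{col}_")
        for col in df.columns
    ]
    return pd.concat(parts, axis=1)

# Hypothetical input: "mods" is an invented second list column.
df = pd.DataFrame(
    {
        "codes": [[71020], [99213, 99287]],
        "mods": [["a", "b"], ["c"]],
    },
    index=[1, 2],
)

out = unroll_dataframe_columns_of_lists_to_columns(df)
print(out.columns.tolist())  # ['codes_0', 'codes_1', 'mods_0', 'mods_1']
```

Each source column gets its own zero-based suffix, and rows with shorter lists are left with missing values in the extra columns.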

Answered By: Youstanzr
