Create Dataframe from list of strings of delimited column names and values

Question:

I have a list of strings:

data = ['col1:abc col2:def col3:ghi',
        'col4:123 col2:qwe col10:xyz',
        'col3:asd']

I would like to convert this to a dataframe, where each string in the list is a row in the dataframe, like so:

desired_out = pd.DataFrame({'col1':  ['abc',  np.nan, np.nan],
                            'col2':  ['def',  'qwe',  np.nan],
                            'col3':  ['ghi',  np.nan, 'asd'],
                            'col4':  [np.nan, '123',  np.nan],
                            'col10': [np.nan, 'xyz',  np.nan]})


Asked By: Dean Power

||

Answers:

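Use a nested list comprehension: split each string on whitespace, split each piece on ':' to get a key/value pair, and convert the pairs to a dictionary. Passing the list of dictionaries to the DataFrame constructor fills the missing keys with NaN: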

import pandas as pd

df = pd.DataFrame([dict([y.split(':') for y in x.split()]) for x in data])
print(df)
  col1 col2 col3 col4 col10
0  abc  def  ghi  NaN   NaN
1  NaN  qwe  NaN  123   xyz
2  NaN  NaN  asd  NaN   NaN
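
If a value could itself contain a ':' (an assumption; the sample data has none), the plain split(':') would produce more than two pieces and dict() would raise an error. A minimal sketch of the same idea with the split limited to the first ':' is below. The helper name parse_row is made up for illustration and is not part of pandas:

```python
import pandas as pd

data = ['col1:abc col2:def col3:ghi',
        'col4:123 col2:qwe col10:xyz',
        'col3:asd']

def parse_row(line):
    # Hypothetical helper: turn 'k1:v1 k2:v2' into {'k1': 'v1', 'k2': 'v2'}.
    # maxsplit=1 keeps any further ':' characters inside the value.
    return dict(item.split(':', 1) for item in line.split())

# One dictionary per row; keys missing from a row become NaN.
df = pd.DataFrame([parse_row(line) for line in data])

# Optional: fix the column order explicitly instead of relying on
# the order in which keys first appear.
df = df.reindex(columns=['col1', 'col2', 'col3', 'col4', 'col10'])
print(df)
```

Note that all values stay strings, so '123' in col4 is the string '123', the same as in the desired output from the question.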
Answered By: jezrael