Split string column based on delimiter and convert it to dict in Pandas without loop

Question

I have the following dataframe:

clm1, clm2, clm3
10, a, clm4=1|clm5=5
11, b, clm4=2

My desired result is:

clm1, clm2, clm4, clm5
10, a, 1, 5
11, b, 2, NaN
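To reproduce this, the sample input and desired output can be built as below. This is a sketch: the dtypes are an assumption (clm4 and clm5 numeric, with NaN where a row has no value for that key).

```python
import numpy as np
import pandas as pd

# Input: clm3 holds "key=value" pairs separated by "|"
df = pd.DataFrame({
    "clm1": [10, 11],
    "clm2": ["a", "b"],
    "clm3": ["clm4=1|clm5=5", "clm4=2"],
})

# Desired output: one column per key, NaN where a row lacks that key
expected = pd.DataFrame({
    "clm1": [10, 11],
    "clm2": ["a", "b"],
    "clm4": [1, 2],
    "clm5": [5, np.nan],
})
print(df)
print(expected)
```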

I have tried the following method:

import pandas as pd
from pandas import json_normalize

rows = list(df.index)

dictlist = []

for index in rows:  # loop through each row to convert clm3 to a dict
    i = df.at[index, "clm3"]

    # split "k=v|k=v" on "|", keep the pieces containing "=", then split each on "="
    mydict = dict(map(lambda x: x.split('='), [x for x in i.split('|') if '=' in x]))
    dictlist.append(mydict)

l = json_normalize(dictlist)  # convert the list of dicts to a flat dataframe

resultdf = df.join(l).drop('clm3', axis=1)

This gives me the desired result, but I am looking for a more efficient way to convert clm3 to a dict that does not involve looping through each row.

Asked By: user0204


Answer 1

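Use str.extractall to pull out the values and unstack to pivot them into one column per match position.

Then use str.get_dummies to get a column name for each unique clm. Because the names are assigned by position, this assumes every row lists its keys in the same (alphabetical) order.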

import pandas as pd

values = (
    df['clm3'].str.extractall(r'(=\d+)')[0]   # every "=<digits>" match, one row per match
              .str.replace('=', '')
              .unstack()                       # one column per match position
              .rename_axis(None, axis=1)
)

columns = df['clm3'].str.replace(r'=\d+', '', regex=True).str.get_dummies(sep='|').columns
values.columns = columns
dfnew = pd.concat([df[['clm1', 'clm2']], values], axis=1)

   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN
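A variant sketch (not part of the original answer) that does not depend on the order of the keys in each row: capture each key and its value together with named groups in str.extractall, then unstack on the key. It assumes keys contain no "=" or "|" and that no key repeats within a row (a repeated key would make unstack raise).

```python
import pandas as pd

df = pd.DataFrame({"clm1": [10, 11], "clm2": ["a", "b"],
                   "clm3": ["clm4=1|clm5=5", "clm4=2"]})

# One row per "key=value" match; index is (original row, match number)
pairs = df["clm3"].str.extractall(r"(?P<key>[^=|]+)=(?P<val>[^|]*)")

wide = (
    pairs.reset_index(level="match", drop=True)  # back to the original row labels
         .set_index("key", append=True)["val"]   # index: (row, key)
         .unstack("key")                          # one column per key, NaN where missing
)
wide.columns.name = None

dfnew = df.drop(columns="clm3").join(wide)
print(dfnew)
```

The values stay strings; pd.to_numeric can be applied to the new columns if numbers are needed.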

Answered By: Erfan

Answer 2

Two steps:

The idea is to split twice (first on "|", then on "="), then group by the original index and unstack the keys into columns.

s = (
    df["clm3"]
    .str.split("|", expand=True)
    .stack()
    .str.split("=", expand=True)
    .reset_index(level=1, drop=True)
)

final = pd.concat([df, s.groupby([s.index, s[0]])[1].sum().unstack()], axis=1).drop(
    "clm3", axis=1
)

print(final)
   clm1 clm2  clm4 clm5
0    10    a     1    5
1    11    b     2  NaN
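The same double-split idea can be sketched with Series.explode (pandas 0.25+) in place of expand=True followed by stack, and with unstack on a (row, key) index in place of groupby(...).sum(). This assumes every pair contains an "=" and that keys are unique within a row.

```python
import pandas as pd

df = pd.DataFrame({"clm1": [10, 11], "clm2": ["a", "b"],
                   "clm3": ["clm4=1|clm5=5", "clm4=2"]})

kv = (
    df["clm3"]
    .str.split("|")                        # first split: list of "key=value" strings
    .explode()                             # one row per pair, original index repeated
    .str.split("=", n=1, expand=True)      # second split: column 0 = key, column 1 = value
)
kv.columns = ["key", "val"]

# index (row, key) -> one column per key, NaN where a row lacks that key
wide = kv.set_index("key", append=True)["val"].unstack("key")
wide.columns.name = None

final = df.drop(columns="clm3").join(wide)
print(final)
```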

Answered By: Umar.H

Answer 3

import pandas as pd

df11 = (df1.clm3.map(lambda x: "dict({})".format(x.replace('|', ',')))  # "clm4=1|clm5=5" -> "dict(clm4=1,clm5=5)"
        .map(eval).map(pd.Series)
        .pipe(lambda ss: pd.concat(ss.tolist(), axis=1)).T)
df1.drop("clm3", axis=1).join(df11)

out:

   clm1 clm2  clm4  clm5
0    10    a   1.0   5.0
1    11    b   2.0   NaN
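The eval step above only works when every key is a valid Python identifier and every value a valid Python expression, and it is unsafe on untrusted input. A sketch of the same per-row dict idea without eval: parse each string into a dict, build the frame with the pd.DataFrame constructor, and convert the string values with pd.to_numeric. This still iterates in Python, so it is not the loop-free version the question asks for; it assumes every pair contains "=" and that all values are numeric.

```python
import pandas as pd

df1 = pd.DataFrame({"clm1": [10, 11], "clm2": ["a", "b"],
                    "clm3": ["clm4=1|clm5=5", "clm4=2"]})

# Parse each "k=v|k=v" string into a dict; split on the first "=" only
records = [dict(pair.split("=", 1) for pair in s.split("|")) for s in df1["clm3"]]

# One row per record, missing keys become NaN; reuse df1's index so join aligns
parsed = pd.DataFrame(records, index=df1.index).apply(pd.to_numeric)

out = df1.drop("clm3", axis=1).join(parsed)
print(out)
```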

Answered By: G.G
