have two dictionary columns with different lengths want to use series explode but it mismatch

Question:

I have tow columns in my data frame that have multiple dictionarise in it and I want to expand it into multiple columns but the problem when I use explode series it mismatch

example:

Column A Column B Columns C
Cell 1 {"a":0.5} {"b":0.5} {"d1":0.25}{"d2":0.25}{"d3":0.25}{"d4":0.25}
Cell 2 {"c":1.0} {"t1":0.5} {"t2":0.5}

desirable output:

Column A Column B1 Columns B2 Columns C1 Columns C2
Cell 1 "a" 0.5 "d1" 0.25
Cell 1 "a" 0.5 "d2" 0.25
Cell 1 "b" 0.5 "d3" 0.25
Cell 1 "b" 0.5 "d4" 0.25
Cell 2 "c" 1.0 "t1" 0.5
Cell 2 "c" 1.0 "t2" 0.5

i used to use

df1 = ( 
       df.assign(Columns B2 =lambda df: df.Columns B.apply(lambda x: x.values()))    
     
       .reset_index()
       .apply(pd.Series.explode)
     
            )
output = ( 
       df1.assign(Columns C2 =lambda df: df.Columns C.apply(lambda x: x.values()))    
     
       .reset_index()
       .apply(pd.Series.explode)
     
            )

the problem is that it mismatches the rows something in column C1 supposed to be in row 2 go to 3

Asked By: SHAIMA ALJAHANI

||

Answers:

Assuming this input and that Column B has the shortest dictionaries:

df = pd.DataFrame({'Column A': ['Cell 1', 'Cell 2'],
                   'Column B': [[{'a': 0.5}, {'b': 0.5}], [{'c': 1.0}]],
                   'Column C': [[{'d1': 0.25}, {'d2': 0.25}, {'d3': 0.25}, {'d4': 0.25}], [{'t1': 0.5}, {'t2': 0.5}]]})

You can try to repeat the shortest list, then convert your key/value pairs as new DataFrames:

tmp = pd.DataFrame([[np.repeat(a, int(np.ceil(len(b)/len(a))))[:len(b)], b]
                    for a, b in zip(df['Column B'], df['Column C'])],
                   columns=['Column B', 'Column C']
                   ).explode(['Column B', 'Column C'])

def convert(col):
    s = col.apply(lambda x: list(x.items())[0])
    return (pd.DataFrame(s.tolist(), index=s.index)
              .rename(columns=lambda x: f'{col.name}{x+1}')
            )

out = (df.drop(columns=['Column B', 'Column C'])
         .join(pd.concat([convert(tmp[c]) for c in list(tmp)], axis=1))
      )

Output:

  Column A Column B1  Column B2 Column C1  Column C2
0   Cell 1         a        0.5        d1       0.25
0   Cell 1         a        0.5        d2       0.25
0   Cell 1         b        0.5        d3       0.25
0   Cell 1         b        0.5        d4       0.25
1   Cell 2         c        1.0        t1       0.50
1   Cell 2         c        1.0        t2       0.50
Answered By: mozway