Python: Extracting all data from a dictionary column and putting it into columns
Question:
I have a table with a column that holds a dictionary. Each dictionary has a single key whose value is a list of dictionaries, as follows:
id | dict_vals |
---|---|
123 | {'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},…,{'a_t':67,'b_t':41,'c_t':6}]} |
345 | {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6},…{'a_t':34,'b_t':23,'c_t':}]} |
Inside the nested list of dictionaries, all the keys are the same, except for the last dictionary (e.g., a_t, b_t…). What I am trying to do is drop the last dictionary and convert everything else to columns, as follows:
id | a | b | c |
---|---|---|---|
123 | 1 | 8 | 7 |
123 | 14 | 6 | 8 |
123 | 9 | 4 | 9 |
345 | 5 | 82 | 72 |
345 | 4 | 64 | 81 |
345 | 5 | 3 | 6 |
For the last dictionary (e.g., a_t, b_t), I have been able to separate it and convert it with the following code:
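import json
import pandas as pd

values = table.dict_vals.str.replace("'", '"').apply(json.loads).tolist()
df = pd.DataFrame(values)
df.key_a = df.key_a.apply(lambda x: x[-1])  # keep only the last dictionary
data_split = df["key_a"].apply(pd.Series)  # df has no "dict_vals" column at this point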
Unfortunately, I am not sure how this method can be used to separate all the other dictionaries. Any help is appreciated! Thank you!
Answers:
Try this:
- To remove the last element of a `list`, slice it with `[:-1]`.
- Build a DataFrame from each `dict` and combine them with `pd.merge`.
- Use `functools.reduce` to merge more than two DataFrames.
- Use `DataFrame.apply` with `axis=1` to iterate over the rows, look up each row's id in the original `df`, and add it to the DataFrame created for that row.
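from functools import reduce
import ast
import pandas as pd

# ast.literal_eval parses dict_vals when it holds strings; if the column already holds dicts, remove the call.
# [:-1] drops the last dictionary, and assign() tags each per-row frame with that row's id.
frames = df.apply(lambda x: pd.DataFrame(ast.literal_eval(x['dict_vals'])['key_a'][:-1]).assign(id=x['id']), axis=1)
df_res = reduce(lambda x, y: pd.merge(x, y, how='outer'), frames).set_index('id')
print(df_res)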
Output:
      a   b   c
id
123   1   8   7
123  14   6   8
123   9   4   9
345   5  82  72
345   4  64  81
345   5   3   6
Input DataFrame:
df = pd.DataFrame({
    'id' : [123, 345],
    'dict_vals' : [{'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9}, {'a_t':67,'b_t':41,'c_t':6}]},
                   {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':9}]}]
})
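Because every per-id frame has the same columns, stacking the frames with `pd.concat` is one possible alternative to the `reduce`/`pd.merge` chain. This is only a minimal sketch and not the method used in the answer; it assumes `dict_vals` already holds Python dicts, as in the input DataFrame above, rather than strings. The names `frames` and `df_concat` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 345],
    'dict_vals': [
        {'key_a': [{'a': 1, 'b': 8, 'c': 7}, {'a': 14, 'b': 6, 'c': 8},
                   {'a': 9, 'b': 4, 'c': 9}, {'a_t': 67, 'b_t': 41, 'c_t': 6}]},
        {'key_a': [{'a': 5, 'b': 82, 'c': 72}, {'a': 4, 'b': 64, 'c': 81},
                   {'a': 5, 'b': 3, 'c': 6}, {'a_t': 34, 'b_t': 23, 'c_t': 9}]},
    ],
})

# One small frame per row: [:-1] drops the last dict, assign() tags the rows with the id.
# Assumption: dict_vals holds dicts, so no string parsing is needed here.
frames = [
    pd.DataFrame(vals['key_a'][:-1]).assign(id=row_id)
    for row_id, vals in zip(df['id'], df['dict_vals'])
]

# Stack the frames on top of each other; rows stay in their original order.
df_concat = pd.concat(frames, ignore_index=True).set_index('id')
print(df_concat)
```

Concatenation only appends rows, so it does not depend on how a merge matches or orders rows that share the same values.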
First, set up the data:
import pandas as pd
keys = [
    {'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},{'a_t':67,'b_t':41,'c_t':6}]},
    {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':2}]}
]
Then define some variables. There should be one id for each entry in the keys list:
ids = [123, 345]
id, a, b, c = [], [], [], []
The outer loop walks through the ids. For each id, it takes the matching list of dicts from keys as values, and the inner loop splits every dict except the last into the a, b, and c lists:
for idIndex in range(len(ids)):
    values = keys[idIndex]["key_a"]  # list of dicts that belongs to this id
    for i in range(len(values) - 1):  # stop one short to drop the last dict
        value = values[i]
        id.append(ids[idIndex])  # repeat the id for every row
        a.append(value["a"])
        b.append(value["b"])
        c.append(value["c"])
Finally, create the DataFrame:
df = pd.DataFrame({"id":id, "a": a, "b": b, "c": c}) # create data frame
print(df)
Output:
    id   a   b   c
0  123   1   8   7
1  123  14   6   8
2  123   9   4   9
3  345   5  82  72
4  345   4  64  81
5  345   5   3   6
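The same loop can probably be written more compactly by building one record per row and letting pandas work out the columns. The sketch below is a variation on the approach above rather than part of the original answer; it assumes the same `ids` and `keys` data, and the names `rows` and `df_rows` are made up for illustration.

```python
import pandas as pd

ids = [123, 345]
keys = [
    {'key_a': [{'a': 1, 'b': 8, 'c': 7}, {'a': 14, 'b': 6, 'c': 8},
               {'a': 9, 'b': 4, 'c': 9}, {'a_t': 67, 'b_t': 41, 'c_t': 6}]},
    {'key_a': [{'a': 5, 'b': 82, 'c': 72}, {'a': 4, 'b': 64, 'c': 81},
               {'a': 5, 'b': 3, 'c': 6}, {'a_t': 34, 'b_t': 23, 'c_t': 2}]},
]

# Pair each id with its entry, skip the last dict via [:-1],
# and merge the id into every remaining dict to form one record per row.
rows = [
    {"id": row_id, **value}
    for row_id, entry in zip(ids, keys)
    for value in entry["key_a"][:-1]
]

df_rows = pd.DataFrame(rows)
print(df_rows)
```

Since the column names come from the dict keys, this version does not need a separate list for each column.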