Any way to transform nested list to dictionary in python?

Question:

I have sparsed dataframe that I needed to convert it to list which I already did it. Now I want to transform this list to dictionary, so I can do key-value comparison in my actual use case. To do so, I attempted to convert list to dictionary but I have value error instead. How can I do this correctly in python? Does anyone knows what is correct way to do this in python?

reproducible data and my attempt

mydict={'code0': {0: 'nan', 1: " '40'", 2: " '98'", 3: " '98'", 4: " '52'", 5: " '52'", 6: " '52'", 7: " '52'", 8: " '40'", 9: " '58'"}, 'code1': {0: "  ('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL')", 1: 'nan', 2: " ('ATT','NC')", 3: " ('ATT','VA','NC')", 4: " 'NC'", 5: " 'NC'", 6: " 'NC'", 7: " 'NC'", 8: " 'VA'", 9: " 'CE'"}, 'code2': {0: 'nan', 1: 'nan', 2: " ('103','104','105','106','31')", 3: " ('104','105','106','31')", 4: " '109'", 5: " '109'", 6: " '109'", 7: " '109'", 8: " '11'", 9: " ('109')"}, 'code3': {0: 'nan', 1: " '518'", 2: " '810'", 3: 'nan', 4: " ('610','620','682','642','621','611')", 5: " ('396','340','394','393','240')", 6: " ('612','790','110')", 7: " ('730','320','350','379','812','374')", 8: " ('113','174','131','115')", 9: " ('423','114')"}, 'code4': {0: 'nan', 1: 'nan', 2: " 'computer science'", 3: "  'computer science'", 4: " 'biology'", 5: " 'biology'", 6: "biology'", 7: "biology'", 8: 'nan', 9: 'nan'}, 'code5': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'nan', 4: 'nan', 5: " ('12','18')", 6: " ('12','16','18','19')", 7: " ('12','18','19')", 8: " ('11','19','31')", 9: " '31'"}, 'code6': {0: 'nan', 1: " '594'", 2: 'nan', 3: 'nan', 4: " ('712','479','297','639','452','172')", 5: 'nan', 6: " ('285','295','236','239','269','284','237')", 7: 'nan', 8: " ('164','157','388','158')", 9: " ('372','238')"}, 'rules_desc': {0: 'rules1', 1: 'rules2', 2: 'rules2', 3: 'rules2', 4: 'rules2', 5: 'rules2', 6: 'rules2', 7: 'rules2', 8: 'rules2', 9: 'rules2'}}

mydf=pd.DataFrame.from_dict(my_dict)
cols = mydf.columns.values
res=[",".join("{}:{}".format(*t) for t in zip(cols, row)) for _, row in mydf[cols].iterrows()]
res=[list(s.split('""')) for s in res]

intconv = lambda x: (x[0], int(x[1]))
for s in res:
    b = dict([i.split(':') for i in s])
    final = dict((k, int(v)) for k, v in b.items())
    print(final)

but I have value error as follow:

————————————————————————— ValueError Traceback (most recent call
last) Input In [60], in <cell line: 9>()
8 intconv = lambda x: (x[0], int(x[1]))
9 for s in res:
—> 10 b = dict([i.split(‘:’) for i in s])
11 final = dict((k, int(v)) for k, v in b.items())
12 print(final)

ValueError: dictionary update sequence element #0 has length 9; 2 is
required

not sure how to debug this, seems value can’t accept list of string. the reason why I am doing this because I want to do key-value comparison. Does anyone know any workaround on this?

desired output

I don’t want to try hard here. basically I want to create list of dictionary for each row of dataframe, where column_name is key and column value is value. I know in some column, its value is list of string, that’s why I don’t know how to create dictionary from it. here is the sample dictionary that I want:

output_dict={{'code0':'nan', 'code1': ('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'), 'code2':'nan', 'code3':'nan', 'code4':'nan', 'code5':'nan', 'code6':'nan','rules_desc': rules1},
{'code0': 40, 'code1':'nan','code2':'nan','code3':518, 'code4':'nan', 'code5':'nan','code6':594, 'rules_desc': rules2},
...
}

can we get dictionary something like this? because I want to create such list or dictionary to filter out another big dataframe. Any help would be appreciated. thanks

Asked By: Hamilton

||

Answers:

Not sure I understood your question, but I think this will give you what you want – a list of dictionaries with key-value based on cols.

mydf.to_dict(orient='records') 

Let me know if I missed something

Answered By: lmaayanl

The data you have in mydf need cleaning. I’m using ast.literal_eval to try to convert the strings to tuples, and then int() the strings to integers:


from ast import literal_eval

out = []
for _, row in mydf.iterrows():
    tmp = {}
    for k, v in zip(row.index, row):
        try:
            tmp[k] = literal_eval(v)

            if isinstance(tmp[k], str) and tmp[k].isnumeric():
                tmp[k] = int(tmp[k])
        except (ValueError, SyntaxError):
            tmp[k] = v

    out.append(tmp)

print(out)

Prints:

[
    {
        "code0": "nan",
        "code1": ("VA", "HC", "NIH", "SAP", "AUS", "HOL", "ATT", "COL", "UCL"),
        "code2": "nan",
        "code3": "nan",
        "code4": "nan",
        "code5": "nan",
        "code6": "nan",
        "rules_desc": "rules1",
    },
    {
        "code0": 40,
        "code1": "nan",
        "code2": "nan",
        "code3": 518,
        "code4": "nan",
        "code5": "nan",
        "code6": 594,
        "rules_desc": "rules2",
    },
    {
        "code0": 98,
        "code1": ("ATT", "NC"),
        "code2": ("103", "104", "105", "106", "31"),
        "code3": 810,
        "code4": "computer science",
        "code5": "nan",
        "code6": "nan",
        "rules_desc": "rules2",
    },
    {
        "code0": 98,
        "code1": ("ATT", "VA", "NC"),
        "code2": ("104", "105", "106", "31"),
        "code3": "nan",
        "code4": "computer science",
        "code5": "nan",
        "code6": "nan",
        "rules_desc": "rules2",
    },
    {
        "code0": 52,
        "code1": "NC",
        "code2": 109,
        "code3": ("610", "620", "682", "642", "621", "611"),
        "code4": "biology",
        "code5": "nan",
        "code6": ("712", "479", "297", "639", "452", "172"),
        "rules_desc": "rules2",
    },
    {
        "code0": 52,
        "code1": "NC",
        "code2": 109,
        "code3": ("396", "340", "394", "393", "240"),
        "code4": "biology",
        "code5": ("12", "18"),
        "code6": "nan",
        "rules_desc": "rules2",
    },
    {
        "code0": 52,
        "code1": "NC",
        "code2": 109,
        "code3": ("612", "790", "110"),
        "code4": "biology'",
        "code5": ("12", "16", "18", "19"),
        "code6": ("285", "295", "236", "239", "269", "284", "237"),
        "rules_desc": "rules2",
    },
    {
        "code0": 52,
        "code1": "NC",
        "code2": 109,
        "code3": ("730", "320", "350", "379", "812", "374"),
        "code4": "biology'",
        "code5": ("12", "18", "19"),
        "code6": "nan",
        "rules_desc": "rules2",
    },
    {
        "code0": 40,
        "code1": "VA",
        "code2": 11,
        "code3": ("113", "174", "131", "115"),
        "code4": "nan",
        "code5": ("11", "19", "31"),
        "code6": ("164", "157", "388", "158"),
        "rules_desc": "rules2",
    },
    {
        "code0": 58,
        "code1": "CE",
        "code2": 109,
        "code3": ("423", "114"),
        "code4": "nan",
        "code5": 31,
        "code6": ("372", "238"),
        "rules_desc": "rules2",
    },
]
Answered By: Andrej Kesely
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.