I get a "dictionary update sequence element has length 3; 2 is required" error when trying to read lines from a txt file into a dictionary

Question:

I have a txt file with multiple lines like the ones below. I’m trying to iterate through each line, convert it to a dictionary, and then append that dictionary to a dataframe.

I’ve made many attempts but have no solution yet.

For example, the text file would look like this:

ABC=123, DEF="456", 
ABC="789", DEF="101112"

I would like each line to be added to a dictionary like this (on the first loop, for the first line):

{ABC: 123, DEF: 456}

and then appended to a df like this:

   ABC   DEF
 0 123   456
 1 789   101112

So far I have tried the code shown after the error message below. It only works when the text file has one line; when I add a second line, I get this error:

dictionary update sequence element #6 has length 3; 2 is required

with open("file.txt", "r") as f:
    s = f.read().strip()
    dictionary = dict(subString.split("=") for subString in s.split(","))
    dataframe = dataframe.append(dictionary, ignore_index=True)
dataframe
Asked By: Uche24


Answers:

One suggestion is to parse each line with a regex and insert any matches into the dictionary. You can change the pattern as needed, but this one matches a word on the left side of = and a number on the right, where the number may optionally be preceded by ' or ".

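import re
import pandas as pd

# A word, "=", an optional quote, then digits.
pattern = r'(\w+)=[\'"]?(\d+)'

# Collect the values column by column: {key: [value from each line]}.
str_dict = {}
with open('file.txt') as f:
    for line in f:
        for key, val in re.findall(pattern, line):
            str_dict.setdefault(key, []).append(int(val))

df = pd.DataFrame(str_dict)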

This is how I chose the regex pattern
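
As a quick check, here is a minimal sketch of what the pattern returns for the two sample lines from the question, building one dictionary per line rather than one list per key. The helper name `parse_line` and the in-memory `sample_lines` list are assumptions made for illustration; they are not part of the answer's code. Only the standard library is used here.

```python
import re

# The pattern from the answer: a word, "=", an optional quote, then digits.
PATTERN = re.compile(r'(\w+)=[\'"]?(\d+)')

def parse_line(line):
    """Return a {key: int} dictionary for one line of text (hypothetical helper)."""
    return {key: int(val) for key, val in PATTERN.findall(line)}

# The two sample lines from the question, held in memory instead of file.txt.
sample_lines = ['ABC=123, DEF="456", ', 'ABC="789", DEF="101112"']

rows = [parse_line(line) for line in sample_lines]
print(rows)
# → [{'ABC': 123, 'DEF': 456}, {'ABC': 789, 'DEF': 101112}]
```

A list of per-line dictionaries like `rows` can be passed to `pd.DataFrame(rows)`. One reason to prefer this shape is that lines with different keys should produce missing values instead of columns of unequal length.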

Answered By: bn_ln

This also works for a large text file with many different strings:


    import re
    import pandas as pd

    rows = []
    with open('event.txt', 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            text = line.replace('Event time', 'Event_time')
            # Replace spaces inside quoted values so they survive the split on spaces.
            for word in re.findall(r'".*?"', text):
                text = text.replace(word, word.replace(' ', '_'))
            data_dict = {}
            for section in text.split():
                key, val = section.split('=', 1)
                data_dict[key.strip()] = val.strip()
            rows.append(data_dict)

    # DataFrame.append was removed in pandas 2.0, so build the frame from the list of rows.
    df = pd.DataFrame(rows)
    df
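
If the values can contain spaces inside double quotes, a single regex that accepts either a quoted or an unquoted value may avoid the replace-then-split step. This is only a sketch under assumptions: the sample line, the `PAIR` pattern, and the helper `parse_event_line` are hypothetical, since the actual layout of event.txt is not shown. A key containing a space, such as `Event time`, would still need to be renamed first.

```python
import re

# key=value, where value is either double-quoted (spaces allowed)
# or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_event_line(line):
    """Return a {key: value} dictionary for one line, stripping surrounding quotes."""
    return {key: val.strip('"') for key, val in PAIR.findall(line)}

# Hypothetical line; the real event.txt format is an assumption here.
sample = 'Event_time="2021-01-01 10:00" Level=INFO Message="disk almost full"'
print(parse_event_line(sample))
# → {'Event_time': '2021-01-01 10:00', 'Level': 'INFO', 'Message': 'disk almost full'}
```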
        
Answered By: Uche24