Remove rows containing a specific word in a loop

Question:

I have a file, events.txt, that contains multiple delete event records. How can I remove the delete events?

The records in events.txt look like this:

delete|109393509715446004

{"id": 109472787571426436, "created_at": "2022-12-07T14:09:27+00:00", "in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}

{"id": 109472787901758948, "created_at": "2022-12-07T14:09:37+00:00", "in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}

delete|109393512606515336

{"id": 109472787957427984, "created_at": "2022-12-07T14:09:38+00:00","in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}

I used the approach below to read the file and transform each line:

with open('events.txt',encoding='utf-8') as f:
    for line in f:
        event = line.replace('update|', '').replace('status.update|', '').replace('status.','')
        print(type(event))
        print(event)

The type of event is <class 'str'>.

Please suggest how I can remove or skip the delete event rows while processing the file in the loop above.

Asked By: deepu2711


Answers:

It looks like the lines in the file you care about are valid JSON, while the lines you want to ignore are not. If true, and assuming there is no possibility of a JSON decode error with your valid entries, then you could leverage that difference like this:

import json

with open("events.txt", encoding="utf-8") as file:
    for line in file:
        try:
            d = json.loads(line)
        except json.JSONDecodeError:
            # Not valid JSON (e.g. "delete|<id>" or a blank line): skip it.
            continue
        print(d)

Output:

{'id': 109472787571426436, 'created_at': '2022-12-07T14:09:27+00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
{'id': 109472787901758948, 'created_at': '2022-12-07T14:09:37+00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
{'id': 109472787957427984, 'created_at': '2022-12-07T14:09:38+00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
Answered By: rhurwitz
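
If you would rather skip the delete rows explicitly instead of relying on a failed JSON parse, a prefix check is one alternative. The following is a minimal sketch, not a definitive implementation: the helper name iter_events and its skip_prefix parameter are hypothetical, and it assumes that "update|" and "status.update|" only ever appear at the start of a line, which the question's replace() calls suggest but the sample data does not show.

```python
import json


def iter_events(lines, skip_prefix="delete|"):
    """Yield parsed events, skipping blank lines and lines starting with skip_prefix.

    Hypothetical helper for illustration; not part of the original answer.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(skip_prefix):
            continue  # blank line or delete event: nothing to parse
        # Assumption: these markers are line prefixes (longest checked first).
        for prefix in ("status.update|", "update|"):
            if line.startswith(prefix):
                line = line[len(prefix):]
                break
        yield json.loads(line)


# Small in-memory sample; the "update|" line is an assumed format.
sample = [
    "delete|109393509715446004\n",
    '{"id": 109472787571426436, "sensitive": false}\n',
    'update|{"id": 109472787901758948, "sensitive": false}\n',
]
events = list(iter_events(sample))
print(events)
```

Because a file object is an iterable of lines, the same helper can be given the handle returned by open('events.txt', encoding='utf-8'). Unlike the try/except approach, a malformed non-delete line raises json.JSONDecodeError here instead of being silently dropped, which may or may not be what you want.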