How to get individual values from a string separated by commas

Question:

I am reading a file using:

rulesList = []

def readFile():
    # Append every line of Rules.txt (trailing newline included) to rulesList
    with open('Rules.txt', 'r') as file:
        for line in file:
            rulesList.append(line)

rulesList:

['\n', "Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)\n", '\n', "Rule(F2, HTTPS TCP, ['ip', 'ip'], ['75.2.18.233'], 443)\n", '\n']
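The blank-line entries in rulesList come from the empty lines between rules in Rules.txt. A minimal sketch that skips them and strips the trailing newline while reading, assuming the file layout shown below; the helper name read_rule_lines is hypothetical, and io.StringIO stands in for the real file so the snippet runs on its own:

```python
import io

def read_rule_lines(stream):
    """Return the non-blank lines of a text stream, stripped of surrounding whitespace."""
    # line.strip() is '' for blank lines, so those lines are skipped
    return [line.strip() for line in stream if line.strip()]

# Stand-in for open('Rules.txt') so the sketch is self-contained
sample = io.StringIO(
    "\n"
    "Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)\n"
    "\n"
    "Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)\n"
    "\n"
)
rules_list = read_rule_lines(sample)
print(rules_list)
```

With the real file, the same helper would be called as `with open('Rules.txt') as f: rules_list = read_rule_lines(f)`.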

My file looks like:

Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)

Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)

I would like to feed the values to a class I created:

class Rule:
    def __init__(self, flowNumber, protocol, port, fromIP=None, toIP=None):
        self.flowNumber = flowNumber
        self.protocol = protocol
        self.port = port
        # Default to None so separate instances don't share one mutable list
        self.fromIP = fromIP if fromIP is not None else []
        self.toIP = toIP if toIP is not None else []

    def __repr__(self):
        return f'\nRule({self.flowNumber}, {self.protocol}, {self.fromIP}, {self.toIP}, {self.port})'

newRule = Rule(currentFlowNum, currentProtocol, currentPort, currentFromIP, currentToIP)

to get an output such as:

[F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443] 

or be able to assign these values to a variable like:

currentFlowNum = 'F1', currentProtocol = 'HTTPS TCP', currentPort = 443, currentFromIP = ['ip', 'ip'], currentToIP = ['www.google.ca', '8.8.8.8']

I tried:

for rule in rulesList:
    if rule != '\n':
        tmp = rule.split(',')
        print(tmp)

tmp:

['Rule(F1', ' HTTPS TCP', " ['ip'", " 'ip']", " ['www.google.ca'", " '8.8.8.8']", ' 443)\n']
['Rule(F2', ' HTTPS TCP', " ['ip'", " 'ip']", " ['ip']", ' 443)\n']

Is there a way to avoid splitting on the commas inside [], i.e., so that the output looks like:

['Rule(F1', ' HTTPS TCP', " ['ip', 'ip']", " ['www.google.ca', '8.8.8.8']", ' 443)\n']
['Rule(F2', ' HTTPS TCP', " ['ip', 'ip']", " ['ip']", ' 443)\n']
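One way to get exactly this split, sketched here as an illustration rather than as the accepted answer's method, is to walk the string character by character and only treat a comma as a separator when no [ is currently open. The helper name split_top_level is hypothetical, and the sketch assumes the quoted values never contain commas or square brackets themselves. The second half strips the Rule( ... ) wrapper and unpacks the fields into the variables named in the question, using ast.literal_eval (from the standard library) to turn the bracketed text into real lists:

```python
import ast

def split_top_level(text, sep=','):
    """Split text on sep, ignoring separators nested inside [...] brackets."""
    parts = []
    current = []
    depth = 0  # number of '[' currently open
    for ch in text:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close off the current field
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts

line = "Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)\n"
print(split_top_level(line))

# Assumes each line has the shape Rule(flow, protocol, [from], [to], port)
inner = line.strip()[len('Rule('):-1]  # drop the leading "Rule(" and trailing ")"
flow, protocol, from_ip, to_ip, port = (p.strip() for p in split_top_level(inner))
currentFlowNum = flow
currentProtocol = protocol
currentFromIP = ast.literal_eval(from_ip)  # "['ip', 'ip']" -> ['ip', 'ip']
currentToIP = ast.literal_eval(to_ip)
currentPort = int(port)
```

Tracking bracket depth is what keeps the commas inside the lists together; a plain str.split(',') has no notion of nesting.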

Asked By: ritvik seth


Answers:

If you have control over how the data in the file is stored, and can replace the single quotes (') with double quotes (") so that the "list" structures are valid JSON, you could use a regular expression (RegExp) to capture each field and json.loads to parse the two lists.

A word of caution: unless you are absolutely sure that the format you’ll be reading will stay the same and is rather inflexible, you’re better off storing this data in a more well-established format (as mentioned in the comments) like JSON, YAML, etc. There are so many edge cases here that rolling your own parser like this is objectively suboptimal.
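As a sketch of the JSON option mentioned above: if the rules were stored as a JSON array instead of Rule(...) lines, the standard json module would do all the parsing. The field names below are an assumption, chosen to match the Rule constructor's parameter names, and the string RULES_JSON stands in for a hypothetical Rules.json file:

```python
import json

# Hypothetical Rules.json content: one object per rule, keys matching
# the Rule constructor parameters from the question
RULES_JSON = """
[
  {"flowNumber": "F1", "protocol": "HTTPS TCP", "port": 443,
   "fromIP": ["ip", "ip"], "toIP": ["www.google.ca", "8.8.8.8"]},
  {"flowNumber": "F2", "protocol": "HTTPS TCP", "port": 443,
   "fromIP": ["ip", "ip"], "toIP": ["ip"]}
]
"""

def load_rules(text):
    """Parse a JSON array of rule objects into a list of dicts."""
    return json.loads(text)

rules = load_rules(RULES_JSON)
for r in rules:
    print(r["flowNumber"], r["protocol"], r["fromIP"], r["toIP"], r["port"])
```

Because the keys match the parameter names of the Rule class in the question, each dict could be turned into an object with Rule(**r), and the port already arrives as an int rather than a string.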

import re
import json

def readFile():
    myRules = []
    with open('Rules.txt', 'r') as file:
        for line in file:
            # Capture the five fields; the two IP lists must be valid JSON (double quotes)
            match = re.match(r'Rule\((?P<flow_number>[^,]+),\s(?P<protocol>[^,]+),\s(?P<from_ip>\[[^\]]+\]),\s(?P<to_ip>\[[^\]]+\]),\s(?P<port>[^,)]+)\)', line)
            if match:
                myRules.append(Rule(match.group('flow_number'),
                                    match.group('protocol'),
                                    match.group('port'),
                                    json.loads(match.group('from_ip')),
                                    json.loads(match.group('to_ip'))))
    return myRules


print(readFile())
# Prints:
# [
# Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443),
# Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)]
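If rewriting the file with double quotes is not an option, a variation on the same regular expression can parse the single-quoted lists directly by swapping json.loads for ast.literal_eval, which accepts Python literal syntax. This is a substitute technique, not the approach above; the sketch also converts the port to int, repeats a copy of the question's Rule class so it runs on its own, and uses io.StringIO in place of Rules.txt. The names RULE_RE and read_rules are hypothetical:

```python
import ast
import io
import re

class Rule:
    # Copy of the Rule class from the question so this sketch runs on its own
    def __init__(self, flowNumber, protocol, port, fromIP=None, toIP=None):
        self.flowNumber = flowNumber
        self.protocol = protocol
        self.port = port
        self.fromIP = fromIP if fromIP is not None else []
        self.toIP = toIP if toIP is not None else []

    def __repr__(self):
        return f'\nRule({self.flowNumber}, {self.protocol}, {self.fromIP}, {self.toIP}, {self.port})'

# Same capture groups as the answer's pattern, split over two raw strings
RULE_RE = re.compile(
    r'Rule\((?P<flow_number>[^,]+),\s(?P<protocol>[^,]+),\s'
    r'(?P<from_ip>\[[^\]]+\]),\s(?P<to_ip>\[[^\]]+\]),\s(?P<port>[^,)]+)\)'
)

def read_rules(stream):
    """Parse Rule(...) lines; ast.literal_eval accepts the single-quoted lists as-is."""
    rules = []
    for line in stream:
        match = RULE_RE.match(line)
        if match:  # blank lines simply don't match
            rules.append(Rule(
                match.group('flow_number'),
                match.group('protocol'),
                int(match.group('port')),
                ast.literal_eval(match.group('from_ip')),
                ast.literal_eval(match.group('to_ip')),
            ))
    return rules

sample = io.StringIO(
    "Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)\n"
    "\n"
    "Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)\n"
)
rules = read_rules(sample)
print(rules)
```

Unlike eval, ast.literal_eval only evaluates literals (strings, numbers, lists and similar), so a malformed line raises an error instead of running arbitrary code.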


Answered By: esqew