Trying to check a list of lists against multiple other lists

Question:

I have a 100,000-entry list of lists derived from a CSV that is in turn derived from a firewall log. My goal is to end up with a file that lists the ports used between any two IPs, such as:
1.1.1.1 to 2.2.2.2 ports (25, 53, 80)
2.2.2.2 to 1.1.1.1 ports (443, 123)

So far I have been able to read the file into a list, then create a list of source IPs and a list of destination IPs. I can then get the ports associated with a pair of manually entered IPs. However, there are 4 sources and 67 destinations, and I do not want to run this manually 268 times. I want to iterate over the list of lists, checking the source and destination of each entry and collecting the matching ports. My idea is to loop over the lst object, then over the source list and the destination list, and collect the ports. I’m not sure whether this can be done or whether I’m going about it correctly at all.

I know there are some formatting issues and better ways to do some of this; I’m fairly new to it.

Sample of the log (I will strip srcport= later, as it’s not critical at this time):

1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP
5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP
1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP
6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP
1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH

#! python3

import csv
#Read the file and convert the CSV to a usable list format
csv_filename = 'data-for-csv-reader_no_parentheses_v3.csv'
#need to add operation to strip " from lines
#open file and read into a list of lists:
with open(csv_filename) as f:
    reader = csv.reader(f)
    lst = list(reader)


#Extract all Source IPs:
srcips = []
for item in lst:
    srcips.append(item[0])

#Deduplicate source IPS:
srciplist = [*set(srcips)]
print("The number of source IPs is " + str(len(srciplist)) + ".")

#Strip the 'srcip=' prefix from each entry (perhaps no longer needed); removeprefix needs Python 3.9+
srciplist_stripped = [j.removeprefix('srcip=') for j in srciplist]
srciplist_stripped.sort()
print(srciplist_stripped)

#Extract all destination IPs:
dstips = []
for item in lst:
    dstips.append(item[1])

#Deduplicate destination IPs:
dstiplist = [*set(dstips)]
print("The number of destination IPs is " + str(len(dstiplist)) + ".")

#Strip the 'dstip=' prefix from each entry (perhaps no longer needed); removeprefix needs Python 3.9+
dstiplist_stripped = [j.removeprefix('dstip=') for j in dstiplist]
dstiplist_stripped.sort()
print(dstiplist_stripped)

#Manual operation to get one source and one destination's ports:
port_list = []
for item in lst:
    if item[0] == srciplist_stripped[2] and item[1] == dstiplist_stripped[4]:
        port_list.append(item[3])

#Presents port list for the prior two IPs
port_list = [*set(port_list)]
print("Source IP:" + str(srciplist_stripped[2]) + " Destination IP:" + str(dstiplist_stripped[4]) + " Port_list :" + str(port_list))
print("The number of ports is " + str(len(port_list)) + ".")

The code won’t work unless you run it against a CSV file. As written, it gives me the following (IPs edited):

The number of source IPs is 4.
['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4']
The number of destination IPs is 67.
['7.7.7.7', '6.6.6.6', '5.5.5.5', <--omitted for brevity-->]
Source IP:1.1.1.1 Destination IP:2.2.2.2 Port_list :['dstport=644', 'dstport=1039',<--omitted for brevity-->]
The number of ports is 873.

(IPs are faked so they don’t line up with the indexes as presented in the sample firewall log)

I want it to output this:

Source IP:1.1.1.1 Destination IP:2.2.2.2 Port_list :['dstport=644', 'dstport=1039',<--omitted for brevity-->]
The number of ports is 873.

but for each IP address combination, with the results then written to a file. The final output would be the above for every pairing of the 4 sources and 67 destinations, so 268 entries (many of which will have an empty Port_list and 0 for the number of ports).

Final code:

#!python
import csv
from itertools import product
import os
import fileinput

#Specify the source and destination variables, the FW log file, and the final output file.
src, dst = {}, set()
log_file = input("Enter log file (be sure to include '.csv'): ")
output_file = input("Enter the location and file name to send output to: ")

#Temp file for data manipulation.
temp_output_file = "temp_output_file.txt"

#Create data structure
with open(log_file, "r") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        src.setdefault(row[0], {}).setdefault(row[1], {}).setdefault(row[3], {}).setdefault(row[4])
        dst.add(row[1])

#Print to screen if desired
#for s, d in product(src, dst):
    #print(f"Source IP: {s} Destination IP: {d} Port_list: {src[s].get(d, [])}")
    #print('\n')
Asked By: Robert Kraft


Answers:

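I hope I’ve understood your question correctly.

You can load the data into a dictionary where the keys are source IPs and the values are dictionaries of the form {destination IP: [list of ports]}. Taking the product of the source and destination sets then yields every combination, including pairs with no traffic.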

import csv
from itertools import product

src, dst = {}, set()

with open("data.csv", "r") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        src.setdefault(row[0], {}).setdefault(row[1], []).append(
            row[3].split("=")[-1]
        )
        dst.add(row[1])

for s, d in product(src, dst):
    print(f"Source IP: {s} Destination IP: {d} Port_list: {src[s].get(d, [])}")

Prints:

Source IP: 1.1.1.1 Destination IP: 3.3.3.3 Port_list: ['123']
Source IP: 1.1.1.1 Destination IP: 4.4.4.4 Port_list: ['22']
Source IP: 1.1.1.1 Destination IP: 2.2.2.2 Port_list: ['161', '123']
Source IP: 5.5.5.5 Destination IP: 3.3.3.3 Port_list: []
Source IP: 5.5.5.5 Destination IP: 4.4.4.4 Port_list: []
Source IP: 5.5.5.5 Destination IP: 2.2.2.2 Port_list: ['123']
Source IP: 6.6.6.6 Destination IP: 3.3.3.3 Port_list: ['123']
Source IP: 6.6.6.6 Destination IP: 4.4.4.4 Port_list: []
Source IP: 6.6.6.6 Destination IP: 2.2.2.2 Port_list: []
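
If duplicate ports should be collapsed and the result written to a file, as the question asks, the same structure can hold sets instead of lists. The following is only a minimal sketch and not part of the answer above: the function names `collect_ports`, `format_report` and `write_report` are hypothetical, and the code assumes that the fourth column always looks like `dstport=<number>` and that rows with fewer than four columns can be skipped.

```python
import csv
from itertools import product


def collect_ports(rows):
    """Build {source IP: {destination IP: set of destination ports}}."""
    src, dst = {}, set()
    for row in rows:
        if len(row) < 4:
            # Assumption: blank or short rows carry no usable data.
            continue
        # row[3] looks like "dstport=161"; keep only the part after "=".
        port = row[3].split("=")[-1]
        src.setdefault(row[0], {}).setdefault(row[1], set()).add(port)
        dst.add(row[1])
    return src, dst


def format_report(src, dst):
    """Return one line per source/destination pair, including empty pairs."""
    lines = []
    for s, d in product(sorted(src), sorted(dst)):
        # Assumption: ports are numeric strings, so sort them as integers.
        ports = sorted(src[s].get(d, set()), key=int)
        lines.append(
            f"Source IP: {s} Destination IP: {d} "
            f"Port_list: {ports} Number of ports: {len(ports)}"
        )
    return lines


def write_report(log_path, report_path):
    """Read the CSV log and write the report, one line per IP pair."""
    with open(log_path, newline="") as f_in:
        src, dst = collect_ports(csv.reader(f_in))
    with open(report_path, "w") as f_out:
        for line in format_report(src, dst):
            f_out.write(line + "\n")
```

Calling `write_report("data.csv", "ports_report.txt")` (the output file name is just an example) would then produce one line for each of the source/destination combinations, with an empty list and a count of 0 where no traffic was logged.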

Data used in data.csv:

1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP
1.1.1.1,2.2.2.2,srcport=58084,dstport=123,proto=17,service=SNMP
5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP
1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP
6.6.6.6,3.3.3.3,srcport=41376,dstport=123,proto=17,service=NTP
1.1.1.1,4.4.4.4,srcport=53546,dstport=22,proto=6,app=SSH
Answered By: Andrej Kesely

Since you are already using the csv module, you can use a DictReader to refer to columns by name, which is more flexible than using indexes. You can then apply set() to a list comprehension to count unique values. Example:

import csv

fieldnames = ['src_ip', 'dst_ip', 'src_port', 'dst_port', 'proto', 'service']

with open('firewall.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', fieldnames=fieldnames)
    lines = list(reader)

unique_src_ips = set([line["src_ip"] for line in lines])
print(f"The number of source IPs is {len(unique_src_ips)}.")

unique_dst_ips = set([line["dst_ip"] for line in lines])
print(f"The number of destination IPs is {len(unique_dst_ips)}.")
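
The same `DictReader` rows can also be grouped by source/destination pair to collect the destination ports the question asks about. This is a rough sketch rather than part of the answer above: the helper name `ports_by_pair` is hypothetical, and it assumes every row has a `dst_port` field of the form `dstport=<number>`. The sample rows are taken from the log excerpt in the question.

```python
import csv
import io
from collections import defaultdict

FIELDNAMES = ["src_ip", "dst_ip", "src_port", "dst_port", "proto", "service"]


def ports_by_pair(csvfile):
    """Map each (src_ip, dst_ip) pair to the set of destination ports seen."""
    pairs = defaultdict(set)
    reader = csv.DictReader(csvfile, fieldnames=FIELDNAMES)
    for line in reader:
        # "dstport=161" -> "161" (assumes the "name=value" form).
        port = line["dst_port"].partition("=")[2]
        pairs[(line["src_ip"], line["dst_ip"])].add(port)
    return pairs


# Small in-memory sample standing in for the real log file.
sample = io.StringIO(
    "1.1.1.1,2.2.2.2,srcport=58084,dstport=161,proto=17,service=SNMP\n"
    "5.5.5.5,2.2.2.2,srcport=58082,dstport=123,proto=17,service=NTP\n"
    "1.1.1.1,3.3.3.3,srcport=59089,dstport=123,proto=17,service=NTP\n"
)

for (src_ip, dst_ip), ports in sorted(ports_by_pair(sample).items()):
    print(f"{src_ip} to {dst_ip} ports {sorted(ports, key=int)}")
```

Note that this only lists pairs that actually appear in the log; pairs with no traffic would have to be added separately, for example with `itertools.product` over the two sets of unique IPs.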
Answered By: Kate