Import csv/xlsx file into python as a list and find a template match

Question:

I have two simplified csv files below (uploaded as images and link in google drive) and I would like to ask how can I import both files in a python list format where I can iterate over the "templates.csv" and find the best match (if no perfect match exist).

https://drive.google.com/drive/folders/1dOv-8fQgeGEsaPMXYU8kbo2mY4B2KkDH?usp=sharing

test.csv = enter image description here

templates.csv = enter image description here

My current issue is how can I import the csv files in such a way that it will be read as a list where:
test.csv file becomes:

test = [‘AC_CURRENT’, ‘AC_POWER’, ‘AC_POWER.MAX’, ‘DC_CURRENT’, ‘DC_VOLTAGE’, ….]

templates.csv file becomes a list of templates:

template1 = [‘AC_CURRENT’, ‘AC_CURRENT_15MIN’, ‘AC_CURRENT_A’, ‘AC_CURRENT_B’, ‘AC_CURRENT_C’,…]

template2 = [‘AC_CURRENT’, ‘AC_POWER’, ‘AC_POWER.MAX’, ‘DC_CURRENT’, ‘DC_VOLTAGE’, ….]

template3 = [‘AC_CURRENT’, ‘AC_POWER’, ‘AC_POWER.MAX’, ‘AC_VOLTAGE’, ‘DC_CURRENT’, ….]

templateN = ….

The first issue I have is reading both files as a dataframe where it will be converted as a list above then I can perform the iteration, finding a good match, and if no perfect match exists, it will give me a recommendation in percentage which template gives the most attributes match and will also list the missing attributes from my test.csv file that didnt have any match.

Take note that I cannot manipulate the excel file above in a easy format to pull as a list as we natively pull this from a software where these templates sits.

Thanks a lot for the help

I only have this line of codes for now which just reads the files as dataframe using pandas:

import pandas as pd

df1 = pd.read_csv('test.csv', sep=',')
df2 = pd.read_csv('templates.csv', sep=',')
print(df1)
print(df2)

enter image description here

Asked By: Nail. G

||

Answers:

Take this example:

import pandas as pd

df = pd.DataFrame({"Parent": ["x", "x", "y", "y", "z"], "Name": [1, 2, 3, 4, 5]})
>   Parent  Name
  0      x     1
  1      x     2
  2      y     3
  3      y     4
  4      z     5

# group by 'Parent', aggregate group values to lists for each column, then get column 'Name'
grouped_as_series = df.groupby('Parent').agg(list)['Name']
> Parent
  x    [1, 2]
  y    [3, 4]
  z       [5]

# to access single values
print(grouped_as_series["x"])
> [1, 2]

So for your case:

test = df1.groupyby('Parent').agg(list)['Name']['test']
series_of_templates = df2.groupyby('Parent').agg(list)['Name']

Note that I’m not sure if I understand what you plan to do with the data afterwards so a series might not be the best way to store the extracted data. E.g. a dictionary or list might be better.

Edit:

If I understand it correctly you want to then check the percentage of match between the test list and each template:

test = [6, 4]

best_template_percent = 0.0
best_template_index = -1
for ind, val in grouped_as_series.items():
    match_percent = len([v for v in test if v in val])/len(test)
    if match_percent > best_template_percent:
        best_template_percent = match_percent
        best_template_index = ind

print(best_template_index)
> y
print(grouped_as_series[best_template_index])
> [3, 4]

Or if you want it shorter:

import numpy as np

best_template_index = grouped_as_series.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in grouped_as_series.items()])]

For your data:

import numpy as np

best_template_index = series_of_templates.index[np.argmax([len([v for v in test if v in val])/len(test) for ind, val in series_of_templates.items()])]
best_template_values = series_of_templates[best_template_index]
Answered By: ewz93

I have the below code to make the check

import pandas as pd

df1 = pd.read_csv('test.csv', sep=',')
df2 = pd.read_csv('templates.csv', sep=',')

test = df1.groupby('Parent').agg(list)['Name']['test']
series_of_templates = df2.groupby('Parent').agg(list)['Name']

#print(test)
#print(series_of_templates)

same_values = set(test) & set(series_of_templates['template1'])
print (same_values)
Answered By: Nail. G
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.