Why does the program return a different value when I order the list differently?

Question

I'm trying to learn how to analyze large data sets better, and I wanted to make a program that takes a CSV of keywords and looks for the occurrences of each one in a second data CSV. I set up this code as an example. I created a list of keywords, but when I change which word comes first, the occurrence count it returns is incorrect. For example, when "matlab" is first it returns 97, which is right, but when I put either of the other words first it returns 0. It doesn't make sense to me, because in my head it is iterating through the data set CSV for every single word in the list and checking. Could I get some help and clarification?

I've tried putting a print statement after the first for loop, and it is iterating through each word, so I'm confused as to why it's not executing the later parts correctly.

import csv 
from pandas import *
import pandas as pd
from array import array
import csv
keywords = read_csv("Book1.csv")
with open('ss.csv','r') as csvfile: 
    reader = csv.DictReader(csvfile) 
    list=["matlab","souren","deez"]
    Attended=0
    no_show=0
    Registered=0
    word = 'matlab'
    nuts=[]


    for x in list:
        for row in reader:
        
            if x in row['one']:
                Registered=Registered+1
                

    print(Registered)

EDIT:

import csv 

import pandas as pd
from array import array
import csv
keywords = pd.read_csv("Book2.csv")
biomedical = keywords['Biomedical'].tolist()
Registered=0
counts = dict.fromkeys(biomedical, 0)


with open('ss.csv','r') as csvfile: 
    reader = csv.DictReader(csvfile) 
    lines=list(reader)
  
pd.set_option("display.max_rows", None)  
df = pd.read_csv('ss.csv')


ss=df.stack().value_counts()
            
print(ss)


##print(ss)
#for x in biomedical:
#    print(x)


solidworks, microsoft office, lua","Python, MATLAB, C++, HTML, CSS, Javascript, C,

SOLIDWORKS, Microsoft Office, LUA","I would like to further my experience in SOLIDWORKS,

my extra personal skills, and develop my coding skills.",Female,,Yes,No,Yes,No,"Mississauga, Canada",No,No,Terese Kattar,Student was approved to submit Fall 2022 application,Yes,,,10/31/2022,11/14/2022,,,,,,,
Mechanical Engineering,2022 - Fall,Application Accepted (Final Status),11/14/2022,BaoAnh Le,yes,10/31/2022,Lauren,Sena,Shirley,Dacanay,,,Melissa,[email protected],(647) 518-3977,,Mechanical Engineering,,,Mississauga,L5M 6N3,Female,Canadian,In Canada,N/A,false,Active,Uploaded,2.7,No,,No,,No,No,"Accommodation and food services, Administrative and support, waste management and remediation services, Construction, Financial services and insurance, Health care and social assistance, Information and cultural industries, Management of companies and enterprises, Manufacturing, Mining, quarrying, and oil and gas extraction, Other services (except public administration), Professional, scientific and technical services, Public administration, Real estate and rental and leasing, Retail Trade, Transportation and warehousing, Wholesale trade, Educational Services",Yes,Yes,"expert in c and javascript.

can proficiently use matlab.

skillful in microsoft word, office and excel.

knows the basic of vue and vuetify.

expert in html and css.

expert in google docs, slides, sheets.

skillful in cad software, such as: fusion 360 and solidworks..


Asked By: codingnoob

Answer 1

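You are exhausting the reader upon completion of your inner for loop. A csv.DictReader can only be iterated once, so after the first keyword has consumed every row, the inner loop has nothing left to read for the remaining keywords. That is why only the first word in the list ever gets counted. You can simplify this as follows: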

import csv

with open('ss.csv', newline='') as csvfile: 
    Registered=0

    for row in csv.DictReader(csvfile):
        if (one := row.get('one')) is not None:
            Registered += any(x in one for x in ("matlab", "souren", "deez"))
                
    print(Registered)
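
To see the exhaustion in isolation, here is a minimal sketch. The in-memory CSV text and the names `data`, `buggy` and `fixed` are made up for illustration; only the column name `one` comes from the question.

```python
import csv
import io

# Hypothetical stand-in for ss.csv with a single column named 'one'
data = "one\nmatlab and c\nknows souren\nmatlab only\n"

# Nested loops over the reader itself: only the first keyword sees any rows,
# because the reader is an iterator that is empty after one full pass
reader = csv.DictReader(io.StringIO(data))
buggy = {}
for word in ["souren", "matlab"]:
    buggy[word] = sum(word in row["one"] for row in reader)

# Materializing the rows in a list first lets every keyword scan all the data
rows = list(csv.DictReader(io.StringIO(data)))
fixed = {}
for word in ["souren", "matlab"]:
    fixed[word] = sum(word in row["one"] for row in rows)

print(buggy)  # {'souren': 1, 'matlab': 0}
print(fixed)  # {'souren': 1, 'matlab': 2}
```

Whichever keyword comes first gets the correct count and every later keyword gets 0, which matches the behaviour described in the question.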

Answered By: Fred

Answer 2

Using Mustafa's first suggestion, I did it with for loops. There is probably a much better way of doing it with pandas, which I didn't spend time figuring out.

import csv

import pandas as pd

# Keywords to search for, taken from the 'Biomedical' column
keywords = pd.read_csv("input1.csv")
biomedical = keywords['Biomedical'].tolist()
counts = dict.fromkeys(biomedical, 0)

# Read every row into a list so the data can be scanned once per keyword;
# a bare DictReader would be exhausted after the first pass
with open('ss2.csv', 'r', newline='') as csvfile:
    lines = list(csv.DictReader(csvfile))

for x in biomedical:
    for row in lines:
        for col in row:
            # Short rows leave missing fields as None, so skip those
            if row[col] is not None and x in row[col]:
                counts[x] += 1

# Write one "keyword,count" line per keyword
with open('hh.csv', 'w') as f:
    for key, count in counts.items():
        f.write("%s,%s\n" % (key, count))

print(counts)
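
For reference, one possible pandas version of the same per-keyword count might look like the sketch below. It is only an illustration under assumptions: the in-memory CSV text, its column names other than `Biomedical`, and the helper `count_cells` are hypothetical, and it counts cells containing the keyword as a case-sensitive substring, like the loops above.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the keyword file and the data file
keywords_csv = io.StringIO("Biomedical\nmatlab\nsolidworks\npython\n")
data_csv = io.StringIO(
    "name,skills,notes\n"
    'a,"Python, MATLAB, C++",can proficiently use matlab\n'
    'b,"solidworks, lua",\n'
    'c,"html and css",knows matlab\n'
)

biomedical = pd.read_csv(keywords_csv)["Biomedical"].tolist()

# Read every cell as text; empty cells become "" instead of NaN
df = pd.read_csv(data_csv, dtype=str, keep_default_na=False)


def count_cells(frame, keyword):
    """Count cells containing keyword as a literal, case-sensitive substring."""
    return int(
        sum(frame[col].str.contains(keyword, regex=False).sum() for col in frame.columns)
    )


counts = {kw: count_cells(df, kw) for kw in biomedical}
print(counts)  # {'matlab': 2, 'solidworks': 1, 'python': 0}
```

Because the match is case-sensitive, "MATLAB" and "Python" in the sample are not counted. Lowercasing both the data and the keywords first would be one way to change that.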

Answered By: codingnoob
