Find list values in a column to delete odd ones out with Openpyxl

Question

I have two workbooks and I'm looking to grab column A from each of them and compare the cell values to see if there is a discrepancy.

If column A (in workbook1) != column A (in workbook2), delete the value in workbook1.

Here's what I have so far:

    book1_list = []
    book2_list = []
    tempList = []

    column_name = 'Numbers'

    skip_Head_of_anotherSheet = anotherSheet[2: anotherSheet.max_row]
    skip_Head_of_other = sheets[2: sheets.max_row]

    for val1 in skip_Head_of_other:
        book1_list.append(val1[0].value)

    for val2 in skip_Head_of_anotherSheet:
        book2_list.append(val2[0].value)

    for i in book1_list:
        for j in book2_list:
            if i == j:
                tempList.append(j)
                print(j)

Here is where I get stuck:

    for temp in tempList:
        for pointValue in skip_Head_of_anotherSheet:
            if temp != pointValue[0].value:
                anotherSheet.cell(column=4, row=pointValue[1].row, value="YES")
            # else:
            #     if temp != pointValue[0].value:
            #         anotherSheet.cell(column=4, row=pointValue[1].row, value="YES")
            # anotherSheet.delete_rows(pointValue[0])
            # anotherSheet.delete_rows(row[0].row, 1)

I also attempted to find the column by name:

    for col in script.iter_cols():
        # see if the value of the first cell matches
        if col[0].value == column_value:
            # this is the column we want; col is an iterable of cells
            for cell in col:
                # do something with the cell in this column here
                pass

Asked By: PiAlx

Answer 1

I’m not quite sure I understand what you want to do, but the following might help. When you want to check for membership in Python, use dictionaries and sets.

    source = wb1["sheet"]
    comparison = wb2["sheet"]

    # create dictionaries of the cells in the worksheets keyed by cell value
    source_cells = {row[0].value: row[0] for row in source.iter_rows(min_row=2, max_col=1)}
    comparison_cells = {row[0].value: row[0] for row in comparison.iter_rows(min_row=2, max_col=1)}

    shared = source_cells.keys() & comparison_cells.keys()  # set of values in both sheets
    missing = comparison_cells.keys() - source_cells.keys()  # set of values only in the other sheet

    for value in shared:
        cell = source_cells[value]
        cell.offset(column=3).value = "YES"

    to_remove = [comparison_cells[value].row for value in missing]  # list of rows to be removed
    for r in sorted(to_remove, reverse=True):  # always remove rows from the bottom first
        comparison.delete_rows(r)

You’ll probably need to adjust this to suit your needs but I hope it helps.
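
To see what the two set expressions produce, here is a minimal sketch on made-up data. The dictionaries are hypothetical stand-ins for the cell dictionaries above, with plain strings in place of openpyxl cells. Note that it goes through `.keys()`, since the `&` and `-` operators are defined on dictionary key views rather than on dictionaries themselves.

```python
# Hypothetical stand-ins for the {cell value: cell} dictionaries;
# the string values only mimic cell coordinates.
source_cells = {101: "A2", 102: "A3", 103: "A4"}
comparison_cells = {102: "A2", 103: "A3", 104: "A4"}

# Key views behave like sets, so intersection and difference work directly.
shared = source_cells.keys() & comparison_cells.keys()
missing = comparison_cells.keys() - source_cells.keys()

print(sorted(shared))   # [102, 103]
print(sorted(missing))  # [104]
```

Both results are ordinary sets, so they have no defined order. That is why row numbers derived from them should be sorted before any rows are deleted.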

Answered By: Charlie Clark

Answer 2

A dictionary solved the issue:

I turned the tempList into a tempDict like so:

    comp = dict.fromkeys(tempList)

So now it will return a dictionary.

Then, instead of looping over tempList, I looped over the sheet only.
In the if statement I checked whether the value is in the dictionary.

    for pointValue in skip_Head_of_anotherSheet:
        if pointValue[21].value in comp:
            # anotherSheet.cell(column=23, row=pointValue[21].row, value="YES")
            anotherSheet.delete_rows(pointValue[21].row, 1)
        # else:
        #     anotherSheet.cell(column=23, row=pointValue[21].row, value="NO")
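
Deleting rows while still looping over the sheet can be fragile, because every deletion shifts the rows below it. A minimal sketch of a safer pattern follows, assuming two small in-memory workbooks with made-up data and keys in column A. The helper name `drop_unmatched_rows` and its parameters are hypothetical, not part of openpyxl. It collects the row numbers first and then deletes from the bottom up; here it removes the rows whose value has no match, as described in the question.

```python
from openpyxl import Workbook


def drop_unmatched_rows(sheet, keep, key_col=1, first_data_row=2):
    """Delete every data row whose key-column value is not in `keep`.

    Hypothetical helper, not part of openpyxl. Returns the number of rows removed.
    """
    # First pass: read only, remembering which row numbers have to go.
    doomed = [
        cell.row
        for (cell,) in sheet.iter_rows(
            min_row=first_data_row, min_col=key_col, max_col=key_col
        )
        if cell.value not in keep
    ]
    # Second pass: delete from the bottom up so pending row numbers stay valid.
    for idx in sorted(doomed, reverse=True):
        sheet.delete_rows(idx)
    return len(doomed)


# Two in-memory sheets with made-up data; row 1 is the header in each.
wb1 = Workbook()
ws1 = wb1.active
for value in ["Numbers", 1, 2, 3, 4, 5]:
    ws1.append([value])

wb2 = Workbook()
ws2 = wb2.active
for value in ["Numbers", 2, 4, 6]:
    ws2.append([value])

# A set is enough for membership tests; dict.fromkeys works the same way.
keep = {value for (value,) in ws2.iter_rows(min_row=2, max_col=1, values_only=True)}

removed = drop_unmatched_rows(ws1, keep)
remaining = [value for (value,) in ws1.iter_rows(min_row=2, max_col=1, values_only=True)]
print(removed, remaining)
```

With real files the two `Workbook()` objects would come from `openpyxl.load_workbook`, and the result would need a `save` call to persist.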

Answered By: PiAlx
