How to print both the duplicate file and the original file in Python

Question:

I have already managed to print which files in a directory are duplicates. What I want is to print both the duplicate file and the corresponding original file that it duplicates.
Below is my code.

from pathlib import Path
import xml.etree.ElementTree as ET

path = "path/"
def duplicatecheck():
    DATA_DIR = Path(path)
    files = sorted(DATA_DIR.glob('*.xml'))
    
    invoice_number = {}
    duplicateFiles = []

    for i in range(0,len(files)):
        tree = ET.parse(files[i])
        root = tree.getroot()
        record = root.findall('record')

        for item in record:
            invoice = item.find('invoice_number').text
            if invoice in invoice_number:
                duplicateFiles.append(files[i])
                print("Duplicate file found: ", files[i])
                break
                
            else:
                invoice_number[invoice] = files[i]
                

duplicatecheck()

Below is my output:

Duplicate file found: file (1).xml

Duplicate file found: file (2).xml

Duplicate file found: file (3).xml

What I want is to print each duplicate file together with the file it duplicates, like this:

Duplicate file found: file (1).xml, file (a).xml

Duplicate file found: file (2).xml, file (a).xml

Duplicate file found: file (3).xml, file (a).xml

In other words, whenever a file is found to be a duplicate, I want to print both files.

Asked By: Lukman


Answers:

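The condition if invoice in invoice_number is true only when the dictionary already holds that invoice number as a key. Its value is the file in which that invoice was first seen (a Path object, taken from files[i]). Internally it looks something like this: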

{
    'my_invoice_number': 'file.xml',
    'my_other_invoice_number': 'file2.xml',
}
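A minimal sketch of that mapping follows. It uses plain strings and made-up invoice numbers in place of the parsed XML records; in the real code the values are Path objects. Only the first file seen for each invoice is stored, so looking up a repeated invoice gives back the original file.

```python
# Hypothetical (file name, invoice number) pairs standing in for parsed XML records.
records = [
    ("file (a).xml", "INV-001"),
    ("file (1).xml", "INV-001"),
    ("file (2).xml", "INV-002"),
]

invoice_number = {}
for filename, invoice in records:
    if invoice not in invoice_number:
        # Only the first file seen for each invoice number is stored.
        invoice_number[invoice] = filename

# Looking up a repeated invoice number gives back the file it was first seen in.
original = invoice_number["INV-001"]  # 'file (a).xml'
```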

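So all you need to do is look up the stored file and print it next to the current one, replacing the existing print call inside the if branch: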

print("Duplicate file found: ", files[i], invoice_number[invoice])
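Folding that fix into the question's function gives roughly the following. This is a sketch under stated assumptions:

- The XML layout matches the one implied by the question: record elements directly under the root, each with an invoice_number child.
- The function takes the directory as a parameter instead of using the global path.
- It returns the (duplicate, original) pairs as well as printing them.
- It prints only the file names, via .name, to match the desired output.

The temporary files at the end are made up purely for demonstration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def duplicatecheck(path):
    """Print and return (duplicate file, original file) pairs found under path."""
    files = sorted(Path(path).glob("*.xml"))

    invoice_number = {}   # invoice number -> first file it was seen in
    duplicate_pairs = []  # (duplicate file, original file)

    for file in files:
        root = ET.parse(file).getroot()
        for item in root.findall("record"):
            invoice = item.find("invoice_number").text
            if invoice in invoice_number:
                original = invoice_number[invoice]
                duplicate_pairs.append((file, original))
                # .name drops the directory so only the file names are printed
                print(f"Duplicate file found: {file.name}, {original.name}")
                break  # as in the question, stop at the first repeated invoice
            # Reached only for an invoice number not seen before
            invoice_number[invoice] = file

    return duplicate_pairs


# Demo with throwaway files; the names and invoice numbers are made up.
template = "<root><record><invoice_number>{}</invoice_number></record></root>"
with tempfile.TemporaryDirectory() as tmp:
    for name, inv in [("file (a).xml", "INV-1"),
                      ("file (b).xml", "INV-1"),
                      ("file (c).xml", "INV-2")]:
        (Path(tmp) / name).write_text(template.format(inv))
    pairs = [(dup.name, orig.name) for dup, orig in duplicatecheck(tmp)]
```

Because the files are sorted before the loop, the "original" is whichever file with that invoice number comes first in sort order. Also note that if a single file contains the same invoice number twice, it will be reported as a duplicate of itself.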
Answered By: solarc