Compare URLs from two different text files Python
Question:
I have a text file (links.txt) in the following format:
www.independent.co.uk www.bbc.co.uk www.theguardian.com www.telegraph.co.uk
www.dailymail.co.uk en.wikipedia.org www.huffingtonpost.co.uk www.bbc.co.uk
www.newsnow.co.uk www.express.co.uk
I have another text file (keys.txt) in the following format:
www.independent.co.uk www.bbc.co.uk www.theguardian.com
I want to compare the two text files and print the URLs that appear in both.
I tried using the urltools package in Python but couldn't get it to work for multiple URLs.
Answers:
How about this:
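links = open('links.txt', 'r')
links_data = links.read()
links.close()
keys = open('keys.txt', 'r')
keys_data = keys.read()
keys.close()
# split() with no arguments breaks on any whitespace, including newlines
links_split = links_data.split()
keys_split = keys_data.split()
for url in keys_split:
    # Compare whole URLs; testing against the raw string would also match substrings
    if url in links_split:
        print(url)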
Just make sure that links.txt and keys.txt are in the current working directory and everything should work fine. I'm assuming your URLs will always be separated by whitespace (spaces or newlines).
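If the order of the output does not matter, the same comparison can be written as a set intersection. This is only a sketch under the same whitespace-delimited assumption; the helper names read_urls and common_urls are made up for illustration and are not part of any library.

```python
def read_urls(path):
    """Return the set of whitespace-separated tokens found in the file at path."""
    with open(path, "r") as handle:
        return set(handle.read().split())


def common_urls(links_path, keys_path):
    """Return the URLs present in both files, sorted for stable output."""
    return sorted(read_urls(links_path) & read_urls(keys_path))
```

Calling common_urls('links.txt', 'keys.txt') on the sample files from the question should give www.bbc.co.uk, www.independent.co.uk and www.theguardian.com. Using sets also collapses duplicates such as the repeated www.bbc.co.uk in links.txt.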
To print only the URLs from keys.txt that do not appear in links.txt, rather than the common ones, change the condition to not in. Here is the complete code:
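links = open('links.txt', 'r')
links_data = links.read()
links.close()
keys = open('keys.txt', 'r')
keys_data = keys.read()
keys.close()
# split() with no arguments breaks on any whitespace, including newlines
links_split = links_data.split()
keys_split = keys_data.split()
for url in keys_split:
    # Print only the URLs from keys.txt that are absent from links.txt
    if url not in links_split:
        print(url)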
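For larger files, a set lookup may be faster than searching a list or string on every iteration. The sketch below keeps the order in which URLs appear in keys.txt and skips repeats; the function names read_tokens and urls_only_in_keys are hypothetical, and the files are again assumed to be whitespace-delimited.

```python
def read_tokens(path):
    """Return the whitespace-separated tokens of a file, in file order."""
    with open(path, "r") as handle:
        return handle.read().split()


def urls_only_in_keys(keys_path, links_path):
    """Return URLs from keys_path that never occur in links_path, in first-seen order."""
    links = set(read_tokens(links_path))
    seen = set()
    result = []
    for url in read_tokens(keys_path):
        if url not in links and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

With the sample files from the question, every URL in keys.txt also occurs in links.txt, so the returned list would be empty.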