Python – Find distinct domains inside a list of dictionaries

Question:

I have a list (with dictionaries inside) and I want to know how many different domains are inside it.

I have something like this:

list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40}, 
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25}
] 

The desired result would look like this:

unique_domains = [
    {'url': 'https://stackoverflow.com'},
    {'url': 'https://stackexchange.com'}
]

Or maybe just:

unique_domains = ['stackoverflow.com', 'stackexchange.com']

Both would be OK, so whatever is easier or faster I guess.

I think I could use a regex for this, but maybe there is a more Pythonic and/or efficient way to do it?

Thanks!

Asked By: migueltic


Answers:

You can use urllib.parse.urlparse (from the standard library) together with a set comprehension (to avoid duplicates):

from urllib.parse import urlparse

unique_domains = {urlparse(item['url']).netloc for item in given_list}

If you need a list, you can convert the set with list(unique_domains). This is more reliable than a regex solution.
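As a minimal runnable sketch of the above, here is the same set comprehension applied to the sample data from the question. The names given_list and domain_list are just illustrative, and the sort is an added assumption (sets are unordered, so sorting gives a stable result):

```python
from urllib.parse import urlparse

# Sample data from the question, renamed so it does not shadow the builtin list.
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]

# netloc is the "host[:port]" part of each URL; the set drops duplicates.
unique_domains = {urlparse(item['url']).netloc for item in given_list}

# Sets have no guaranteed order, so sort when a predictable list is needed.
domain_list = sorted(unique_domains)
print(domain_list)  # ['stackexchange.com', 'stackoverflow.com']
```

Note that urlparse only fills in netloc when the URL includes a scheme (or starts with //); for a bare string such as 'stackoverflow.com/questions' the netloc is empty, so this sketch assumes every url value is a full URL like the ones in the question.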

(Please don’t name a variable list; it shadows the useful builtin.)
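If the first output format from the question is preferred (a list of dictionaries that keep the scheme), something along these lines might work. This is only a sketch: unique_origins is a hypothetical helper name, and it relies on dict keys preserving insertion order (Python 3.7+) so the results come out in first-seen order:

```python
from urllib.parse import urlparse

# Sample data from the question, renamed so it does not shadow the builtin list.
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]


def unique_origins(items):
    """Return [{'url': 'scheme://host'}, ...] without duplicates (hypothetical helper)."""
    seen = {}  # dict keys act as an ordered set
    for item in items:
        parts = urlparse(item['url'])
        # Rebuild only the scheme and host, dropping path, query and fragment.
        seen[f'{parts.scheme}://{parts.netloc}'] = None
    return [{'url': origin} for origin in seen]


unique_domains = unique_origins(given_list)
print(unique_domains)
# [{'url': 'https://stackoverflow.com'}, {'url': 'https://stackexchange.com'}]
```

With this approach, http://example.com and https://example.com would count as two different entries, since the scheme is part of the key.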

Answered By: SUTerliakov