dict comprehension including list comprehension with if statement not producing the correct output

Question:

I have some lists of text and I need to pull them together into a dictionary, but I need to use one list to ‘filter’ the other. I can do this with nested for loops, but I cannot make it work with a dict comprehension.

a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A', 'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

If I try to make a dict of the needed details with nested for loops it works fine

details = {}
for x in b:
    for l in a:
        if x in l:
            details[x] = l

details

I get

{'Forename': 'Forename John',
 'Surname': 'Surname Smith',
 'Date of birth': 'Date of birth 01 01 1901'}

which needs cleaning up but I can do that later.

When I try it with a dict comprehension

d_tails = {x:l for x,l in zip(b, [l for l in a if x in l]) }

I get

{'Forename': 'Date of birth 01 01 1901'}

I’m sure this is because of how I’m ordering the dict comprehension but I can’t figure out how to order it so that it replaces the for loop.

For context, I’m trying to clean really messy data from terrible PDFs, which is where a comes from. Any help on this would be appreciated.

Asked By: KevOMalley743

||

Answers:

Let’s consider simpler examples of two lists:

a = [1,2,3]
b = ['a', 'b', 'c']
for x in a:
    for y in b:
        print(x, y)

This produces 9 lines of output, one for every possible combination of a value from a and a value from b.

for x, y in zip(a, b):
    print(x, y)

This produces only 3 lines of output: one for every corresponding pair of values taking one from a and one from b.
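
The same pairing behaviour explains the single-entry result in the question. The sketch below is a reconstruction under one assumption: the comprehension was run right after the nested loops, so x still held the last key from b. That leftover x is inferred from the posted output, not stated in the question. The lists are copied from the question, using the lower-case 'Date of birth' spelling that the posted output shows.

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

# Assumption: x is left over from the earlier `for x in b:` loop.
x = b[-1]  # 'Date of birth'

# The inner list comprehension is built once, before zip pairs anything,
# so it filters `a` against that single leftover x only.
filtered = [l for l in a if x in l]

# zip stops at the shorter input: one filtered line, so one pair.
d_tails = {key: line for key, line in zip(b, filtered)}

print(filtered)  # ['Date of birth 01 01 1901']
print(d_tails)   # {'Forename': 'Date of birth 01 01 1901'}
```

In a fresh session without a leftover x, the original expression would instead raise a NameError, because the argument to zip is evaluated before the comprehension's own x exists.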

If you want to convert your nested loop into a single dict comprehension, you need two for clauses in the comprehension, in the same order as the nested loops, not a single for clause iterating over a zip object.

details = {x: l for x in b for l in a if x in l}
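
As a quick check, the sketch below runs the nested loops from the question next to the comprehension and compares the results. The lists are copied from the question (with the lower-case 'Date of birth' spelling from its output); the name expected is only a label introduced here.

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

# Reference result: the nested loops from the question.
expected = {}
for x in b:
    for l in a:
        if x in l:
            expected[x] = l

# Same iteration order: the first `for` is the outer loop, the second is
# the inner loop, and the trailing `if` filters each (x, l) combination.
details = {x: l for x in b for l in a if x in l}

print(details == expected)  # True
print(details)
```

Like the loops, the comprehension keeps only the last matching line for each key and leaves out keys that match nothing.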
Answered By: chepner

If I want to convert such a nested loop into a comprehension, I typically start from the outside.

You already know how the details dict should look in the end, so I start with that structure, and insert a placeholder '' value:

details = {x: '' for x in b}

With list b out of the way, I can now look only at a given x (say Forename) and list a. In principle there could be multiple matching entries in that list, so the natural result is a list of all entries that contain x. That corresponds to a filtered list comprehension [l for l in a if x in l]. Combined:

details = {x: [l for l in a if x in l] for x in b}

But you wanted a string, and the most common case is just one match. For that, I recommend using ', '.join(...) to convert the list of matches back to a string. At the same time, the list comprehension becomes a generator expression:

details = {x: ', '.join(l for l in a if x in l) for x in b}
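
A small sketch of how the joined version behaves, assuming the filter tests each key against each line (x in l); testing x in b would always be true and would join every line of a. The helper name collect, the extra 'Forename Johnny' line, and the 'Postcode' key are hypothetical, introduced only for illustration.

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']


def collect(lines, keys):
    """Map each key to the comma-joined lines that contain it."""
    return {x: ', '.join(l for l in lines if x in l) for x in keys}


details = collect(a, b)
print(details)

# Hypothetical extra line, added only to show the two-match case.
with_duplicate = collect(a + ['Forename Johnny'], b)
print(with_duplicate['Forename'])  # Forename John, Forename Johnny

# A key with no matching line maps to an empty string.
print(collect(a, ['Postcode']))  # {'Postcode': ''}
```

Unlike a comprehension with two for clauses, this form keeps every match for a key and still includes keys that match nothing.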
Answered By: ojdo