nan values when creating a pd.Series through a function

Question:

Assume we have the following dataframe

ap_comp = pd.DataFrame({'Name': ['Troll', 'Legolas'],'Code': [111, 222]})

and I passed it through the following function

a_mapping = pd.Series(ap_comp['Code'], index=ap_comp['Name']).to_dict()

My question is why a_mapping comes back as

{'Troll': nan, 'Legolas': nan}

Why does the nan appear? Shouldn't it be the following

{'Troll': 111, 'Legolas': 222}
Asked By: Alex


Answers:

You get NaNs because of index alignment. The Series you pass in already has its own index (0, 1), which differs from the labels passed as index. Internally, the constructor reindexes the data to those labels, and since none of them match, every value becomes NaN.
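To make the alignment visible, here is a minimal sketch using the question's dataframe. The `partial` example and its labels ('Gimli') are made up for illustration and are not part of the original question.

```python
import pandas as pd

ap_comp = pd.DataFrame({'Name': ['Troll', 'Legolas'], 'Code': [111, 222]})

# ap_comp['Code'] is a Series labelled 0 and 1.
print(list(ap_comp['Code'].index))  # [0, 1]

# Passing a Series as data together with index= looks up each new label
# in the existing index. 'Troll' and 'Legolas' are not among 0 and 1,
# so every value is missing.
aligned = pd.Series(ap_comp['Code'], index=ap_comp['Name'])
print(aligned.isna().all())  # True

# An explicit reindex shows the same behaviour.
reindexed = ap_comp['Code'].reindex(ap_comp['Name'])
print(reindexed.isna().all())  # True

# Hypothetical example: when some labels do match, those values are kept.
labelled = pd.Series([111, 222], index=['Troll', 'Legolas'])
partial = pd.Series(labelled, index=['Legolas', 'Gimli'])
print(partial['Legolas'])  # 222.0
```

Note that the values become floats once a NaN is introduced, which is why 222 prints as 222.0 in the last line.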

You would have needed to pass raw values (with .tolist(), .values or .to_numpy()) to the constructor, not a Series:

a_mapping = pd.Series(ap_comp['Code'].tolist(), index=ap_comp['Name']).to_dict()

But, much better, use:

a_mapping = ap_comp.set_index('Name')['Code'].to_dict()

output:

{'Troll': 111, 'Legolas': 222}
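As a quick check, the raw-value variants mentioned above (.tolist(), .values, .to_numpy()) and the set_index approach should all produce the same mapping. The variable names below are only illustrative.

```python
import pandas as pd

ap_comp = pd.DataFrame({'Name': ['Troll', 'Legolas'], 'Code': [111, 222]})

expected = {'Troll': 111, 'Legolas': 222}

# Raw values carry no index, so there is nothing to align against.
from_list = pd.Series(ap_comp['Code'].tolist(), index=ap_comp['Name']).to_dict()
from_values = pd.Series(ap_comp['Code'].values, index=ap_comp['Name']).to_dict()
from_numpy = pd.Series(ap_comp['Code'].to_numpy(), index=ap_comp['Name']).to_dict()

# set_index moves 'Name' into the index, keeping each row's Code attached.
from_set_index = ap_comp.set_index('Name')['Code'].to_dict()

print(from_list == from_values == from_numpy == from_set_index == expected)  # True
```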
Answered By: mozway

As a complement to mozway's answer (which explains why the OP runs into the problem and presents one solution and one alternative), one can also do it as follows:

a_mapping = dict(zip(ap_comp['Name'], ap_comp['Code']))

[Out]: 
{'Troll': 111, 'Legolas': 222}

I find the combination of dict and zip (both Python built-in functions) quite convenient for this specific use case (generating a dictionary from keys and values).

Answered By: Gonçalo Peres