Python 'map' function inserting NaN, possible to return original values instead?
Question:
I am passing a dictionary to the map function to recode values in a column of a pandas DataFrame. However, I noticed that if a value in the original series is not explicitly in the dictionary, it gets recoded to NaN. Here is a simple example:
Typing…
s = pd.Series(['one','two','three','four'])
…creates the series
0 one
1 two
2 three
3 four
dtype: object
But applying the map…
recodes = {'one':'A', 'two':'B', 'three':'C'}
s.map(recodes)
…returns the series
0 A
1 B
2 C
3 NaN
dtype: object
I would prefer that if any element in series s is not in the recodes dictionary, it remains unchanged. That is, I would prefer to return the series below (with the original four instead of NaN).
0 A
1 B
2 C
3 four
dtype: object
Is there an easy way to do this, for example an option to pass to the map function? The challenge is that I can't always anticipate every value that will appear in the series I'm recoding: the data will be updated in the future and new values could appear.
Thanks!
Answers:
Use replace instead of map:
>>> s = pd.Series(['one','two','three','four'])
>>> recodes = {'one':'A', 'two':'B', 'three':'C'}
>>> s.map(recodes)
0 A
1 B
2 C
3 NaN
dtype: object
>>> s.replace(recodes)
0 A
1 B
2 C
3 four
dtype: object
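Since the question mentions recoding a DataFrame column, here is a minimal sketch of the same replace call applied to a column. The DataFrame and the column name 'label' are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame; the column name 'label' is an assumption.
df = pd.DataFrame({'label': ['one', 'two', 'three', 'four']})
recodes = {'one': 'A', 'two': 'B', 'three': 'C'}

# Values missing from the dictionary ('four') are left unchanged.
df['label'] = df['label'].replace(recodes)
print(df['label'].tolist())
```

Note that replace with a plain dictionary matches whole values, so a string that merely contains 'one' would not be touched.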
If you still want to use map (it can be faster than replace in some cases), you can define how missing keys are handled by passing a dict subclass that implements __missing__:
class MyDict(dict):
    # Called on lookup of an absent key; return the key itself
    def __missing__(self, key):
        return key

s = pd.Series(['one', 'two', 'three', 'four'])
recodes = MyDict({
    'one': 'A',
    'two': 'B',
    'three': 'C'
})
s.map(recodes)
0 A
1 B
2 C
3 four
dtype: object
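Two other ways to get the same result with a plain dictionary are sketched below. Neither comes from the answers above; they are offered as possible alternatives, and the variable names via_fillna and via_get are just illustrative:

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'])
recodes = {'one': 'A', 'two': 'B', 'three': 'C'}

# Option 1: map, then fill the resulting NaN values from the original series.
via_fillna = s.map(recodes).fillna(s)

# Option 2: look each value up with dict.get, defaulting to the value itself.
via_get = s.map(lambda value: recodes.get(value, value))

print(via_fillna.tolist())
print(via_get.tolist())
```

One caveat with the first option: if the dictionary deliberately maps a key to NaN or None, fillna would overwrite that result with the original value. The second option avoids this but calls a Python function for every element, which may be slower on large series.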