Using pandas .map to change values

Question:

I’m trying to change the strings in my data to numerical values using the map function.

This is the data:

    label   sms_message
0   ham     Go until jurong point, crazy.. Available only ...
1   ham     Ok lar... Joking wif u oni...
2   spam    Free entry in 2 a wkly comp to win FA Cup fina...
3   ham     U dun say so early hor... U c already then say...
4   ham     Nah I don't think he goes to usf, he lives aro...

I’m trying to change ‘spam’ to 1 and ‘ham’ to 0 using this:

df['label'] = df.label.map({'ham':0, 'spam':1})

But the result is:

    label   sms_message
0   NaN     Go until jurong point, crazy.. Available only ...
1   NaN     Ok lar... Joking wif u oni...
2   NaN     Free entry in 2 a wkly comp to win FA Cup fina...
3   NaN     U dun say so early hor... U c already then say...
4   NaN     Nah I don't think he goes to usf, he lives aro...

Can anyone identify the problem?

Asked By: CAB

||

Answers:

Your code is correct; I think you executed the same statement twice (one after the other). The following statements, executed in the Python interactive terminal, show this.

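Note: If you pass a dictionary, map() replaces every value in the Series that does not match one of the dictionary’s keys with NaN. I think that is what happened here: after the first run the column holds 0 and 1, which are not keys in the dictionary, so the second run turns everything into NaN. See the pandas documentation for map() and apply().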

Pandas documentation note: when arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN.

>>> import pandas as pd
>>>
>>> d = {
...     "label": ["ham", "ham", "spam", "ham", "ham"],
...     "sms_messsage": [
...     "Go until jurong point, crazy.. Available only ...",
...     "Ok lar... Joking wif u oni...",
...     "Free entry in 2 a wkly comp to win FA Cup fina...",
...     "U dun say so early hor... U c already then say...",
...     "Nah I don't think he goes to usf, he lives aro..."
...    ]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
  label                                       sms_messsage
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3   ham  U dun say so early hor... U c already then say...
4   ham  Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
   label                                       sms_messsage
0      0  Go until jurong point, crazy.. Available only ...
1      0                      Ok lar... Joking wif u oni...
2      1  Free entry in 2 a wkly comp to win FA Cup fina...
3      0  U dun say so early hor... U c already then say...
4      0  Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
   label                                       sms_messsage
0    NaN  Go until jurong point, crazy.. Available only ...
1    NaN                      Ok lar... Joking wif u oni...
2    NaN  Free entry in 2 a wkly comp to win FA Cup fina...
3    NaN  U dun say so early hor... U c already then say...
4    NaN  Nah I don't think he goes to usf, he lives aro...
>>>
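One way to guard against this is to check for unmatched values before mapping and to write the result to a separate column, so the original strings are never overwritten. The following is only a minimal sketch, not part of the session above; the helper name map_labels and the column name label_num are made up for illustration.

```python
import pandas as pd

# Same labels as in the session above (only the label column matters here).
df = pd.DataFrame({"label": ["ham", "ham", "spam", "ham", "ham"]})

mapping = {"ham": 0, "spam": 1}


def map_labels(series, mapping):
    """Map values with a dict, but fail loudly instead of returning NaN."""
    # Hypothetical helper: collect the values that have no key in the mapping.
    unmatched = set(series.unique()) - set(mapping)
    if unmatched:
        raise ValueError(f"values without a key in the mapping: {unmatched}")
    return series.map(mapping)


# Writing to a new (hypothetical) column leaves the original strings intact,
# so running the same line a second time gives the same result, not NaN.
df["label_num"] = map_labels(df["label"], mapping)
df["label_num"] = map_labels(df["label"], mapping)
print(df)
```

Calling map_labels on the already converted column raises a ValueError, which is easier to notice than a column that silently turns into NaN.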

Other ways to obtain the same result

>>> import pandas as pd
>>>
>>> d = {
...     "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
...     "sms_message": ["M1", "M2", "M3", "M4", "M5"]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
  label sms_message
0  spam          M1
1   ham          M2
2   ham          M3
3   ham          M4
4  spam          M5
>>>

1st way – using map() with dictionary parameter

>>> new_values = {'spam': 1, 'ham': 0}
>>>
>>> df
  label sms_message
0  spam          M1
1   ham          M2
2   ham          M3
3   ham          M4
4  spam          M5
>>>
>>> df.label = df.label.map(new_values)
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>

2nd way – using map() with function parameter (run on the original df, while label still holds the strings)

>>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>

3rd way – using apply() with function parameter (again run on the original df, while label still holds the strings)

>>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
>>>
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>
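A further option, not shown in the session above, is to compare the column against 'spam' and cast the boolean result to integers. This is a sketch that assumes the column holds only the strings 'ham' and 'spam'.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["spam", "ham", "ham", "ham", "spam"],
    "sms_message": ["M1", "M2", "M3", "M4", "M5"],
})

# True where the label is 'spam'; casting to int turns True/False into 1/0.
df["label"] = (df["label"] == "spam").astype(int)
print(df)
```

Like the lambda versions, this never produces NaN, but it is not safe to run twice either: on an already converted column every comparison with 'spam' is False, so all labels would become 0 (the lambda versions would make them all 1).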

Thank you.

Answered By: hygull

Maybe your issue is with the read_table function.

Try this:

df = pd.read_table('smsspamcollection/SMSSpamCollection',
                   sep='\t',
                   header=None,
                   names=['label', 'sms_message'])
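To check that the file was parsed as intended before mapping, it can help to look at the distinct values in the label column. The sketch below uses two made-up rows held in memory instead of the real file, and pd.read_csv with a tab separator in place of read_table; both choices are assumptions made for illustration.

```python
import io

import pandas as pd

# Two made-up rows standing in for the real file: label, a tab, then the text.
sample = "ham\tOk lar... Joking wif u oni...\nspam\tFree entry in 2 a wkly comp\n"

df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",       # tab-separated; note the backslash
    header=None,    # the data has no header row
    names=["label", "sms_message"],
)

# If parsing went well, only the expected label strings should appear here.
print(df["label"].unique())

df["label"] = df["label"].map({"ham": 0, "spam": 1})

# Number of rows the mapping missed (rows that became NaN).
print(df["label"].isna().sum())
```

If unique() shows anything other than 'ham' and 'spam', the separator or the header handling is the first thing to check.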

This may be because your data has no headers. Solution:

Try:

df.columns = ["label", "message"]
ham_spam = {'spam': 1, 'ham': 0}
df['label'] = df.label.map(ham_spam)
df
Answered By: Nour Moustafa