Using pandas .map to change values
Question:
I’m trying to change the strings in my data to numerical values using the map function.
This is the data:
label sms_message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
I’m trying to change ‘spam’ to 1 and ‘ham’ to 0 using this:
df['label'] = df.label.map({'ham':0, 'spam':1})
But the result is:
label sms_message
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
Can anyone identify the problem?
Answers:
Your statement is correct. I think you executed the same statement twice (one after the other). The following statements, executed in a Python interactive terminal, demonstrate this.
Note: if you pass a dictionary, map() replaces every value in the Series that does not match one of the dictionary’s keys with NaN. After the first run the labels are already 0 and 1, so a second run finds no matching keys and turns them all into NaN. See the pandas documentation for map() and apply().
Pandas documentation note: when arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN.
>>> import pandas as pd
>>>
>>> d = {
... "label": ["ham", "ham", "spam", "ham", "ham"],
... "sms_messsage": [
... "Go until jurong point, crazy.. Available only ...",
... "Ok lar... Joking wif u oni...",
... "Free entry in 2 a wkly comp to win FA Cup fina...",
... "U dun say so early hor... U c already then say...",
... "Nah I don't think he goes to usf, he lives aro..."
... ]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_messsage
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_messsage
0 0 Go until jurong point, crazy.. Available only ...
1 0 Ok lar... Joking wif u oni...
2 1 Free entry in 2 a wkly comp to win FA Cup fina...
3 0 U dun say so early hor... U c already then say...
4 0 Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_messsage
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
>>>
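One way to make the statement safe to re-run is to write the mapped values to a separate column and count the unmapped ones. This is only a minimal sketch: the column name label_num and the shortened messages are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["ham", "ham", "spam", "ham", "ham"],
    "sms_message": ["M1", "M2", "M3", "M4", "M5"],  # placeholder messages
})

mapping = {"ham": 0, "spam": 1}

# Write the result to a new (hypothetical) column so the original strings
# stay intact; running this line again gives the same result.
df["label_num"] = df["label"].map(mapping)
df["label_num"] = df["label"].map(mapping)  # second run, still no NaN

# Values missing from the dictionary become NaN; count them as a sanity check.
unmapped = int(df["label_num"].isna().sum())
print(unmapped)
```

If unmapped is greater than zero, the column contains values that are not keys of the dictionary (for example, labels that were already converted).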
Other ways to obtain the same result
>>> import pandas as pd
>>>
>>> d = {
... "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
... "sms_message": ["M1", "M2", "M3", "M4", "M5"]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
1st way – using map() with a dictionary parameter
>>> new_values = {'spam': 1, 'ham': 0}
>>>
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
>>> df.label = df.label.map(new_values)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
2nd way – using map() with a function parameter (starting again from the original df, since the labels must still be strings)
>>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
3rd way – using apply() with a function parameter (again starting from the original df)
>>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
>>>
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
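Two further variants, not from the answer above, shown here only as a hedged sketch: a boolean comparison cast to int, and map() followed by fillna() so that unexpected labels get a visible marker instead of NaN. The marker value -1 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["spam", "ham", "ham", "ham", "spam"],
    "sms_message": ["M1", "M2", "M3", "M4", "M5"],
})

# The comparison yields True/False; astype(int) turns that into 1/0.
# Like the lambda versions, anything that is not 'spam' becomes 0.
as_int = (df["label"] == "spam").astype(int)

# map() with a dictionary, then replace NaN (unknown labels) with -1.
with_default = df["label"].map({"ham": 0, "spam": 1}).fillna(-1).astype(int)

print(as_int.tolist())
print(with_default.tolist())
```

Neither variant modifies df.label in place, so both can be run repeatedly without producing NaN.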
Thank you.
Maybe your issue is with the read_table function. The file is tab-separated, so the separator must be '\t'.
Try this:
df = pd.read_table('smsspamcollection/SMSSpamCollection',
                   sep='\t',
                   header=None,
                   names=['label', 'sms_message'])
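To check that the file was parsed as expected before mapping, it can help to look at the distinct label values. The sketch below uses two made-up rows in an in-memory buffer instead of the real file, so the data is an assumption. Stripping whitespace is an extra precaution, not something the original question shows to be necessary.

```python
import io

import pandas as pd

# Two made-up rows standing in for the real file (tab between label and text).
raw = "ham\tOk lar... Joking wif u oni...\nspam\tFree entry in 2 a wkly comp\n"

df = pd.read_table(io.StringIO(raw),
                   sep="\t",
                   header=None,
                   names=["label", "sms_message"])

# Inspect the distinct labels; stray whitespace or unexpected spellings
# would show up here and would otherwise end up as NaN after map().
labels = sorted(df["label"].str.strip().unique())
print(labels)

df["label"] = df["label"].str.strip().map({"ham": 0, "spam": 1})
print(df["label"].tolist())
```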
Maybe it is because your data has no headers.
Solution: try this:
df.columns = ["label", "message"]
ham_spam = {'spam': 1, 'ham': 0}
df['label'] = df.label.map(ham_spam)
df