How to use map to process a column?
Question:
I have a pandas DataFrame like this:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
   calories  duration
0       420        50
1       380        40
2       390        45
And I have a function to alter the value:
def alter_val(val):
    return val + 1
Now, as the documentation says, map() takes a function and an iterable and returns an iterator. By my understanding, it should then work like this:
df["new_value"] = map(alter_val, df["calories"])
But it doesn't work. It raises:
TypeError: object of type 'map' has no len()
However, it works if I use the following code:
df["new"] = df["calories"].map(alter_val)
But this does not follow the documented form map(function, series).
Can someone please explain the correct way, and why it is so?
Answers:
Convert the map output to a list:
df["new_value"] = list(map(alter_val, df["calories"]))
The built-in map returns an iterator that yields results one at a time rather than returning them all at once, which means its results are not actually calculated until you explicitly "ask" for them. Try this:
list(map(alter_val, df["calories"]))
When you convert an iterator to a list, it has to calculate all of the results and store them in memory.
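To make the laziness concrete, here is a minimal sketch that uses a plain list in place of df["calories"] (that substitution is my assumption, to keep the example free of pandas). It shows that a map object has no length, and that values are only computed when you pull them out:

```python
def alter_val(val):
    return val + 1

# Plain list standing in for df["calories"] (assumption for illustration).
calories = [420, 380, 390]

# Building the map object does not call alter_val yet.
lazy = map(alter_val, calories)

# A map object is an iterator, so it does not support len().
try:
    len(lazy)
    has_len = True
except TypeError:
    has_len = False

# Asking for one value computes only that value.
first = next(lazy)

# list() consumes whatever is left and stores it in memory.
rest = list(lazy)
```

Since the map object cannot report its length up front, wrapping it in list() first gives pandas a concrete sequence to assign.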
Despite that, I would stick to the pandas .map() method, as it is cleaner in my opinion.
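For comparison, here is a small sketch that puts both approaches side by side on the DataFrame from the question. The column names via_builtin and via_series_map are made up for this example:

```python
import pandas as pd

def alter_val(val):
    return val + 1

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Built-in map: materialize the iterator into a list before assigning.
df["via_builtin"] = list(map(alter_val, df["calories"]))

# Series.map: a pandas method that applies the function element-wise
# and returns a new Series with the same index.
df["via_series_map"] = df["calories"].map(alter_val)

# Both columns should hold the same values.
same_values = df["via_builtin"].tolist() == df["via_series_map"].tolist()
```

Note that these are two different things sharing a name: map(function, iterable) is the Python built-in, while df["calories"].map(function) is a method defined on the pandas Series, so it is not expected to follow the built-in's calling form.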