Map dataframe function without lambda

Question:

I have the following function:

import nltk

def summarize(text, percentage=.6):
    sentences = nltk.sent_tokenize(text)
    sentences = sentences[:int(percentage*len(sentences))]
    summary = ''.join([str(sentence) for sentence in sentences])
    return summary

I want to map it over the rows of a dataframe. It works well when I use the following code:

df['summary'] = df['text'].map(summarize)

However, when I try to change the percentage argument in this call with df['summary'] = df['text'].map(summarize(percentage=.8)), I get an error saying that another argument, text, is required. Of course, this can be resolved with a lambda function as follows:

df['summary'] = df['text'].map(lambda x: summarize(x, percentage=.8))

But I do not want to use a lambda in the call. Is there another way to do it? For example, using kwargs inside the function to refer to the text column in the dataframe? Thank you.

Asked By: Mus


Answers:

A possible solution is to use Series.apply instead of map; it then becomes possible to pass parameters as named arguments, without a lambda:

df['summary'] = df['text'].map(summarize, percentage=.8)

TypeError: map() got an unexpected keyword argument 'percentage'


df['summary'] = df['text'].apply(summarize, percentage=.8)
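
The difference can be checked with a small self-contained sketch. It makes two assumptions that are not part of the original post: summarize is replaced by a stand-in that splits sentences with a naive regular expression (so it runs without NLTK) and joins them with a space, and the one-row dataframe is made up for illustration. It also shows functools.partial as another lambda-free option if you prefer to keep calling map:

```python
import re
from functools import partial

import pandas as pd


def summarize(text, percentage=.6):
    # Stand-in for the question's function (assumption): a naive regex split
    # replaces nltk.sent_tokenize so the sketch runs without NLTK data.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = sentences[:int(percentage * len(sentences))]
    # Joined with a space here for readability.
    return ' '.join(sentences)


# Hypothetical sample data, made up for this sketch.
df = pd.DataFrame({'text': ['One. Two. Three. Four. Five.']})

# Series.apply forwards extra keyword arguments to the function.
df['summary_apply'] = df['text'].apply(summarize, percentage=.8)

# functools.partial pre-binds percentage, so map gets a one-argument callable.
df['summary_map'] = df['text'].map(partial(summarize, percentage=.8))

print(df[['summary_apply', 'summary_map']])
```

Both columns should hold the same value, since each approach ends up calling summarize(text, percentage=.8) once per row.
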
Answered By: jezrael