How do I properly call a function and return an updated dataframe?

Question:

I am trying to process and update rows in a dataframe through a function, and return the dataframe to finish using it. When I try to return the dataframe to the original function call, it returns a series and not the expected column updates. A simple example is below:

import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data     # create new columns
    comb['AnotherNewfield'] = 'y'

    return pd.DataFrame(comb)

Calling the function using apply:

>>> newdf = df['A'].apply(get_item)

>>> newdf
a          A   Newfield AnotherNewfield
a  adam  st...
b          A   Newfield AnotherNewfield
e   sed  st...
c          A   Newfield AnotherNewfield
d  dave  st...
d          A   Newfield AnotherNewfield
d  dave  st...
e          A   Newfield AnotherNewfield
s   NaN  st...
f         A   Newfield AnotherNewfield
m  NaN  str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>

I assume that apply() is bad here, but am not quite sure how I ‘should’ be updating this dataframe via function otherwise.

Edit: I apologize, but it seems I accidentally deleted the sample function in an earlier edit. I have added it back here while I try a few other things I found in other posts.

Testing a slightly different approach, with individual variables and the function returning multiple Series, seems to work. I will see whether I can do the same in my actual case and update.

def get_item(data):

    value = data     #create new columns
    AnotherNewfield = 'y'
    return pd.Series(value),pd.Series(AnotherNewfield)
df['B'], df['C'] = zip(*df['A'].apply(get_item))
Asked By: Sedric Hibler


Answers:

For anyone looking for a potential answer to this: I got the desired result with code I found in another post (I will post the author's name to credit him). It let me edit the function and write the data it creates into separate columns via apply:

def get_item(data):
    
    value = data     #create new columns using variables
    AnotherNewfield = 'y'
    return pd.Series(value),pd.Series(AnotherNewfield)

>>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
>>> df
      A        B     C
a  adam  (adam,)  (y,)
b    ed    (ed,)  (y,)
c   dra   (dra,)  (y,)
d  dave  (dave,)  (y,)
e   sed   (sed,)  (y,)
f  mike  (mike,)  (y,)
>>>

The only problem is that the parentheses and commas come along with the data. I intend to strip them in the code outside of the function, perhaps like this (after import re):

>>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B     C
a  adam   adam   (y,)
b    ed     ed   (y,)
c   dra    dra   (y,)
d  dave   dave   (y,)
e   sed    sed   (y,)
f  mike   mike   (y,)
>>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B    C
a  adam   adam    y 
b    ed     ed    y 
c   dra    dra    y 
d  dave   dave    y 
e   sed    sed    y 
f  mike   mike    y 
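
A possible way to skip the cleanup step altogether: the parentheses and commas appear to come from wrapping each value in pd.Series inside the function. Below is a minimal sketch, assuming the function can return plain Python values instead (the name another_new_field is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
    columns=['A'],
)

def get_item(value):
    # return plain scalars instead of wrapping each one in pd.Series
    another_new_field = 'y'
    return value, another_new_field

# apply() yields one (value, 'y') tuple per row; zip(*...) regroups
# those tuples into one sequence per new column
df['B'], df['C'] = zip(*df['A'].apply(get_item))
print(df)
```

With plain values the cells should hold the strings themselves, so no regex pass is needed afterwards.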
Answered By: Sedric Hibler

You could use groupby with apply so that the apply call returns a DataFrame, like this:

import pandas as pd

# add new column B for groupby - we need single group only to do the trick
df = pd.DataFrame(
    {'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # create empty dataframe to be returned
    comb=pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # copy the group's column A into the new column; pd.concat is used here
    # because Series.append was removed in pandas 2.0
    comb['Newfield'] = pd.concat([comb['Newfield'], data['A']], ignore_index=True)
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group by column B so that get_item receives the whole (single) group as a DataFrame
newdf = df.groupby('B').apply(get_item)
# the result has a MultiIndex; drop level 0 (the group key B with value 1 added by groupby)
newdf = newdf.droplevel(0)
print(newdf)

Output:

    Newfield    AnotherNewfield
0   adam        y
1   ed          y
2   dra         y
3   dave        y
4   sed         y
5   mike        y
Answered By: Lukas

This will work:

df = pd.DataFrame(['adam', 'ed', 'dra','dave','sed','mike'], index =['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])
def get_item(data):
    comb=pd.DataFrame()
    comb['Newfield'] = data     #create new columns
    comb['AnotherNewfield'] = 'y'
    return comb
new_df = get_item(df)
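
The likely reason this works where apply did not: Series.apply calls get_item once per element, so every call builds its own small DataFrame and the results are collected into a Series. Calling the function once on the whole frame returns a single DataFrame. A sketch of the same idea using DataFrame.assign, which keeps the original column and index (the column names are the ones from the question):

```python
import pandas as pd

df = pd.DataFrame(
    ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
    columns=['A'],
)

def get_item(data):
    # data is the whole DataFrame; assign() returns a new frame with
    # the extra columns and leaves the input frame untouched
    return data.assign(Newfield=data['A'], AnotherNewfield='y')

new_df = get_item(df)
print(type(new_df))
print(new_df)
```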
Answered By: Brad667