How to use the same variable again in the pandas assign() method?

Question:

I have recently started learning pandas. I am mainly an R user and rely heavily on the tidyverse, so I am trying to use pandas in the same way. The following code raises an error:

import pandas as pd

(
    pd.DataFrame({'A': [1, 2, 3],
                  'B': [4, 5, 6]})
    .assign(A=lambda x: x.A + 1,
            B=lambda x: x.B + x.A,
            A=1)  # A is passed a second time here
)

SyntaxError: keyword argument repeated: A

How can I use pandas in a tidyverse style? More specifically, is there a pandas method that works like dplyr::mutate?

Asked By: Pritom Roy

Answers:

Try not to pass A twice. Fold the increment into the expression for B and assign A only once:

pd.DataFrame(
    {'A': [1, 2, 3],
     'B': [4, 5, 6]}
).assign(
    B=lambda x: x.B + x.A + 1,
    A=1
)

output:

   A   B
0  1   6
1  1   8
2  1  10

Even in R with dplyr and the tidyverse, assigning the same column twice within mutate is not necessary.

When you use groupby in pandas:

transform is roughly equivalent to group_by followed by mutate in R

agg is roughly equivalent to group_by followed by summarise in R
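
As a rough illustration of that mapping, here is a minimal sketch. The data and the column names g, v, and v_mean are made up for the example, and the named-aggregation form of agg assumes a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical data for illustration; g, v and v_mean are assumed names.
df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 10]})

# Roughly group_by(g) %>% mutate(v_mean = mean(v)):
# transform returns one value per original row, aligned to df's index.
mutated = df.assign(v_mean=lambda d: d.groupby("g")["v"].transform("mean"))

# Roughly group_by(g) %>% summarise(v_mean = mean(v)):
# agg collapses each group to a single row.
summarised = df.groupby("g", as_index=False).agg(v_mean=("v", "mean"))

print(mutated)
print(summarised)
```

Here mutated keeps all three rows, while summarised has one row per group.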

Answered By: BENY

One (perhaps obvious) approach is to chain several assign calls:

(pd.DataFrame({'A':[1,2,3],
               'B':[4,5,6]})
   .assign(A = lambda x: x.A + 1,
           B = lambda x: x.B + x.A,)
   .assign(A = 1)
)
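
To see why the chained version gives the expected result, the two steps can be split apart. This is only a sketch; the names step1 and result are introduced here for illustration. It relies on assign evaluating its keyword arguments in order, which I believe holds for current pandas on Python 3.6 or later.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Within one assign call the keyword arguments are applied in order,
# so the lambda for B already sees the incremented A.
step1 = df.assign(A=lambda x: x.A + 1, B=lambda x: x.B + x.A)

# A keyword can appear only once per call (a Python rule, not a pandas one),
# so overwriting A again needs a second assign.
result = step1.assign(A=1)

print(step1)
print(result)
```

assign returns a new DataFrame each time, so df itself is left unchanged.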

Another is to use pipe with a function:

def process(df):
    df = df.copy()  # avoid mutating the caller's DataFrame
    df['A'] = df['A'] + 1
    df['B'] = df['A'] + df['B']  # A was already incremented above
    df['A'] = 1
    return df
    
(pd.DataFrame({'A':[1,2,3],
               'B':[4,5,6]})
   .pipe(process)
)

output:

   A   B
0  1   6
1  1   8
2  1  10

Answered By: mozway