How can I use \n and \t in a lambda function to replace values in a DataFrame?

Question

I have a very large DataFrame with 100,000 rows and 300 columns, and I'm trying to fill the NaN rows in one column with values taken from the other columns.

For example, say we have a sample DataFrame such as:

     NAME    RRN_FRONT    RRN_BACK    EVENT_DTL
1    JOHN    891105       1067714     NaN
2    SHOWN   791134       1156543     NaN
3    BROWN   581104       1668314     NaN
4    MIKE    984564       0153422     1. Name : MIKE 
                                      2. BIRTHDAY : 984564 
                                      3. SSN : 0153422
5    LARRY   796515       0168165     1. Name : LARRY 
                                      2. BIRTHDAY : 796515 
                                      3. SSN : 0168165

I want to fill the NaN values in EVENT_DTL with text built from NAME, RRN_FRONT, and RRN_BACK.

Here is the code I tried:

df.loc[df.EVENT_DTL.isnull(), 'EVENT_DTL'] = df.apply(lambda x: '1. NAME : ' + str(x['NAME']) + '\n2. BIRTHDAY : ' + str(x['RRN_FRONT']) + '\n3. SSN : ' + str(x['RRN_BACK']), axis=1)

The output is (which is not what I intended):

1. NAME : JOHN2. \nBIRTHDAY : 8911053. \nSSN : 1067714
2. ...
 .
 .
5. ...

Here is the desired output of df['EVENT_DTL']:

1 1. NAME : JOHN
  2. BIRTHDAY : 891105
  3. SSN : 1067714
2 1. NAME : SHOWN
  2. BIRTHDAY : 791134
  3. SSN : 1156543
3 ‥
4 ‥
5 ‥

Asked By: Chung Joshua

Answer 1

DataFrame.apply works along axis=0 by default, which passes each column to the function. To build a string from the values in each row, pass axis=1 so that each row is passed instead:

import numpy as np

# Keep existing EVENT_DTL values; build a string only where the value is missing
df['EVENT_DTL'] = (np.where(df['EVENT_DTL'].isna(), 
                  df.apply(lambda x: ('1. NAME :\n' + str(x['NAME']) +
                  '2. BIRTHDAY :\n' + str(x['RRN_FRONT']) + '3. SSN : \n' + 
                  str(x['RRN_BACK'])), axis=1),
                  df['EVENT_DTL']))

Output:

0    1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : ...
1    1. NAME :\nSHOWN2. BIRTHDAY :\n7911343. SSN : ...
2    1. NAME :\nBROWN2. BIRTHDAY :\n5811043. SSN : ...
3    1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0...
4    1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : ...
Name: EVENT_DTL, dtype: object
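
For the layout the question asks for, each line break has to sit in front of the next numbered label rather than after the colon. The following is a minimal sketch of that, assuming the column names from the question and a made-up two-row sample; `build_detail` is a hypothetical helper name, not part of pandas. It builds strings only for the rows where EVENT_DTL is missing, and it prints a single cell because the Series display typically shows line breaks in escaped form, while print() on the string itself renders them.

```python
import numpy as np
import pandas as pd

# Two sample rows modelled on the question (assumed data). The ID columns
# are kept as strings so leading zeros such as "0153422" survive.
df = pd.DataFrame({
    "NAME": ["JOHN", "MIKE"],
    "RRN_FRONT": ["891105", "984564"],
    "RRN_BACK": ["1067714", "0153422"],
    "EVENT_DTL": [np.nan, "1. Name : MIKE\n2. BIRTHDAY : 984564\n3. SSN : 0153422"],
})

def build_detail(row):
    # Hypothetical helper: "\n" ends each item, so the next label starts a new line.
    return (f"1. NAME : {row['NAME']}\n"
            f"2. BIRTHDAY : {row['RRN_FRONT']}\n"
            f"3. SSN : {row['RRN_BACK']}")

missing = df["EVENT_DTL"].isna()

# Apply row-wise (axis=1), but only to the rows that need a value.
df.loc[missing, "EVENT_DTL"] = df.loc[missing].apply(build_detail, axis=1)

# Printing the string itself renders the line breaks.
print(df.loc[0, "EVENT_DTL"])
```

Restricting apply to the missing rows avoids building strings that would be discarded, which may matter with 100,000 rows.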

Answered By: Nuri Taş

Answer 2

Solution without apply:

import pandas as pd

df = pd.DataFrame({'col1': ['JOHN', 'SHOWN', 'BROWN'], 'col2': [10, 20, 30], 'col3': [None, None, 'other text']})
idx = df.col3.isna()
# Build the text for the missing rows, then split it into a list of lines
df.loc[idx, 'col3'] = ('1. Name:' + df.loc[idx, 'col1'] + '\n2. BIRTHDAY:' + df.loc[idx, 'col2'].astype('str')).str.split('\n')
# One output row per line, indexed by (record number, line number)
df = df.explode('col3')
df = df.set_index([df.index+1, df.groupby(level=0).cumcount()+1])['col3']
print(df)

1  1      1. Name:JOHN
   2    2. BIRTHDAY:10
2  1     1. Name:SHOWN
   2    2. BIRTHDAY:20
3  1        other text
Name: col3, dtype: object
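
If one cell per record is preferred over one row per line, the same idea can be kept without apply and without explode. The following is only a sketch under assumptions: the sample rows are made up from the question's table, `template` is an illustrative variable name, and Series.fillna is used to take the generated text only where EVENT_DTL is missing.

```python
import pandas as pd

# Sample rows modelled on the question (assumed data).
df = pd.DataFrame({
    "NAME": ["JOHN", "SHOWN", "MIKE"],
    "RRN_FRONT": [891105, 791134, 984564],
    "RRN_BACK": ["1067714", "1156543", "0153422"],
    "EVENT_DTL": [None, None, "1. Name : MIKE\n2. BIRTHDAY : 984564\n3. SSN : 0153422"],
})

# Build the replacement text for every row with vectorised string
# concatenation; "\n" precedes each numbered label after the first.
template = (
    "1. NAME : " + df["NAME"].astype(str)
    + "\n2. BIRTHDAY : " + df["RRN_FRONT"].astype(str)
    + "\n3. SSN : " + df["RRN_BACK"].astype(str)
)

# fillna aligns on the index and uses the template only where the value is missing.
df["EVENT_DTL"] = df["EVENT_DTL"].fillna(template)

print(df.loc[1, "EVENT_DTL"])
```

This computes the template for all rows, including ones that already have a value, which trades a little extra work for simpler code.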

Answered By: Алексей Р
