How can I use n and t in lambda function to replace dataframe?

Question:

I have very large scale dataframe with 100,000 rows and 300 columns

and I’m trying to fill out Nan rows in one columns by extracting the values from the other columns

here is the example,

let’s say we have a sample dataframe such as:

     NAME    RRN_FRONT    RRN_BACK    EVENT_DTL
1    JOHN    891105       1067714     Nan
2    SHOWN   791134       1156543     Nan
3    BROWN   581104       1668314     Nan
4    MIKE    984564       0153422     1. Name : MIKE 
                                      2. BIRTHDAY : 984564 
                                      3. SSN : 0153422
5    LARRY   796515       0168165     1. Name : LARRY 
                                      2. BIRTHDAY : 796515 
                                      3. SSN : 0168165    

and I want to fill out Nan values with the NAME, RRN_FRONT, RRN_BACK

Here is the input that I tried:

df.loc[df.EVENT_DTL.isnull(), 'EVENT_DTL'] = df.apply(lambda x: ('1. NAME : ' + str(x['NAME']) + 'n2. BIRTHDAY : ' + str(x['RRN_FRONT']) + 'n3. SSN : ' + str(x['RRN_BACK']),axis=1)

and the output is(which is not what I intended):

1. NAME : JOHN2. nBIRTHDAY : 8911053. nSSN : 1067714
2. ...
 .
 .
5. ...

Here is the desired output of df['EVENT_DTL']:

1 1. NAME : JOHN
  2. BIRTHDAY : 8911053
  3. SSN : 1156543
2 1. NAME : SHOWN
  2. BIRTHDAY : 791134
  3. SSN : 1156543
3 ‥
4 ‥
5 ‥
Asked By: Chung Joshua

||

Answers:

Pandas.apply applies the operations on axis=0 (index axis) by default, and you need to change the axis=1 in your case:

df['EVENT_DTL'] = (np.where(df['EVENT_DTL'].isna(), 
                  df.apply(lambda x: ('1. NAME :n' + str(x['NAME']) +
                  '2. BIRTHDAY :n' + str(x['RRN_FRONT']) + '3. SSN : n' + 
                  str(x['RRN_BACK'])), axis=1),
                  df['EVENT_DTL']))

Output:

0    1. NAME :nJOHN2. BIRTHDAY :n8911053. SSN : ...
1    1. NAME :nSHOWN2. BIRTHDAY :n7911343. SSN : ...
2    1. NAME :nBROWN2. BIRTHDAY :n5811043. SSN : ...
3    1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0...
4    1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : ...
Name: EVENT_DTL, dtype: object
Answered By: Nuri Taş

Solution without apply:

df = pd.DataFrame({'col1': ['JOHN', 'SHOWN', 'BROWN'], 'col2': [10, 20, 30], 'col3': [None, None, 'other text']})
idx = df.col3.isna()
df.loc[idx, 'col3'] = ('1. Name:' + df.loc[idx, 'col1'] + 'n2. BIRTHDAY:' + df.loc[idx, 'col2'].astype('str')).str.split('n')
df = df.explode('col3')
df = df.set_index([df.index+1, df.groupby(level=0).cumcount()+1])['col3']
print(df)
1  1      1. Name:JOHN
   2    2. BIRTHDAY:10
2  1     1. Name:SHOWN
   2    2. BIRTHDAY:20
3  1        other text
Name: col3, dtype: object
Answered By: Алексей Р
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.