How can I use \n and \t in a lambda function to replace values in a DataFrame?
Question:
I have a very large DataFrame with 100,000 rows and 300 columns,
and I'm trying to fill the NaN rows in one column with values extracted from other columns.
Here is an example.
Let's say we have a sample DataFrame such as:
   NAME   RRN_FRONT  RRN_BACK  EVENT_DTL
1  JOHN   891105     1067714   NaN
2  SHOWN  791134     1156543   NaN
3  BROWN  581104     1668314   NaN
4  MIKE   984564     0153422   1. Name : MIKE
                               2. BIRTHDAY : 984564
                               3. SSN : 0153422
5  LARRY  796515     0168165   1. Name : LARRY
                               2. BIRTHDAY : 796515
                               3. SSN : 0168165
I want to fill the NaN values using NAME, RRN_FRONT, and RRN_BACK.
Here is what I tried:
df.loc[df.EVENT_DTL.isnull(), 'EVENT_DTL'] = df.apply(lambda x: ('1. NAME : ' + str(x['NAME']) + '\n2. BIRTHDAY : ' + str(x['RRN_FRONT']) + '\n3. SSN : ' + str(x['RRN_BACK'])), axis=1)
and the output is (which is not what I intended):
1. NAME : JOHN2. \nBIRTHDAY : 8911053. \nSSN : 1067714
2. ...
.
.
5. ...
Here is the desired output of df['EVENT_DTL']:
1  1. NAME : JOHN
   2. BIRTHDAY : 891105
   3. SSN : 1067714
2  1. NAME : SHOWN
   2. BIRTHDAY : 791134
   3. SSN : 1156543
3 ‥
4 ‥
5 ‥
Answers:
DataFrame.apply applies the function along axis=0 (to each column) by default, so you need to pass axis=1 to apply it to each row in your case:
df['EVENT_DTL'] = np.where(
    df['EVENT_DTL'].isna(),
    # build the replacement text row by row (axis=1)
    df.apply(lambda x: ('1. NAME :\n' + str(x['NAME']) +
                        '2. BIRTHDAY :\n' + str(x['RRN_FRONT']) +
                        '3. SSN : \n' + str(x['RRN_BACK'])), axis=1),
    df['EVENT_DTL'])
Output:
0    1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : ...
1    1. NAME :\nSHOWN2. BIRTHDAY :\n7911343. SSN : ...
2    1. NAME :\nBROWN2. BIRTHDAY :\n5811043. SSN : ...
3    1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0...
4    1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : ...
Name: EVENT_DTL, dtype: object
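Note that the snippet above puts each line break after the label rather than before the next numbered item, so the three fields do not end up on separate lines. Below is a minimal sketch of one way to get the layout the question asks for. It assumes the separator is the escape sequence `\n` and that the ID columns are stored as strings so leading zeros survive. The sample frame is made up from the question's table, and `filled` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question's table; IDs are kept as strings
# so that leading zeros (e.g. "0153422") are not lost.
df = pd.DataFrame({
    "NAME": ["JOHN", "SHOWN", "MIKE"],
    "RRN_FRONT": ["891105", "791134", "984564"],
    "RRN_BACK": ["1067714", "1156543", "0153422"],
    "EVENT_DTL": [np.nan, np.nan,
                  "1. Name : MIKE\n2. BIRTHDAY : 984564\n3. SSN : 0153422"],
})

# Build the replacement text for every row with vectorized string
# concatenation; each "\n" sits between two numbered items.
filled = ("1. NAME : " + df["NAME"].astype(str)
          + "\n2. BIRTHDAY : " + df["RRN_FRONT"].astype(str)
          + "\n3. SSN : " + df["RRN_BACK"].astype(str))

# Keep existing values; use the generated text only where EVENT_DTL is NaN.
df["EVENT_DTL"] = df["EVENT_DTL"].fillna(filled)

# Printing a single cell shows the three fields on separate lines.
print(df.loc[0, "EVENT_DTL"])
```

Because the concatenation is vectorized, this avoids calling a Python lambda once per row, which may matter for a frame with 100,000 rows.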
Solution without apply:
import pandas as pd

df = pd.DataFrame({'col1': ['JOHN', 'SHOWN', 'BROWN'], 'col2': [10, 20, 30], 'col3': [None, None, 'other text']})
idx = df.col3.isna()
# build the text for the missing rows, then split it into a list of lines
df.loc[idx, 'col3'] = ('1. Name:' + df.loc[idx, 'col1'] + '\n2. BIRTHDAY:' + df.loc[idx, 'col2'].astype('str')).str.split('\n')
# one row per line, indexed by (row number, line number)
df = df.explode('col3')
df = df.set_index([df.index+1, df.groupby(level=0).cumcount()+1])['col3']
print(df)
1  1    1. Name:JOHN
   2    2. BIRTHDAY:10
2  1    1. Name:SHOWN
   2    2. BIRTHDAY:20
3  1    other text
Name: col3, dtype: object
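It may also help to know that when a cell contains `\n`, the Series or DataFrame display shows the escape sequence literally. The line breaks only appear when an individual string is printed. A small sketch, with an illustrative sample value:

```python
import pandas as pd

s = pd.Series(["1. NAME : JOHN\n2. BIRTHDAY : 891105"], name="EVENT_DTL")

# The Series display escapes the newline, so it appears as a literal "\n".
print(s)

# Printing a single element renders the newline as an actual line break.
print(s.iloc[0])

# The stored value contains exactly one newline character.
newline_count = s.iloc[0].count("\n")
print(newline_count)
```

So an output that shows `\n` inside the column does not necessarily mean the fill went wrong; checking one element with `print` is a quick way to confirm.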