python – Joining two columns pandas – returning NA if any value is NA, however need to return real join
Question:
I have a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [np.nan, 'Peter Andrews', 'Amy Powers'],
                   })
I am creating a new column that joins id + name using
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
Needed output:
{student_id_name : [71, 63 Peter Andrews, 23 Amy Powers]}
What I get:
{student_id_name : [nan, 63 Peter Andrews, 23 Amy Powers]}
Could you help me get the expected output?
Answers:
Use Series.str.cat with the na_rep parameter, then remove any trailing spaces with Series.str.strip:
df['student_id_name'] = (df['student_id'].astype(str).str.cat(df['student_name'],
                         sep=' ', na_rep='').str.strip())
print(df)
   student_id   student_name   student_id_name
0          71            NaN                71
1          63  Peter Andrews  63 Peter Andrews
2          23     Amy Powers     23 Amy Powers
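To see what na_rep changes, here is a minimal sketch that runs str.cat both ways on the question's data. The variable names (ids, without_na_rep, with_na_rep) are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "student_id": [71, 63, 23],
    "student_name": [np.nan, "Peter Andrews", "Amy Powers"],
})

ids = df["student_id"].astype(str)

# Without na_rep, a missing value in either column makes the whole row missing,
# which is the same behaviour the question sees with the + operator.
without_na_rep = ids.str.cat(df["student_name"], sep=" ")

# With na_rep="", the missing name is treated as an empty string. That leaves
# "71 " (id plus separator), so str.strip() removes the trailing space.
with_na_rep = ids.str.cat(df["student_name"], sep=" ", na_rep="").str.strip()

print(without_na_rep.isna().tolist())
print(with_na_rep.tolist())
```

The original student_name column is left untouched, so the NaN is still there if you need to tell missing names apart later.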
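You can use fillna() to replace the missing values in the dataframe first; your original expression will then work. Note that this actually overwrites NaN in the dataframe with the fill value you pass, and that the row with the missing name keeps the trailing separator ('71 ') unless you also call .str.strip():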
import math
import pandas as pd
df = pd.DataFrame({'student_id': [71, 63, 23],
                   'student_name': [math.nan, 'Peter Andrews', 'Amy Powers'],
                   })
# Replace every NaN in the dataframe with an empty string
df = df.fillna('')
df['student_id_name'] = df['student_id'].astype(str) + ' ' + df['student_name']
[Out]:
   student_id   student_name   student_id_name
0          71                              71 
1          63  Peter Andrews  63 Peter Andrews
2          23     Amy Powers     23 Amy Powers
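If you would rather not overwrite NaN across the whole dataframe, one possible variant is to fill only a temporary copy of the name column and strip the result. This is a sketch, not the answerer's exact code; the name variable is just a local helper.

```python
import math
import pandas as pd

df = pd.DataFrame({
    "student_id": [71, 63, 23],
    "student_name": [math.nan, "Peter Andrews", "Amy Powers"],
})

# Fill NaN only in a temporary Series; df["student_name"] itself is not modified.
name = df["student_name"].fillna("")

# Join id and name, then strip the trailing space left where the name was missing.
df["student_id_name"] = (df["student_id"].astype(str) + " " + name).str.strip()

print(df["student_id_name"].tolist())
```

This keeps the missing name as NaN in student_name while the joined column still gets a plain '71'.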