pandas combine two strings ignore nan values

Question:

I have two columns of strings. I would like to combine them while ignoring NaN values, such that:

ColA  ColB  ColA+ColB
str   str   strstr
str   nan   str
nan   str   str

I tried df['ColA+ColB'] = df['ColA'] + df['ColB'] but that creates a nan value if either column is nan. I’ve also thought about using concat.

I suppose I could just go with that and then patch the result with something like df.loc[df['ColB'].isna(), 'ColA+ColB'] = df['ColA'], but that seems like quite the workaround.

Asked By: As3adTintin

Answers:

You could fill the NaN with an empty string:

df['ColA+ColB'] = df['ColA'].fillna('') + df['ColB'].fillna('')
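
As a minimal, self-contained sketch of the line above (the sample values are made up for illustration and are not from the question), each column has its NaN replaced by an empty string before the two are added:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the shape of the question's table.
df = pd.DataFrame({
    "ColA": ["foo", "bar", np.nan],
    "ColB": ["baz", np.nan, "qux"],
})

# Replace missing values with '' so that + never encounters NaN.
df["ColA+ColB"] = df["ColA"].fillna("") + df["ColB"].fillna("")

print(df["ColA+ColB"].tolist())  # ['foobaz', 'bar', 'qux']
```

Note that a row where both columns are NaN ends up as an empty string rather than NaN.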
Answered By: AChampion

Call fillna and pass an empty str as the fill value and then sum with param axis=1:

In [3]:
df = pd.DataFrame({'a':['asd',np.nan,'asdsa'], 'b':['asdas','asdas',np.nan]})
df

Out[3]:
       a      b
0    asd  asdas
1    NaN  asdas
2  asdsa    NaN

In [7]:
df['a+b'] = df.fillna('').sum(axis=1)
df

Out[7]:
       a      b       a+b
0    asd  asdas  asdasdas
1    NaN  asdas     asdas
2  asdsa    NaN     asdsa
Answered By: EdChum

Using apply and str.cat, you can concatenate each row while skipping NaN values:

In [723]: df
Out[723]:
       a      b
0    asd  asdas
1    NaN  asdas
2  asdsa    NaN

In [724]: df['a+b'] = df.apply(lambda x: x.str.cat(sep=''), axis=1)

In [725]: df
Out[725]:
       a      b       a+b
0    asd  asdas  asdasdas
1    NaN  asdas     asdas
2  asdsa    NaN     asdsa
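
A possible variation, sketched here as an assumption rather than as part of the answer above: Series.str.cat can also be called on one column with the other passed as others, which avoids the row-wise apply. The na_rep='' argument substitutes an empty string for missing values on either side.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["asd", np.nan, "asdsa"],
    "b": ["asdas", "asdas", np.nan],
})

# Column-wise concatenation; na_rep='' stands in for NaN in either column.
df["a+b"] = df["a"].str.cat(df["b"], sep="", na_rep="")

print(df["a+b"].tolist())  # ['asdasdas', 'asdas', 'asdsa']
```

Without na_rep, a row with a missing value in either column would come out as missing in the result.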
Answered By: Zero

Prefer adding the columns directly over using the apply method, because it is much faster than apply.

  • Just add the two columns (if you know they are strings)

    %timeit df.bio + df.procedure_codes  
    

    21.2 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  • Use apply

    %timeit df[eventcol].apply(lambda x: ''.join(x), axis=1)  
    

    13.6 s ± 343 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • Use Pandas string methods and cat:

    %timeit df[eventcol[0]].str.cat(cols, sep=',')  
    

    264 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • Use sum (which concatenates strings)

    %timeit df[eventcol].sum(axis=1)  
    

    509 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
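
To repeat this kind of comparison outside IPython, a rough sketch with the standard timeit module might look like the following. The data, the column names a and b, and the helper function names are all hypothetical, and the timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 5,000 rows of short strings with NaN scattered in both columns.
n = 5_000
a = ["asd" if i % 3 else np.nan for i in range(n)]
b = ["asdas" if i % 5 else np.nan for i in range(n)]
df = pd.DataFrame({"a": a, "b": b})

def by_addition():
    # Vectorized: fill NaN with '' and add the columns.
    return df["a"].fillna("") + df["b"].fillna("")

def by_str_cat():
    # Vectorized string method; na_rep='' stands in for NaN.
    return df["a"].str.cat(df["b"], na_rep="")

def by_apply():
    # Row-wise Python-level loop: drop NaN in each row, then join.
    return df.apply(lambda row: "".join(row.dropna()), axis=1)

for func in (by_addition, by_str_cat, by_apply):
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds:.4f} s per call")
```

All three functions should produce the same strings, so only the speed differs.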

see here for more tests

Answered By: Kevin Chou

In my case, I wanted to join more than two columns together with a separator (a+b+c):

In [3]:
df = pd.DataFrame({'a':['asd',np.nan,'asdsa'], 'b':['asdas','asdas',np.nan], 'c':['as',np.nan,'ds']})

In [4]: df
Out[4]:
       a      b    c
0    asd  asdas   as
1    NaN  asdas  NaN
2  asdsa    NaN   ds

The following syntax worked for me:

In [5]: df['d'] = df[['a', 'b', 'c']].fillna('').agg('|'.join, axis=1)

In [6]: df

Out[6]:
       a      b    c             d
0    asd  asdas   as  asd|asdas|as
1    NaN  asdas  NaN       |asdas|
2  asdsa    NaN   ds     asdsa||ds
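
Note that with fillna('') the separator still appears where a value was missing (for example |asdas|). If that is not wanted, one possible alternative, offered as a sketch and not as part of the answer above, is to drop the NaN entries from each row before joining:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["asd", np.nan, "asdsa"],
    "b": ["asdas", "asdas", np.nan],
    "c": ["as", np.nan, "ds"],
})

# dropna() removes the missing entries from each row, so no stray '|' is left.
df["d"] = df[["a", "b", "c"]].apply(lambda row: "|".join(row.dropna()), axis=1)

print(df["d"].tolist())  # ['asd|asdas|as', 'asdas', 'asdsa|ds']
```

This uses a row-wise apply, so it is likely to be slower than the vectorized approaches on large frames.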
Answered By: Vaulstein