How to merge a dataframe and a series?
Question:
I have a dataframe and a series, defined as follows.
import numpy as np
import pandas as pd

_df = pd.DataFrame({'a': [1, '', '', 4], 'b': ['apple', 'banana', 'orange', 'pear']}, columns=['a', 'b'])
_series = pd.Series(['', 2, 3, ''], name="a")
I would like to merge the dataframe and the series along column a to get rid of all the blanks. This is the result I want:
   a       b
0  1   apple
1  2  banana
2  3  orange
3  4    pear
Here is how I do it.
# Walk the rows and copy the value from the series wherever column a is blank
for i in range(len(_df)):
    if _df.iloc[i, 0] == '':
        _df.iloc[i, 0] = _series[i]
The problem is that this can be very slow if the dataframe is big. Does anyone know a more efficient way to do this?
Answers:
Use Series.combine with a lambda function that keeps the value from column a unless it is an empty string:
df = _df.assign(a=_df["a"].combine(_series, lambda x, y: x if x != '' else y))
print(df)
   a       b
0  1   apple
1  2  banana
2  3  orange
3  4    pear
Or use Series.combine_first after replacing the empty strings with missing values:
df = _df.assign(a=_df["a"].replace('', np.nan).combine_first(_series.replace('', np.nan)))
print(df)
     a       b
0  1.0   apple
1  2.0  banana
2  3.0  orange
3  4.0    pear
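The output above shows column a as floats, because missing values force a float dtype. If integers are wanted, one option is to cast back once every blank has been filled. The sketch below swaps replace('', np.nan) for pd.to_numeric(..., errors='coerce'), which is a substitution rather than the answer's code. It assumes column a should be entirely numeric (any non-numeric string would also become a missing value) and that every blank is filled from _series; if a missing value remains, the cast raises an error.

```python
import pandas as pd

_df = pd.DataFrame({'a': [1, '', '', 4], 'b': ['apple', 'banana', 'orange', 'pear']}, columns=['a', 'b'])
_series = pd.Series(['', 2, 3, ''], name="a")

# Convert both to numeric; empty strings become NaN
left = pd.to_numeric(_df['a'], errors='coerce')
right = pd.to_numeric(_series, errors='coerce')

# Fill the gaps in the dataframe column from the series (aligned on index)
combined = left.combine_first(right)

# Cast back to integers; this raises if any value is still missing
df = _df.assign(a=combined.astype('int64'))
print(df)
```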
To write the result to a new column c, with the column name stored in the variable a, use:
a = 'c'
df = _df.assign(**{f'{a}': _df["a"].combine(_series, lambda x, y: x if x != '' else y)})
print (df)
Or:
a = 'c'
_df[f'{a}'] = _df["a"].combine(_series, lambda x, y: x if x != '' else y)
print(_df)
   a       b  c
0  1   apple  1
1     banana  2
2     orange  3
3  4    pear  4