How to combine values of multiple rows in pandas
Question:
I have a DataFrame in which the text is split across multiple rows, like:
A | B |
---|---|
aaa | bbbb |
ccccc | NaN |
NaN | NaN |
dddd | ffff |
eeee | NaN |
gg | NaN |
I want to merge the values of each row into the rows that follow it, stopping at a blank (all-NaN) row, to get a DataFrame like:
A | B |
---|---|
aaaccccc | bbbb |
ddddeeeegg | ffff |
Is there an efficient way to transform the DataFrame like this in Python?
Answers:
You can build a boolean mask of the rows in which every value is NaN, turn it into group labels with `cumsum`, drop the all-NaN rows, and then use `GroupBy.agg` to join the strings in each group:
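```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'ccccc', np.nan, 'dddd', 'eeee', 'gg'],
                   'B': ['bbbb', np.nan, np.nan, 'ffff', np.nan, np.nan]})
# rows with all NaN?
mask = df.isna().all(axis=1)
# create a new group starting at each all-NaN row
group = mask.cumsum()
# filter out the all-NaN rows, group, and join the strings in each column
out = df[~mask].groupby(group).agg(lambda x: ''.join(x.dropna()))
```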
Output:

```
            A     B
0    aaaccccc  bbbb
1  ddddeeeegg  ffff
```
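To see why this works, the sketch below rebuilds the sample data from the question (assuming the blank cells are real NaN values, not the string `'NaN'`) and inspects the intermediate mask and group labels. `cumsum` on the boolean mask increments the label at every all-NaN row, so each run of text rows between separators shares one label.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells assumed to be real NaN values.
df = pd.DataFrame({'A': ['aaa', 'ccccc', np.nan, 'dddd', 'eeee', 'gg'],
                   'B': ['bbbb', np.nan, np.nan, 'ffff', np.nan, np.nan]})

mask = df.isna().all(axis=1)   # True only where every column is NaN
group = mask.cumsum()          # label increases by 1 at each all-NaN row

print(mask.tolist())   # [False, False, True, False, False, False]
print(group.tolist())  # [0, 0, 1, 1, 1, 1]

# The separator row (index 2) is dropped; groupby aligns `group` on the index.
out = df[~mask].groupby(group).agg(lambda x: ''.join(x.dropna()))
print(out['A'].tolist())  # ['aaaccccc', 'ddddeeeegg']
print(out['B'].tolist())  # ['bbbb', 'ffff']
```

Because the separator rows are removed before grouping, several consecutive all-NaN rows simply skip a label (for example 0, then 2), so the resulting index need not be contiguous; append `.reset_index(drop=True)` if you need 0, 1, 2, ...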