Appending and overwriting certain columns of a pandas dataframe at the same time
Question:
I am finding it difficult to come up with an efficient, proper solution (meaning: no loops) to a problem like this:
Given a dataframe with this example structure:
   T      x  c1     c2
1  11-12  3  'yes'  'yes'
2  12-12  4  'no'   'yes'
3  13-12  4  'no'   'yes'
4  14-12  4  'yes'  'yes'
5  15-12  2  'no'   'no'
6  16-12  4  'yes'  'yes'
Suppose I want to add 5 new entries for T and x, starting at the 4th interval, so that the first 3 overwrite existing rows and the remaining 2 are appended. How can this be done elegantly?
E.g., if I wanted to add:
   T      x
4  14-12  3
5  15-12  3
6  16-12  4
7  17-12  4
8  18-12  2
...the result should look like this:
   T      x  c1     c2
1  11-12  3  'yes'  'yes'
2  12-12  4  'no'   'yes'
3  13-12  4  'no'   'yes'
4  14-12  3  'yes'  'yes'
5  15-12  3  'no'   'no'
6  16-12  4  'yes'  'yes'
7  17-12  4  nan    nan
8  18-12  2  nan    nan
Using .loc to select the relevant T rows does not work, because some of the required index labels do not exist yet. I suppose I could split the new data, apply .update or .loc to the part that already exists, and then append the rest.
But that takes several steps. Is there a simpler option?
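For reference, the multi-step approach described above might look roughly like the sketch below. The frame names (df, new) and the intermediate variables are assumptions made for illustration, and the quotes around the c1/c2 values are dropped; the data is the example from the question.

```python
import pandas as pd

# Example data from the question (variable names are illustrative).
df = pd.DataFrame({
    "T": ["11-12", "12-12", "13-12", "14-12", "15-12", "16-12"],
    "x": [3, 4, 4, 4, 2, 4],
    "c1": ["yes", "no", "no", "yes", "no", "yes"],
    "c2": ["yes", "yes", "yes", "yes", "no", "yes"],
})
new = pd.DataFrame({
    "T": ["14-12", "15-12", "16-12", "17-12", "18-12"],
    "x": [3, 3, 4, 4, 2],
})

base = df.set_index("T")
incoming = new.set_index("T")

# Step 1: overwrite x for the T labels that already exist in base.
exists = incoming.index.isin(base.index)
overlap = incoming.index[exists]
base.loc[overlap, "x"] = incoming.loc[overlap, "x"]

# Step 2: append the rows whose T label is not in base yet.
# Columns missing from the new rows (c1, c2) are filled with NaN.
result = pd.concat([base, incoming[~exists]]).reset_index()
print(result)
```

This works, but it needs the overlap to be computed by hand, which is the extra bookkeeping the question hopes to avoid.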
Answers:
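I believe you are looking for combine_first(). Index both frames on T, then let the new data (df2) take priority and fill in whatever it lacks from the original frame (df):
df2.set_index('T').combine_first(df.set_index('T')).reset_index()
Output (note that the columns come back in sorted order):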
   T      c1     c2     x
0  11-12  'yes'  'yes'  3
1  12-12  'no'   'yes'  4
2  13-12  'no'   'yes'  4
3  14-12  'yes'  'yes'  3
4  15-12  'no'   'no'   3
5  16-12  'yes'  'yes'  4
6  17-12  NaN    NaN    4
7  18-12  NaN    NaN    2
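A self-contained version of the one-liner, as a minimal sketch: the construction of df and df2 is assumed from the question's example (quotes around the c1/c2 values dropped), and the final column reordering is an optional extra step, not part of the original answer.

```python
import pandas as pd

# Original frame and the new T/x rows, rebuilt from the question's example.
df = pd.DataFrame({
    "T": ["11-12", "12-12", "13-12", "14-12", "15-12", "16-12"],
    "x": [3, 4, 4, 4, 2, 4],
    "c1": ["yes", "no", "no", "yes", "no", "yes"],
    "c2": ["yes", "yes", "yes", "yes", "no", "yes"],
})
df2 = pd.DataFrame({
    "T": ["14-12", "15-12", "16-12", "17-12", "18-12"],
    "x": [3, 3, 4, 4, 2],
})

# Values from df2 win wherever they are present; gaps (earlier T rows,
# and the c1/c2 columns) are filled from df. The result covers the
# union of both indexes.
combined = df2.set_index("T").combine_first(df.set_index("T")).reset_index()

# Optional: restore the original column order (T, x, c1, c2).
combined = combined[list(df.columns)]
print(combined)
```

Because combine_first only falls back to df where df2 has no value, a NaN in df2's x would not overwrite an existing value; if that matters, the missing values need separate handling.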