Is this the right way to move rows from one dataframe to another with a condition?
Question:
I want to move some rows from df1 to df2 when calories in df1 and df2 are the same. The two dataframes have the same columns.
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"calories": [420, 80, 90, 10],
"duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"calories": [420, 380, 390],
"duration": [60, 40, 45]
})
print(df1)
print(df2)
calories duration
0 420 50
1 80 4
2 90 5
3 10 3
calories duration
0 420 60
1 380 40
2 390 45
rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)
print('df1:')
print(df1)
print('df2:')
print(df2)
Running this raises the following error:
raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
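The error comes from `==` itself: comparing two Series with `==` requires both to carry exactly the same index labels, and here df1 has labels 0-3 while df2 has 0-2. A minimal sketch that reproduces this with the sample data, and shows that the flex method `Series.eq` instead aligns both Series on the union of their labels (a label missing from one side compares as False). This is an alternative the answers do not use; like the merge-based answer, it matches rows that share both the index label and the calories value.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# `==` refuses to compare Series whose index labels differ (0-3 vs 0-2)
error_message = None
try:
    df1.calories == df2.calories
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# `.eq` aligns on the union of labels first; label 3 has no partner in df2,
# so it compares as False instead of raising
mask = df1.calories.eq(df2.calories)
print(mask.tolist())

# mask's labels (0-3) line up with df1's, so boolean selection works
rows = df1.loc[mask]
print(rows)
```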
Answers:
Since your dataframes are not the same length, you need to use merge to find rows with common calories values. You need to merge on both the index and the calories values; that is most easily achieved by using reset_index to temporarily add an index column to merge on:
dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])
Output:
index calories duration duration_y
0 0 420 50 60
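Because reset_index turned df1's original row labels into the index column, dftemp also records which df1 rows matched. A minimal self-contained sketch with the sample data; the name matched_labels is illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# Inner merge on the former index labels AND the calories value:
# a row matches only if the same label holds the same calories in both frames
dftemp = df1.reset_index().merge(
    df2.reset_index(), on=["index", "calories"], suffixes=("", "_y")
)

# The "index" column holds df1's original labels of the matching rows
matched_labels = dftemp["index"].tolist()
print(dftemp)
print(matched_labels)
```

Those labels can later be passed to df1.drop to remove exactly the rows that were copied.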
You can now concat the calories and duration values from dftemp to df2 (using reset_index again to reset the index):
df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)
Output (for your sample data):
calories duration
0 420 60
1 380 40
2 390 45
3 420 50
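As a side note, pd.concat(..., ignore_index=True) gives the same fresh 0..n-1 index as calling .reset_index(drop=True) afterwards. A self-contained sketch of the copy step with the sample data, checking that both spellings agree:

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

dftemp = df1.reset_index().merge(
    df2.reset_index(), on=["index", "calories"], suffixes=("", "_y")
)

# Append the matched df1 rows (df1's own calories/duration) to df2;
# ignore_index=True renumbers the result 0..n-1
df2_new = pd.concat([df2, dftemp[["calories", "duration"]]], ignore_index=True)

# Same result as the reset_index(drop=True) variant
same = pd.concat([df2, dftemp[["calories", "duration"]]]).reset_index(drop=True)
print(df2_new.equals(same))  # True
print(df2_new)
```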
To remove the rows that were copied to df2 from df1, we merge just on the index, then keep only the rows where the two calories values differ:
dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)
Output (for your sample data):
calories duration
0 80 4
1 90 5
2 10 3
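One caveat: merge defaults to how='inner', so merging df1 with df2 on the index keeps only labels present in both. Above, that works because df2 already has four rows after the concat; against the original three-row df2, df1's row with label 3 (calories 10) would silently disappear. A sketch of an alternative that instead drops exactly the labels recorded in dftemp's index column by the first merge:

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

dftemp = df1.reset_index().merge(
    df2.reset_index(), on=["index", "calories"], suffixes=("", "_y")
)

# Pitfall: an inner merge on the index against the ORIGINAL 3-row df2
# loses df1's label 3, which has no partner label in df2
lossy = df1.merge(
    df2, left_index=True, right_index=True, suffixes=("", "_y")
).query("calories != calories_y")
print(lossy["calories"].tolist())  # [80, 90]

# Alternative: drop only the df1 labels that matched in the first merge
df1_new = df1.drop(index=dftemp["index"]).reset_index(drop=True)
print(df1_new)
```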
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"mid": [420, 380, 90, 420],
"A": [50, 4, 5, 3],
"B": [420, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"mid": [420, 380, 390],
"A": [60, 40, 80],
"B": [150, 24, 25]
})
print('df1:')
print(df1)
print('df2:')
print(df2)
mask = df1.mid.isin(df2.mid)  # True where df1's mid value appears anywhere in df2
new_df1 = df1[~mask]
dup_df1 = df1[mask]
new_df2 = pd.concat([df2, dup_df1], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
print('dup:')
print(dup_df1)
print('df1:')
print(new_df1)
print('df2:')
print(new_df2)
This answer was posted as an edit to the question Is this the right way to move rows from one dataframe to another with a condition? by the OP marlon under CC BY-SA 4.0.
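Note that the isin version matches a row if its mid value appears anywhere in df2, not only at the same row position, so it behaves differently from the merge-on-index answer. A small reusable sketch of the isin idea; the function name move_matching_rows and its signature are hypothetical, not a pandas API:

```python
import pandas as pd


def move_matching_rows(src: pd.DataFrame, dst: pd.DataFrame, key: str):
    """Hypothetical helper: move rows of src whose `key` value occurs anywhere in dst[key].

    Returns (remaining_src, extended_dst); the inputs are not modified.
    """
    mask = src[key].isin(dst[key])  # True where src's key value occurs in dst
    remaining = src[~mask]  # keeps src's original labels
    extended = pd.concat([dst, src[mask]], ignore_index=True)
    return remaining, extended


df1 = pd.DataFrame({"mid": [420, 380, 90, 420], "A": [50, 4, 5, 3], "B": [420, 4, 5, 3]})
df2 = pd.DataFrame({"mid": [420, 380, 390], "A": [60, 40, 80], "B": [150, 24, 25]})

new_df1, new_df2 = move_matching_rows(df1, df2, "mid")
print(new_df1)
print(new_df2)
```

Call .reset_index(drop=True) on the first result if you want its rows renumbered.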