Is this the right way to move rows from one dataframe to another with a condition?

Question:

I want to move some rows from df1 to df2 when calories in df1 and df2 are the same. The two dfs have the same columns.

import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(data = {
  "calories": [420, 80, 90, 10],
  "duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
  "calories": [420, 380, 390],
  "duration": [60, 40, 45]
})

print(df1)
print(df2)



   calories  duration
0       420        50
1        80         4
2        90         5
3        10         3
   calories  duration
0       420        60
1       380        40
2       390        45

rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)

print('df1:')
print(df1)
print('df2:')
print(df2)

Then it reports this error:

raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
Asked By: marlon


Answers:

The == comparison fails because df1.calories and df2.calories have different lengths, so their indexes are not identical. Since your dataframes are not the same length, use merge to find rows that have the same calories value at the same index. Merging on both the index and the calories values is most easily done by using reset_index to temporarily add an index column to merge on:

dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])

Output:

   index  calories  duration  duration_y
0      0       420        50          60

You can now concat the calories and duration values from dftemp to df2 (using reset_index again to reset the index):

df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)

Output (for your sample data):

   calories  duration
0       420        60
1       380        40
2       390        45
3       420        50

To remove the rows that were copied to df2 from df1, we merge just on index, then filter out rows where the two calories values are different:

dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)

Output (for your sample data):

   calories  duration
0        80         4
1        90         5
2        10         3
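
For comparison, the same position-wise move can be sketched with a boolean mask instead of two merges. This is only a sketch: it assumes both frames use the default integer index and that "same calories" means equal values at the same index label. Series.eq aligns the two Series on their index before comparing, so the different lengths do not raise an error. The names mask and moved are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# Series.eq aligns on the index first; labels missing on either side compare as False.
mask = df1["calories"].eq(df2["calories"])
# Restrict the mask to df1's own labels in case df2 is the longer frame.
mask = mask.reindex(df1.index, fill_value=False)

moved = df1[mask]                                   # rows to move
df2 = pd.concat([df2, moved], ignore_index=True)    # add them to the end of df2
df1 = df1[~mask].reset_index(drop=True)             # keep only the unmatched rows

print(df1)
print(df2)
```

On the sample data this should leave df1 and df2 in the same state as the merge-based steps above.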
Answered By: Nick
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(data = {
  "mid": [420, 380, 90, 420],
  "A": [50, 4, 5, 3],
   "B": [420, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
  "mid": [420, 380, 390],
  "A": [60, 40, 80],
   "B": [150, 24, 25]
})

print('df1:')
print(df1)
print('df2:')
print(df2)
new_df1 = df1[~df1.mid.isin(df2.mid)]

dup_df1 = df1[df1.mid.isin(df2.mid)]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
new_df2 = pd.concat([df2, dup_df1], ignore_index=True)

print('dup:')
print(dup_df1)
print('df1:')
print(new_df1)
print('df2:')
print(new_df2)
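
Note that isin matches a mid value found anywhere in df2, whereas an index-aligned comparison only matches values that sit on the same row label. A small sketch of the difference, using the mid columns from the data above (the names by_value and by_position are illustrative and not part of the original code):

```python
import pandas as pd

df1_mid = pd.Series([420, 380, 90, 420], name="mid")
df2_mid = pd.Series([420, 380, 390], name="mid")

# Value-based: True wherever df1's value appears anywhere in df2.
by_value = df1_mid.isin(df2_mid)

# Position-based: True only where the same index label holds the same value.
by_position = df1_mid.eq(df2_mid)

print(by_value.tolist())     # [True, True, False, True]
print(by_position.tolist())  # [True, True, False, False]
```

Which of the two is wanted depends on whether "the same calories" is meant per row or across the whole column.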

This answer was posted as an edit to the question Is this the right way to move rows from one dataframe to another with a condtion? by the OP marlon under CC BY-SA 4.0.

Answered By: vvvvv