Python Pandas Only Compare Identically Labeled DataFrame Objects

Question:

I tried all the solutions here:
Pandas "Can only compare identically-labeled DataFrame objects" error

None of them worked for me. Here’s what I’ve got: I have two data frames. One is a set of financial data that already exists in the system; the other is a new set containing rows that may or may not already exist in the system. I need to find the difference and add the rows that don’t exist yet.

Here is the code:

import pandas as pd
import numpy as np
from azure.storage.blob import AppendBlobService, PublicAccess, ContentSettings
from io import StringIO

dataUrl = "http://ichart.finance.yahoo.com/table.csv?s=MSFT"
blobUrlBase = "https://pyjobs.blob.core.windows.net/"
data = pd.read_csv(dataUrl)

abs = AppendBlobService(account_name='pyjobs', account_key='***')
abs.create_container("stocks", public_access = PublicAccess.Container)
abs.append_blob_from_text('stocks', 'msft', data[:25].to_csv(index=False))
existing = pd.read_csv(StringIO(abs.get_blob_to_text('stocks', 'msft').content))

ne = (data != existing).any(1)

The failing code is the final line. I was following an article on determining differences between data frames.

I checked the dtypes on all columns, and they appear to be the same. I also compared the output side by side, sorted the axes and the indices, dropped the indices, and so on. I still get that bloody error.

Here is the output of the first row of existing and data

>>> existing[:1]
         Date       Open   High    Low  Close    Volume  Adj Close
0  2016-05-27  51.919998  52.32  51.77  52.32  17653700      52.32
>>> data[:1]
         Date       Open   High    Low  Close    Volume  Adj Close
0  2016-05-27  51.919998  52.32  51.77  52.32  17653700      52.32

Here is the exact error I receive:

>>> ne = (data != existing).any(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1169, in f
    return self._compare_frame(other, func, str_rep)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3571, in _compare_frame
    raise ValueError('Can only compare identically-labeled '
ValueError: Can only compare identically-labeled DataFrame objects
Asked By: David Crook

Answers:

To get around this, compare the underlying numpy arrays. This ignores the index and column labels and compares the values by position, so both frames must have the same shape.

import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['One', 'Two'])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['one', 'two'])


df1.values == df2.values

array([[ True,  True],
       [ True,  True]], dtype=bool)
Answered By: piRSquared
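
A minimal sketch of the positional comparison, with a shape guard added. The helper name positional_ne and the sample frames are illustrative assumptions, not part of the answer above. In the question, existing was built from data[:25], so unless data has exactly 25 rows the two shapes differ, and a positional comparison cannot be meaningful without first trimming or aligning the rows.

```python
import pandas as pd

# Hypothetical frames: same values, different labels.
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["One", "Two"])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"], index=["one", "two"])

# The != operator refuses to compare frames whose labels differ.
try:
    df1 != df2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)


def positional_ne(left, right):
    """Return a per-row mask of positional differences (labels are ignored)."""
    if left.shape != right.shape:
        raise ValueError(f"shape mismatch: {left.shape} vs {right.shape}")
    return (left.values != right.values).any(axis=1)


mask = positional_ne(df1, df2)
print(mask)
```

The guard turns a silent or confusing numpy result into an explicit error when the row counts differ.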

I replicated the problem with some fake data to achieve my end goal of removing duplicates. Note that this does not answer the original question; it solves the underlying task that led me to ask it.

import numpy as np
import pandas as pd

b = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                  'B': ['B4', 'B5', 'B6', 'B7'],
                  'C': ['C4', 'C5', 'C6', 'C7'],
                  'D': ['D4', 'D5', 'D6', 'D7']},
                 index=[4, 5, 6, 7])

c = pd.DataFrame({'A': ['A7', 'A8', 'A9', 'A10', 'A11'],
                  'B': ['B7', 'B8', 'B9', 'B10', 'B11'],
                  'C': ['C7', 'C8', 'C9', 'C10', 'C11'],
                  'D': ['D7', 'D8', 'D9', 'D10', 'D11']},
                 index=[7, 8, 9, 10, 11])

# Stack both frames, then keep the first row for each distinct value in column A.
result = pd.concat([b, c])
idx = np.unique(result["A"], return_index=True)[1]
# DataFrame.sort() was removed from pandas; sort_index() restores index order.
result = result.iloc[idx].sort_index()
Answered By: David Crook
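
For the goal stated in the question (find the rows that are not yet in the system), one possible alternative is a left anti-join with DataFrame.merge and indicator=True. This is only a sketch: the sample values are made up, and it assumes that the Date column identifies a row and is unique in existing.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `existing` and `data` frames.
existing = pd.DataFrame({"Date": ["2016-05-26", "2016-05-27"],
                         "Close": [51.89, 52.32]})
data = pd.DataFrame({"Date": ["2016-05-27", "2016-05-31", "2016-05-26"],
                     "Close": [52.32, 53.00, 51.89]})

# Left join on the key column; `_merge` records where each row was found.
merged = data.merge(existing[["Date"]], on="Date", how="left", indicator=True)

# Rows marked "left_only" exist in `data` but not in `existing`.
missing = merged.loc[merged["_merge"] == "left_only", data.columns]
print(missing)
```

Only the rows in missing would then need to be appended to the stored data.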

If you want to compare two DataFrames, check out the flexible comparison methods in pandas, such as .eq(), .ne(), .gt() and more (equal, not equal, greater than).

Example:

result = df.gt(df_1)  # element-wise boolean DataFrame

http://pandas.pydata.org/pandas-docs/stable/basics.html#flexible-comparisons

Answered By: Melroy van den Berg
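
A small sketch of how the flexible methods behave differently from the operators: they align the two frames on their labels first instead of raising. The frames below are made-up examples.

```python
import pandas as pd

# Hypothetical frames with different row labels (df2 has one extra row).
df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"A": [1, 5, 9]}, index=[0, 1, 2])

# df1 != df2 would raise ValueError here; .ne() aligns on the union of labels.
diff = df1.ne(df2)
print(diff)

# Rows that differ, including rows present in only one frame.
changed = diff.any(axis=1)
print(changed[changed].index.tolist())
```

A row that exists in only one frame is compared against missing values, so it is reported as different.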

I also faced the same issue and resolved it by sorting the index on both axes before comparing the two dataframes.

df1 = df1.sort_index(axis=1)
df2 = df2.sort_index(axis=1)
df1 = df1.sort_index()
df2 = df2.sort_index()
Answered By: SOWNDARIYA M
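
Sorting helps only when both frames carry the same labels in a different order. The sketch below wraps the sorting steps in a hypothetical helper (ne_rows_after_sorting is not a pandas function) that checks this before comparing; the sample frames are assumptions.

```python
import pandas as pd

# Hypothetical frames: identical labels and values, but in a different order.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])
df2 = pd.DataFrame({"B": [4, 3], "A": [2, 1]}, index=["y", "x"])


def ne_rows_after_sorting(left, right):
    """Sort both axes of both frames, then flag rows that differ."""
    left = left.sort_index().sort_index(axis=1)
    right = right.sort_index().sort_index(axis=1)
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        raise ValueError("frames have different labels; sorting cannot align them")
    return (left != right).any(axis=1)


ne = ne_rows_after_sorting(df1, df2)
print(ne)
```

If the label sets differ, as they do when one frame has more rows than the other, the helper raises instead of comparing.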