SQL Server: How to replicate pandas merge?

Question

How can I replicate a pandas merge in SQL Server?

I want to do this:

import pandas as pd

# merge and filter out rows that are in "both" dataframes

df1 = pd.DataFrame([
            ['A', 1, 'c', 'a'],
            ['A', 2, 'c', 'a'],
            ['B', 2, 'c', 'a'],
            ['B', 3, 'c', 'a'],
            ['C', 3, 'c', 'a'],
            ['C', 4, 'c', 'a'],
            ['D', 3, 'c', 'a']
            ],
        columns = ['ID', 'Period', 'Pivot', 'Group'])

df2 = pd.DataFrame([
            ['A', 1, 'c', 'a'],
            ['A', 2, 'c', 'a'],
            ['B', 2, 'c', 'a'],
            ['B', 3, 'c', 'a'],
            ['C', 3, 'c', 'a'],
            ['C', 4, 'd', 'a'],
            ['D', 3, 'd', 'a']
            ],
        columns = ['ID', 'Period', 'Pivot', 'Group'])


out = df1.merge(df2, how='outer', left_on=['ID', 'Period', 'Pivot', 'Group'], right_on=['ID', 'Period', 'Pivot', 'Group'], indicator=True).query('_merge != "both"')

What I have tried to do is implement a variant of this:

https://stackoverflow.com/a/511022/6534818

SELECT a.SelfJoinTableID
FROM   dbo.SelfJoinTable a
       INNER JOIN dbo.SelfJoinTable b
         ON a.SelfJoinTableID = b.SelfJoinTableID
       INNER JOIN dbo.SelfJoinTable c
         ON a.SelfJoinTableID = c.SelfJoinTableID
WHERE  a.Status = 'Status to filter a'
       AND b.Status = 'Status to filter b'
       AND c.Status = 'Status to filter c'

But it does not return what I get in pandas.

Asked By: John Stud

Answer 1

It looks like you are trying to select the rows that are in df1 or df2, but not both. On SQL Server, you can use the UNION, EXCEPT, and INTERSECT operators like this:

(SELECT * FROM TableA
UNION
SELECT * FROM TableB)
EXCEPT
(SELECT * FROM TableA
 INTERSECT
 SELECT * FROM TableB)

The first three lines perform the set union of all rows in TableA and TableB. The last three lines perform the set intersection of TableA and TableB (i.e., only the rows that appear in both). Finally, the EXCEPT operator removes the latter group from the former, leaving the rows that appear in exactly one of the two tables.
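As a quick sanity check, the sketch below loads the question's df1 and df2 rows into an in-memory SQLite database and runs the same set logic. SQLite is only a stand-in here, not SQL Server: it does not accept parentheses around the operands of a set operator, so each side is wrapped in a subquery instead. The second query, with the left_only / right_only label, is an illustrative variant that is not part of the answer above; it imitates the _merge column that pandas adds with indicator=True. Note that UNION, EXCEPT, and INTERSECT return distinct rows, so duplicate rows within one table would be collapsed, unlike in pandas.

```python
import sqlite3

# Rows copied from df1 and df2 in the question.
rows_a = [
    ("A", 1, "c", "a"), ("A", 2, "c", "a"), ("B", 2, "c", "a"),
    ("B", 3, "c", "a"), ("C", 3, "c", "a"), ("C", 4, "c", "a"),
    ("D", 3, "c", "a"),
]
rows_b = [
    ("A", 1, "c", "a"), ("A", 2, "c", "a"), ("B", 2, "c", "a"),
    ("B", 3, "c", "a"), ("C", 3, "c", "a"), ("C", 4, "d", "a"),
    ("D", 3, "d", "a"),
]

conn = sqlite3.connect(":memory:")
for name, rows in (("TableA", rows_a), ("TableB", rows_b)):
    # "Group" is a reserved word, so the column names are quoted.
    conn.execute(
        f'CREATE TABLE {name} '
        '("ID" TEXT, "Period" INTEGER, "Pivot" TEXT, "Group" TEXT)'
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)

# (A UNION B) EXCEPT (A INTERSECT B), written with subqueries for SQLite.
symmetric_difference = conn.execute(
    """
    SELECT * FROM (SELECT * FROM TableA UNION SELECT * FROM TableB)
    EXCEPT
    SELECT * FROM (SELECT * FROM TableA INTERSECT SELECT * FROM TableB)
    ORDER BY 1, 2, 3, 4
    """
).fetchall()

# Variant that also records which table each row came from,
# similar to the _merge column produced by indicator=True.
with_indicator = conn.execute(
    """
    SELECT *, 'left_only' AS _merge
    FROM (SELECT * FROM TableA EXCEPT SELECT * FROM TableB)
    UNION ALL
    SELECT *, 'right_only' AS _merge
    FROM (SELECT * FROM TableB EXCEPT SELECT * FROM TableA)
    ORDER BY 1, 2, 3, 4
    """
).fetchall()

for row in with_indicator:
    print(row)
```

With the sample data, both queries return the four rows for (C, 4) and (D, 3), which are the rows that differ between the two tables in the Pivot column.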

See: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql

And: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql

Answered By: peds
