Join two tables on ID where table one has duplicates and table two does not
Question:
I have two tables, Table A[‘id’, ‘col1’] and Table B[‘id’, ‘col3’]. Table A has duplicate values in ‘id’ column and its important that those duplicate row stay as they are but Table B has single pair for ‘id’ and ‘col3’. I want to use python to join the two tables.
Table A
id
col1
1
Apple
1
Mango
2
Banana
Table B
id
col3
1
Eat
2
Drink
Final output
id
col1
col3
1
Apple
Eat
1
Mango
Eat
2
Banana
Drink
I tried to use merge and concat but did not get the desired result.
Answers:
like this:
tableA = tableA.merge(right=tableB, how='left', on='id')
Here’s a way to use merge
to do it:
A.merge(B, how='left') )
Explanation:
- In the docs for
merge()
, when the how
argument is 'left'
:
- use only keys from left frame, similar to a SQL left outer join; preserve key order.
- Also from the docs, when the
on
argument is not specified, it takes the default value of None, and on
is interpreted as follows:
- Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
Output:
id col1 col3
0 1 Apple Eat
1 1 Mango Eat
2 2 Banana Drink
I have two tables, Table A[‘id’, ‘col1’] and Table B[‘id’, ‘col3’]. Table A has duplicate values in ‘id’ column and its important that those duplicate row stay as they are but Table B has single pair for ‘id’ and ‘col3’. I want to use python to join the two tables.
Table A
id | col1 |
---|---|
1 | Apple |
1 | Mango |
2 | Banana |
Table B
id | col3 |
---|---|
1 | Eat |
2 | Drink |
Final output
id | col1 | col3 |
---|---|---|
1 | Apple | Eat |
1 | Mango | Eat |
2 | Banana | Drink |
I tried to use merge and concat but did not get the desired result.
like this:
tableA = tableA.merge(right=tableB, how='left', on='id')
Here’s a way to use merge
to do it:
A.merge(B, how='left') )
Explanation:
- In the docs for
merge()
, when thehow
argument is'left'
:- use only keys from left frame, similar to a SQL left outer join; preserve key order.
- Also from the docs, when the
on
argument is not specified, it takes the default value of None, andon
is interpreted as follows:- Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
Output:
id col1 col3
0 1 Apple Eat
1 1 Mango Eat
2 2 Banana Drink