Intersection of two lists including duplicates?
Question:
>>> a = [1,1,1,2,3,4,4]
>>> b = [1,1,2,3,3,3,4]
[1,1,2,3,4]
Please note this is not the same question as this:
Python intersection of two lists keeping duplicates
Because even though there are three 1s in list a, there are only two in list b so the result should only have two.
Answers:
You can use collections.Counter for this, which will provide the lowest count found in either list for each element when you take the intersection.
from collections import Counter
c = list((Counter(a) & Counter(b)).elements())
Outputs:
[1, 1, 2, 3, 4]
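To see why this works, it can help to look at the intermediate Counter. The following is a minimal sketch using the lists from the question; the function name `multiset_intersection` is made up here for illustration and is not part of the standard library.

```python
from collections import Counter

def multiset_intersection(a, b):
    """Return the common elements, each repeated min(count in a, count in b) times."""
    # Counter & Counter keeps, for each key, the smaller of the two counts
    # and drops keys whose resulting count is not positive.
    common = Counter(a) & Counter(b)
    # elements() repeats each key as many times as its count.
    return list(common.elements())

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]

print(Counter(a) & Counter(b))      # counts: 1 -> 2, 2 -> 1, 3 -> 1, 4 -> 1
print(multiset_intersection(a, b))  # [1, 1, 2, 3, 4]
```

This does not require the lists to be sorted, but the elements must be hashable.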
This will do:
from itertools import chain
list(chain.from_iterable([(val,)*min(a.count(val), b.count(val)) for val in (set(a) & set(b))]))
Gives:
[1, 1, 2, 3, 4]
This should also work, provided both lists are sorted.
a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
c = []
i, j = 0, 0
# Walk both sorted lists in step, advancing whichever side holds the smaller value.
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        c.append(a[i])
        i += 1
        j += 1
    elif a[i] > b[j]:
        j += 1
    else:
        i += 1
print(c)  # [1, 1, 2, 3, 4]
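The two-pointer walk above gives wrong results on unsorted input. One possible way to lift that restriction is to sort copies of both lists first, as in this sketch; the function name `sorted_intersection` is hypothetical, and the elements are assumed to be mutually comparable.

```python
def sorted_intersection(a, b):
    """Two-pointer intersection that sorts copies first, so unsorted input also works."""
    a, b = sorted(a), sorted(b)  # sorted() returns new lists; the arguments are not modified
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match anything left in b
        else:
            j += 1  # b[j] is too small to match anything left in a
    return result

print(sorted_intersection([4, 1, 3, 1, 2, 4, 1], [3, 1, 4, 3, 2, 1, 3]))  # [1, 1, 2, 3, 4]
```

The result always comes back in sorted order, regardless of the order of the inputs.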
This should also work:
def list_intersect(lisA, lisB):
    """ Finds the intersection of 2 lists including common duplicates"""
    Iset = set(lisA).intersection(set(lisB))
    Ilis = []
    for i in Iset:
        num = min(lisA.count(i), lisB.count(i))
        for j in range(num):
            Ilis.append(i)
    return Ilis
The accepted solution using Counter is simple, but I think this dictionary-based approach also works and can be faster, even on lists that aren't ordered (that requirement wasn't really mentioned, but at least one of the other solutions assumes it).
from collections import Counter

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]

def intersect(nums1, nums2):
    # Count how many times each value occurs in nums1.
    match = {}
    for x in nums1:
        if x in match:
            match[x] += 1
        else:
            match[x] = 1
    # Consume one count for every matching value found in nums2.
    i = []
    for x in nums2:
        if x in match:
            i.append(x)
            match[x] -= 1
            if match[x] == 0:
                del match[x]
    return i

def intersect2(nums1, nums2):
    return list((Counter(nums1) & Counter(nums2)).elements())
timeit intersect(a,b)
100000 loops, best of 3: 3.8 µs per loop
timeit intersect2(a,b)
The slowest run took 4.90 times longer than the fastest. This could mean
that an intermediate result is being cached.
10000 loops, best of 3: 20.4 µs per loop
I tested with lists of random ints of size 1000 and 10000 and it was faster there too.
import random
a = [random.randint(0,100) for r in range(10000)]
b = [random.randint(0,100) for r in range(1000)]
timeit intersect(a,b)
100 loops, best of 3: 2.35 ms per loop
timeit intersect2(a,b)
100 loops, best of 3: 4.2 ms per loop
And with a smaller range of values, so that the lists have more elements in common:
a = [random.randint(0,10) for r in range(10000)]
b = [random.randint(0,10) for r in range(1000)]
timeit intersect(a,b)
100 loops, best of 3: 2.07 ms per loop
timeit intersect2(a,b)
100 loops, best of 3: 3.41 ms per loop
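The timings above appear to come from an interactive `timeit` command. Outside such a shell, a comparison along the same lines could be run with the standard-library `timeit` module. This is only a sketch: the seed and the repetition count are arbitrary choices, `intersect` is written with `dict.get` but follows the same logic as above, and the numbers will differ by machine and Python version.

```python
import random
import timeit
from collections import Counter

def intersect(nums1, nums2):
    # Dictionary-based version: count nums1, then consume counts while scanning nums2.
    match = {}
    for x in nums1:
        match[x] = match.get(x, 0) + 1
    out = []
    for x in nums2:
        if x in match:
            out.append(x)
            match[x] -= 1
            if match[x] == 0:
                del match[x]
    return out

def intersect2(nums1, nums2):
    return list((Counter(nums1) & Counter(nums2)).elements())

random.seed(0)  # arbitrary seed, only so both functions see the same reproducible input
a = [random.randint(0, 100) for _ in range(10000)]
b = [random.randint(0, 100) for _ in range(1000)]

# Both functions return the same multiset, though not necessarily in the same order.
assert sorted(intersect(a, b)) == sorted(intersect2(a, b))

runs = 100
for fn in (intersect, intersect2):
    seconds = timeit.timeit(lambda: fn(a, b), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

Note that `intersect` returns elements in the order they appear in the second list, while `intersect2` groups equal elements together, so the outputs should be compared as multisets rather than as lists.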
Simple with no additional imports and easy to debug 🙂
Disadvantage: The value of list b is changed. Work on a copy of b if you don’t want to change b.
c = list()
for x in a:
    if x in b:
        b.remove(x)
        c.append(x)
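To avoid the side effect on b, the same loop can run against a copy. A minimal sketch, with the hypothetical function name `intersect_by_removal`:

```python
def intersect_by_removal(a, b):
    """Same remove-based loop, but run on a copy so the caller's b is left untouched."""
    remaining = list(b)  # shallow copy of b
    result = []
    for x in a:
        if x in remaining:
            remaining.remove(x)  # removes only the first matching occurrence
            result.append(x)
    return result

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
print(intersect_by_removal(a, b))  # [1, 1, 2, 3, 4]
print(b)                           # [1, 1, 2, 3, 3, 3, 4]
```

Both `x in remaining` and `remaining.remove(x)` scan the list, so this takes time proportional to len(a) * len(b) in the worst case. In exchange, it also works for unhashable elements such as nested lists.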