Randomly remove 'x' elements from a list
Question:
I’d like to randomly remove a fraction of elements from a list without changing the order of the list.
Say I had some data and I wanted to remove 1/4 of them:
data = [1,2,3,4,5,6,7,8,9,10]
n = len(data) / 4
I’m thinking I need a loop to run through the data and delete a random element ‘n’ times? So something like:
for i in xrange(n):
    random = np.randint(1,len(data))
    del data[random]
My question is, is this the most ‘pythonic’ way of doing this? My list will be ~5000 elements long and I want to do this multiple times with different values of ‘n’.
Thanks!
Answers:
Sequential deletion is a bad idea, since each deletion from a list is O(n), so removing many elements one at a time gets expensive. Instead, do something like this:
import random

def delete_rand_items(items, n):
    # Sample n distinct indices, then keep every item whose index was not chosen.
    to_delete = set(random.sample(range(len(items)), n))
    return [x for i, x in enumerate(items) if i not in to_delete]
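A quick usage sketch for the function above. The sample list (with duplicates on purpose) is made up for illustration, and the function is repeated only so the snippet runs on its own:

```python
import random

def delete_rand_items(items, n):
    # Same function as in the answer, repeated here for a self-contained run.
    to_delete = set(random.sample(range(len(items)), n))
    return [x for i, x in enumerate(items) if i not in to_delete]

# Hypothetical sample data; duplicates are fine because we sample indices.
data = ["a", "b", "a", "c", "b", "d", "a", "e"]
n = len(data) // 4  # 2

result = delete_rand_items(data, n)
print(len(data))    # 8, the original list is left untouched
print(len(result))  # 6
```

Because the function returns a new list, you can call it again on the same `data` with a different `n`.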
You can use random.sample like this:
import random
a = [1,2,3,4,5,6,7,8,9,10]
no_elements_to_delete = len(a) // 4
no_elements_to_keep = len(a) - no_elements_to_delete
# A set makes the `i in b` test on the next line fast for large lists.
b = set(random.sample(a, no_elements_to_keep))
b = [i for i in a if i in b]  # restores the original order
print(len(a))  # 10
print(b)  # e.g. [1, 2, 3, 4, 5, 8, 9, 10]
print(len(b))  # 8
Three notes on the above:
- You are not modifying the original list in place, but you could.
- You are not actually deleting elements but choosing which ones to keep. It amounts to the same thing (you just have to adjust the ratios).
- The drawback is the list comprehension needed to restore the order of the elements.
As @koalo says in the comments, the above will not work properly if the elements in the original list are not unique. I could easily fix that, but then my answer would be identical to the one posted by @JohnColeman. So if duplicates are possible, just use his instead.
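To see why non-unique elements are a problem, here is a small sketch with a made-up list where every value is the same. Filtering by value keeps everything, while filtering by sampled index (the index-based variant is my illustration, not part of this answer) keeps the intended count:

```python
import random

a = [7, 7, 7, 7]              # hypothetical input: all elements equal
keep = len(a) - len(a) // 4   # we want to keep 3 of the 4

# Value-based filter: the set collapses to {7}, so every element passes.
kept_values = set(random.sample(a, keep))
by_value = [x for x in a if x in kept_values]

# Index-based filter: positions are always unique, so the count is right.
kept_idx = set(random.sample(range(len(a)), keep))
by_index = [x for i, x in enumerate(a) if i in kept_idx]

print(len(by_value))  # 4
print(len(by_index))  # 3
```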
Is the order meaningful?
If not, you can do something like:
from random import shuffle

shuffle(data)
data = data[:len(data) - n]
I suggest using numpy indexing, as in:
import numpy as np
data = np.array([1,2,3,4,5,6,7,8,9,10])
n = len(data) // 4  # integer division, so n is usable as a sample size
indices = sorted(np.random.choice(len(data), len(data) - n, replace=False))
result = data[indices]
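A variation on the same idea, sketched with a boolean mask so no sorting is needed: mark the positions to drop as `False` and index with the mask. The names `rng` and `mask` are mine, and this assumes a NumPy version that provides `np.random.default_rng`:

```python
import numpy as np

rng = np.random.default_rng()
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
n = len(data) // 4  # number of elements to remove

# Start with everything kept, then switch off n distinct random positions.
mask = np.ones(len(data), dtype=bool)
mask[rng.choice(len(data), size=n, replace=False)] = False

result = data[mask]  # boolean indexing preserves the original order
print(len(result))   # 8
```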
I think it will be more convenient this way:
import random
n = round(len(data) * 0.3)
for i in range(n):
    data.pop(random.randrange(len(data)))
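Note that the loop above changes `data` in place, so if you want to repeat it with different values of n (as the question asks), you would need to work on a copy each time. A minimal sketch, where `pop_random` and the list of fractions are hypothetical names and values of mine:

```python
import random

def pop_random(data, n):
    """Return a copy of data with n randomly chosen items removed."""
    out = list(data)  # copy, so the caller's list is not modified
    for _ in range(n):
        out.pop(random.randrange(len(out)))
    return out

data = list(range(5000))  # roughly the size mentioned in the question
for fraction in (0.1, 0.25, 0.3):
    n = round(len(data) * fraction)
    trimmed = pop_random(data, n)
    print(fraction, len(trimmed))
```

Each `pop` is still O(n), so for large lists or many removals the index-sampling approach from the other answers is likely the better fit.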