Randomly remove 'x' elements from a list

Question:

I’d like to randomly remove a fraction of elements from a list without changing the order of the list.

Say I had some data and I wanted to remove 1/4 of them:

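import numpy as np

data = [1,2,3,4,5,6,7,8,9,10]
n    = len(data) // 4  # integer division, so n can be used as a loop count

I’m thinking I need a loop to run through the data and delete a random element ‘n’ times? So something like:

for _ in range(n):
    idx = np.random.randint(0, len(data))  # upper bound is exclusive; starting at 0 lets the first element be picked too
    del data[idx]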

My question is, is this the most ‘pythonic’ way of doing this? My list will be ~5000 elements long and I want to do this multiple times with different values of ‘n’.

Thanks!

Asked By: rh1990


Answers:

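Deleting elements one at a time is a bad idea, since each deletion from a list is O(n) and removing k items this way costs O(n·k). Instead, pick all the indices to drop up front and build the result in a single pass: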

import random

def delete_rand_items(items, n):
    # Sample n distinct indices; a set makes the membership test O(1).
    to_delete = set(random.sample(range(len(items)), n))
    return [x for i, x in enumerate(items) if i not in to_delete]
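
A quick usage sketch, applied to the list from the question. The function is repeated so the snippet runs on its own; the seed value and the name `kept` are only illustrative assumptions.

```python
import random

def delete_rand_items(items, n):
    # Choose n distinct indices to drop; a set gives O(1) membership tests.
    to_delete = set(random.sample(range(len(items)), n))
    return [x for i, x in enumerate(items) if i not in to_delete]

random.seed(0)  # arbitrary seed, only to make the run repeatable
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
kept = delete_rand_items(data, len(data) // 4)

print(len(kept))             # 8
print(kept == sorted(kept))  # True: the survivors keep their original order
print(len(data))             # 10: the input list is not modified
```

Because the function returns a new list, it can be called repeatedly on the same data with different values of n.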
Answered By: John Coleman

You can use random.sample like this:

import random

a = [1,2,3,4,5,6,7,8,9,10]

no_elements_to_delete = len(a) // 4
no_elements_to_keep = len(a) - no_elements_to_delete
b = set(random.sample(a, no_elements_to_keep))  # the `if i in b` on the next line would benefit from b being a set for large lists
b = [i for i in a if i in b]  # you need this to restore the order
print(len(a))  # 10
print(b)       # [1, 2, 3, 4, 5, 8, 9, 10]
print(len(b))  # 8

Three notes on the above:

  1. You are not modifying the original list in place, but you could.
  2. You are not actually deleting elements but rather choosing which ones to keep. It amounts to the same thing; you just have to adjust the ratios.
  3. The drawback is the list comprehension needed to restore the order of the elements.

As @koalo says in the comments, the above will not work properly if the elements in the original list are not unique. I could easily fix that, but then my answer would be identical to the one posted by @JohnColeman. So if that might be the case, just use his instead.
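
To illustrate the caveat about non-unique elements, here is a small sketch. The list `a` with repeated values is made up for the demonstration. Sampling values and then filtering by membership keeps every copy of a sampled value, so the result can come out longer than intended; sampling indices avoids this.

```python
import random

a = [7, 7, 7, 7, 1, 2, 3, 4]   # hypothetical data containing duplicates
n_keep = len(a) - len(a) // 4  # keep 6 of the 8 elements

# Value-based: the membership test cannot tell the four 7s apart.
kept_values = set(random.sample(a, n_keep))
by_value = [x for x in a if x in kept_values]

# Index-based: every position is sampled individually.
kept_idx = set(random.sample(range(len(a)), n_keep))
by_index = [x for i, x in enumerate(a) if i in kept_idx]

print(len(by_value))  # 6, 7 or 8, depending on which elements were sampled
print(len(by_index))  # always 6
```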

Answered By: Ma0

Is the order meaningful?
If not, you can do something like:

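from random import shuffle

shuffle(data)                # shuffles in place, so the original order is lost
data = data[:len(data) - n]  # drop the last n elements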
Answered By: Binyamin Even

I suggest using NumPy indexing, as in:

import numpy as np
data = np.array([1,2,3,4,5,6,7,8,9,10])
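n = len(data) // 4  # integer division; np.random.choice needs an integer size
indices = sorted(np.random.choice(len(data), len(data) - n, replace=False))  # sorted indices keep the original order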
result = data[indices]
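
An alternative sketch using a boolean mask, which preserves the order without sorting the kept indices. It assumes NumPy 1.17 or later for `np.random.default_rng`; the names `rng`, `drop` and `mask` are illustrative and not part of the code above.

```python
import numpy as np

rng = np.random.default_rng()
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
n = len(data) // 4

# Pick n distinct positions to drop, then mask them out.
drop = rng.choice(len(data), size=n, replace=False)
mask = np.ones(len(data), dtype=bool)
mask[drop] = False
result = data[mask]  # boolean indexing keeps the remaining elements in order

print(len(result))  # 8
```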
Answered By: koalo

I think it will be more convenient this way:

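import random

n = round(len(data) * 0.3)  # number of elements to remove (30% here)
for _ in range(n):
    # pop() removes in place; the remaining elements keep their relative order
    data.pop(random.randrange(len(data)))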
Answered By: N01E1se