Shuffle two lists at once with the same order

Question

I’m using the nltk library’s movie_reviews corpus, which contains a large number of documents. My task is to get the predictive performance of these reviews with and without pre-processing of the data. But there is a problem: the lists documents and documents2 contain the same documents, and I need to shuffle them while keeping the same order in both lists. I cannot shuffle them separately, because each time I shuffle a list I get a different order. That is why I need to shuffle them at once with the same order: I need to compare them at the end, and the comparison depends on the order. I’m using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    # Keep lowercased tokens longer than 2 characters that are not stopwords, then stem them
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(cleanDoc(movie_reviews.words(fileid)), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]

# Here I need something like random.shuffle(...) that shuffles
# documents and documents2 with the same order

Asked By: Jaroslav Klimčík



Answer 1

You can do it like this:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

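a, b = map(list, zip(*c))  # zip(*c) yields tuples, so convert them back to lists

print(a)
print(b)

[OUTPUT] (one possible result; the order is random)
['a', 'c', 'b']
[1, 3, 2]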

Of course, this was an example with simpler lists, but the adaptation will be the same for your case.
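Applied to the question's own lists, a minimal sketch looks like this. The four short documents are the question's example data, standing in for the full movie_reviews lists. Each (raw, cleaned) pair is shuffled as a unit, so position i in both lists still refers to the same review.

```python
import random

# The question's example data: the same review sits at the same index in both lists
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its cleaned version, shuffle the pairs together,
# then split them back into two lists
pairs = list(zip(documents, documents2))
random.shuffle(pairs)
documents, documents2 = map(list, zip(*pairs))

for raw, clean in zip(documents, documents2):
    print(raw, clean)
```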

Answered By: sshashank124

Answer 2

You can use the second argument of the shuffle function to fix the order of shuffling.

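Specifically, you can pass as the second argument of shuffle a zero-argument function that returns a value in [0, 1). The values returned by this function determine the order of shuffling. (By default, i.e. if you do not pass any function as the second argument, it uses random.random(). You can see it at line 277 here.)

Note that this second parameter was deprecated in Python 3.9 and removed in Python 3.11, so this approach only works on older versions (including the Python 2.7 used in the question). Also, because the function always returns the same value r, only a small fraction of all possible orderings can ever be produced, so the result is not a uniformly random shuffle.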

This example illustrates what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()           # randomly generate a real number in [0, 1)
random.shuffle(a, lambda: r)  # lambda: r is a zero-argument function that always returns r
random.shuffle(b, lambda: r)  # same function as in the previous line, so the shuffling order is the same

print(a)
print(b)

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]

Answered By: Kundan Kumar

Answer 3

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
    l = list(zip(*ls))

    shuffle(l)
    return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
The objects returned by shuffle_list() are tuples.

P.S.
shuffle_list() can also be applied to NumPy arrays created with numpy.array(), although the results are still tuples (of NumPy scalars), not arrays.

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output (NumPy versions before 2.0; newer versions show each element with its type, e.g. np.int64(3)):

$ (3, 1, 2) (6, 4, 5)
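If you need lists rather than tuples back (the question's code works with lists), a small variation of shuffle_list() converts each result and rejects inputs of different lengths, which zip would otherwise silently truncate. The name shuffle_lists is just for illustration; this is a sketch, not part of the original answer.

```python
import random

def shuffle_lists(*lists):
    """Shuffle any number of equal-length lists with one shared permutation."""
    if len({len(l) for l in lists}) > 1:
        raise ValueError("all lists must have the same length")
    rows = list(zip(*lists))   # one tuple per position: (a[i], b[i], ...)
    random.shuffle(rows)
    # Transpose back; the fallback handles empty lists, where zip(*rows) yields nothing
    return [list(col) for col in zip(*rows)] or [[] for _ in lists]

a, b, c = shuffle_lists([0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14])
print(a, b, c)
```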

Answered By: Lion Lai

Answer 4

I found an easy way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])
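Plain Python lists, such as the question's documents and documents2, cannot be indexed with a NumPy index array, but the same idea still works: draw one permutation and pick elements with a list comprehension. This sketch uses numpy.random.default_rng().permutation instead of shuffling np.arange; the short lists are placeholders for the real data.

```python
import numpy as np

# Placeholder data: entry i of documents2 is the cleaned version of entry i of documents
documents = [(['w1'], 'neg'), (['w2'], 'pos'), (['w3'], 'neg')]
documents2 = [(['c1'], 'neg'), (['c2'], 'pos'), (['c3'], 'neg')]

rng = np.random.default_rng()
indices = rng.permutation(len(documents))   # one shuffled order of 0..n-1

# Apply the same index order to both lists
documents = [documents[i] for i in indices]
documents2 = [documents2[i] for i in indices]
```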

Answered By: hua wei

Answer 5

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

Answered By: YScharf

Answer 6

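An easy and fast way to do this is to call random.seed() with the same seed before each random.shuffle(). Reseeding with the same value reproduces the same random order as many times as you want, as long as the lists have the same length.
It will look like this: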

import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)
random.shuffle(a)  # lists have no shuffle() method; use random.shuffle()
random.seed(seed)
random.shuffle(b)
print(a)
print(b)

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when memory constraints keep you from holding both lists at the same time, because each list can be shuffled separately.
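Reseeding the module-level generator also changes the random numbers seen by any other code that uses random. A variant of the same idea, sketched here, gives each list its own random.Random instance built from the same seed, leaving the global generator alone. The lists can still be shuffled at different times, as long as they have the same length.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = random.randrange(2**32)  # pick one seed and remember it

# Two independent generators that start from the same seed produce the same
# sequence of random numbers, so equal-length lists get the same permutation
random.Random(seed).shuffle(a)
random.Random(seed).shuffle(b)

print(a)
print(b)
```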

Answered By: Boris

Answer 7

You can store a shuffled order of indices in a variable, then use it to reorder both arrays:

import random

array1 = [1, 2, 3, 4, 5]
array2 = ["one", "two", "three", "four", "five"]

order = list(range(len(array1)))  # list() is needed on Python 3, where range() cannot be shuffled
random.shuffle(order)

newarray1 = []
newarray2 = []
for i in order:
    newarray1.append(array1[i])
    newarray2.append(array2[i])

print(newarray1)
print(newarray2)
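Keeping order around has a side benefit for the question's goal of comparing results at the end: it lets you undo the shuffle. A minimal sketch, assuming (as in the code above) that shuffled position k holds the original element at index order[k]:

```python
import random

array1 = [1, 2, 3, 4, 5]
array2 = ["one", "two", "three", "four", "five"]

order = list(range(len(array1)))
random.shuffle(order)
newarray1 = [array1[i] for i in order]
newarray2 = [array2[i] for i in order]

# Undo the shuffle: the element at shuffled position k came from original
# position order[k], so write it back there
restored1 = [None] * len(order)
for k, i in enumerate(order):
    restored1[i] = newarray1[k]

print(restored1)  # [1, 2, 3, 4, 5]
```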

Answered By: D_00

Answer 8

This works as well:

import numpy as np

a = ['a', 'b', 'c']
b = [1, 2, 3]

rng = np.random.default_rng()

state = rng.bit_generator.state  # save the generator state before shuffling
rng.shuffle(a)
# Restore the saved state so that b is shuffled with the same permutation as a
rng.bit_generator.state = state
rng.shuffle(b)

print(a)
print(b)

Output:

['b', 'a', 'c']
[2, 1, 3]
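The same save-and-restore pattern extends to any number of sequences. The helper name shuffle_together is hypothetical; this sketch assumes all sequences have the same length (otherwise the permutations would differ, so it raises an error) and shuffles them in place.

```python
import numpy as np

def shuffle_together(rng, *seqs):
    """Shuffle equal-length sequences in place with the same permutation."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    state = rng.bit_generator.state      # snapshot before the first shuffle
    for s in seqs:
        rng.bit_generator.state = state  # replay the same random draws
        rng.shuffle(s)

rng = np.random.default_rng(42)
a = ['a', 'b', 'c', 'd']
b = [1, 2, 3, 4]
c = [10, 20, 30, 40]
shuffle_together(rng, a, b, c)
print(a, b, c)
```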

Answered By: T-Dog
