Shuffling Several DataFrames Together

Question:

Is it possible to shuffle several DataFrames together?

For example, I have a DataFrame df1 and a DataFrame df2. I want to shuffle the rows randomly, but in the same way for both DataFrames.

Example

df1:

|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |

df2:

|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |

After shuffling a possible order for both DataFrames could be:

|___|_______|
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
| 1 |  ...  |
Asked By: ScientiaEtVeritas

Answers:

I think you can reindex both DataFrames with the same permutation, created by applying numpy.random.permutation to the index. This requires that both DataFrames have the same length and the same unique index values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a':range(5)})
print (df1)
   a
0  0
1  1
2  2
3  3
4  4

df2 = pd.DataFrame({'a':range(5)})
print (df2)
   a
0  0
1  1
2  2
3  3
4  4

idx = np.random.permutation(df1.index)
print (df1.reindex(idx))
   a
2  2
4  4
1  1
3  3
0  0

print (df2.reindex(idx))
   a
2  2
4  4
1  1
3  3
0  0

Alternative with reindex_axis (deprecated in pandas 0.21 and removed in pandas 1.0, so on current versions use reindex as above):

print (df1.reindex_axis(idx, axis=0))
print (df2.reindex_axis(idx, axis=0))
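
If the two DataFrames do not share the same index values, a positional variant may be easier. The following is a minimal sketch, not part of the original answer: the helper name shuffle_together and its seed parameter are hypothetical, chosen for illustration, and it assumes only that all frames have the same number of rows. It draws one permutation of row positions and applies it to every frame with iloc.

```python
import numpy as np
import pandas as pd


def shuffle_together(*frames, seed=None):
    """Return the frames with rows reordered by one shared random permutation.

    Hypothetical helper for illustration; works by position, so the
    frames may have different index labels.
    """
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all DataFrames must have the same number of rows")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)  # row positions 0..n-1 in random order
    return tuple(f.iloc[perm] for f in frames)


# Two frames with different index labels but the same length
df1 = pd.DataFrame({'a': range(5)}, index=list('abcde'))
df2 = pd.DataFrame({'b': range(10, 15)}, index=[10, 11, 12, 13, 14])

s1, s2 = shuffle_together(df1, df2, seed=0)
print(s1)
print(s2)
```

Passing a seed makes the order reproducible; leaving it as None gives a different order on each call. Rows that were at the same position before shuffling stay aligned afterwards.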
Answered By: jezrael
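You can also use train_test_split from scikit-learn, which shuffles x and y with the same permutation. Concatenating the two resulting parts gives back all rows in shuffled order:

from sklearn.model_selection import train_test_split

# x and y are the two DataFrames; both are split using the same shuffled indices
x1, x2, y1, y2 = train_test_split(x, y, shuffle=True)
x3 = pd.concat([x1, x2])
y3 = pd.concat([y1, y2])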
Answered By: itsergiu