# Shuffling Several DataFrames Together

## Question:

Is it possible to shuffle several DataFrames together?

For example I have a DataFrame `df1` and a DataFrame `df2`. I want to shuffle the rows randomly, but for both DataFrames in the same way.

Example

`df1`:

```
|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
```

`df2`:

```
|___|_______|
| 1 |  ...  |
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
```

After shuffling, one possible order for both DataFrames could be:

```
|___|_______|
| 2 |  ...  |
| 3 |  ...  |
| 4 |  ...  |
| 1 |  ...  |
```

## Answer:

You can call `reindex` on both DataFrames with the same permutation of the index, created by `numpy.random.permutation`. This requires both `DataFrame`s to have the same length and the same unique index values:

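```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': range(5)})
print(df1)
#    a
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4

df2 = pd.DataFrame({'a': range(5)})
print(df2)
#    a
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4

# One random permutation of the index labels, shared by both DataFrames
idx = np.random.permutation(df1.index)
print(df1.reindex(idx))
# One possible output (the order changes on every run):
#    a
# 2  2
# 4  4
# 1  1
# 3  3
# 0  0

print(df2.reindex(idx))
# Same order as df1, because the same idx is used:
#    a
# 2  2
# 4  4
# 1  1
# 3  3
# 0  0
```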

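An older alternative was `reindex_axis`, but it was deprecated in pandas 0.21 and removed in pandas 1.0. With current pandas, pass `axis=0` to `reindex`, or select the labels with `loc`:

```python
print(df1.reindex(idx, axis=0))
print(df2.loc[idx])
```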
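The same idea extends to any number of DataFrames. Below is a minimal sketch, assuming NumPy 1.17+ for `np.random.default_rng`; the helper name `shuffle_together` is hypothetical. It shuffles by position with `iloc` instead of by label, so the frames only need the same number of rows, not identical index values, and the optional `seed` makes the order reproducible:

```python
import numpy as np
import pandas as pd


def shuffle_together(*frames, seed=None):
    """Hypothetical helper: shuffle several DataFrames with one shared row order.

    Uses positional indexing (iloc), so only equal lengths are required,
    not identical index labels.
    """
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all DataFrames must have the same number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)  # one random permutation of positions 0..n-1
    # apply the same positional order to every frame
    return tuple(f.iloc[order] for f in frames)


df1 = pd.DataFrame({'a': range(5)})
df2 = pd.DataFrame({'b': list('vwxyz')}, index=[10, 11, 12, 13, 14])
s1, s2 = shuffle_together(df1, df2, seed=0)
print(s1)
print(s2)
```

Each returned frame keeps its own index labels, so row `i` of `s1` still corresponds to row `i` of `s2` after shuffling.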
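
Another answer uses scikit-learn's `train_test_split`, which applies the same shuffle to `x` and `y`. Concatenating the two splits back together gives `x` and `y` shuffled in the same order:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
# x and y must have the same number of rows; returns x_train, x_test, y_train, y_test
x1, x2, y1, y2 = train_test_split(x, y, shuffle=True)
x3 = pd.concat([x1, x2])  # all rows of x, shuffled
y3 = pd.concat([y1, y2])  # all rows of y, in the same order as x3
```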
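A further option, sketched here as an assumption rather than taken from the answers above: when both DataFrames share the same unique index labels (the same precondition as the `reindex` answer), `DataFrame.sample(frac=1)` returns every row of one frame exactly once in random order, and the other frame can then be aligned to that order with `loc`. The data and the `random_state` value are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(5)})
df2 = pd.DataFrame({'b': range(10, 15)})

# frac=1 samples every row exactly once, i.e. a random permutation of df1
df1_shuffled = df1.sample(frac=1, random_state=42)
# reorder df2 by the same index labels, so its rows stay paired with df1
df2_shuffled = df2.loc[df1_shuffled.index]

print(df1_shuffled)
print(df2_shuffled)
```

Passing a fixed `random_state` makes the shuffle reproducible; omit it for a different order on every run.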