Update index after sorting data-frame
Question:
Take the following data-frame:
x = np.tile(np.arange(3),3)
y = np.repeat(np.arange(3),3)
df = pd.DataFrame({"x": x, "y": y})
x y
0 0 0
1 1 0
2 2 0
3 0 1
4 1 1
5 2 1
6 0 2
7 1 2
8 2 2
I need to sort it by x first, and only second by y:
df2 = df.sort(["x", "y"])
x y
0 0 0
3 0 1
6 0 2
1 1 0
4 1 1
7 1 2
2 2 0
5 2 1
8 2 2
How can I change the index so that it is ascending again? That is, how do I get this:
x y
0 0 0
1 0 1
2 0 2
3 1 0
4 1 1
5 1 2
6 2 0
7 2 1
8 2 2
I have tried the following. Unfortunately, it doesn’t change the index at all:
df2.reindex(np.arange(len(df2.index)))
Answers:
You can reset the index using reset_index to get back a default index of 0, 1, 2, …, n-1 (and use drop=True to indicate you want to drop the existing index instead of adding it as an additional column to your dataframe):
In [19]: df2 = df2.reset_index(drop=True)
In [20]: df2
Out[20]:
x y
0 0 0
1 0 1
2 0 2
3 1 0
4 1 1
5 1 2
6 2 0
7 2 1
8 2 2
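As for why the reindex attempt in the question does not help: reindex looks rows up by their existing labels rather than assigning new ones, so passing 0 to n-1 simply pulls the rows back into their original order. A minimal sketch of the difference, assuming a pandas version where sort_values stands in for the question's df.sort (the names realigned and renumbered are only illustrative):

```python
import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})

# Assumption: sort_values replaces the question's df.sort on current pandas
df2 = df.sort_values(["x", "y"])

# reindex selects rows by their existing labels, so this restores the unsorted order
realigned = df2.reindex(np.arange(len(df2.index)))

# reset_index discards the old labels and numbers the sorted rows 0..n-1
renumbered = df2.reset_index(drop=True)
```

Both calls return a new dataframe, so the result has to be assigned to a name to be kept.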
You can set new indices by using set_index:
df2.set_index(np.arange(len(df2.index)))
Output:
x y
0 0 0
1 0 1
2 0 2
3 1 0
4 1 1
5 1 2
6 2 0
7 2 1
8 2 2
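Note that set_index returns a new dataframe by default, so the result needs to be assigned to be kept. A small sketch, assuming the sorted frame is built with sort_values (df3 is just an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})
df2 = df.sort_values(["x", "y"])

# set_index returns a new DataFrame; df2 itself keeps its shuffled labels
df3 = df2.set_index(np.arange(len(df2.index)))
```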
df.sort() is deprecated (and has been removed in later pandas releases); use df.sort_values(...) instead: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
Then follow joris’ answer by calling df.reset_index(drop=True) on the sorted result.
Since pandas 1.0.0, df.sort_values has a new parameter, ignore_index, which does exactly what you need:
In [1]: df2 = df.sort_values(by=['x','y'],ignore_index=True)
In [2]: df2
Out[2]:
x y
0 0 0
1 0 1
2 0 2
3 1 0
4 1 1
5 1 2
6 2 0
7 2 1
8 2 2
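To illustrate, the one-step form with ignore_index=True should give the same frame as sorting and then calling reset_index(drop=True). A short sketch, assuming pandas 1.0.0 or newer (one_step and two_step are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.tile(np.arange(3), 3), "y": np.repeat(np.arange(3), 3)})

# One call: sort and renumber the index
one_step = df.sort_values(by=["x", "y"], ignore_index=True)

# Two calls: sort, then drop the old labels
two_step = df.sort_values(by=["x", "y"]).reset_index(drop=True)
```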
The following works!
- If you want to change the existing dataframe itself, you may directly use:
df.sort_values(by=['col1'], inplace=True)
df.reset_index(drop=True, inplace=True)
df
>> col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 C 4 3 F
4 D 7 2 e
5 NaN 8 4 D
- Otherwise, if you don’t want to change the existing dataframe but want to store the sorted dataframe in a separate variable, you may use:
df_sorted = df.sort_values(by=['col1']).reset_index(drop=True)
df_sorted
>> col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 C 4 3 F
4 D 7 2 e
5 NaN 8 4 D
df
>> col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 NaN 8 4 D
4 D 7 2 e
5 C 4 3 F
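For anyone who wants to run both variants, here is a self-contained sketch. The frame is rebuilt from the unsorted df printed above; the copy named df_inplace is an addition here, used only so that the original df stays available for the second variant.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the unsorted `df` output shown in this answer
df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
    "col4": ["a", "B", "c", "D", "e", "F"],
})

# Variant 1: sort and renumber in place (on a copy, so `df` survives for variant 2)
df_inplace = df.copy()
df_inplace.sort_values(by=["col1"], inplace=True)
df_inplace.reset_index(drop=True, inplace=True)

# Variant 2: leave `df` untouched and store the sorted result separately
df_sorted = df.sort_values(by=["col1"]).reset_index(drop=True)
```

With the default na_position, the row whose col1 is NaN ends up last in both results.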