Pandas: Reset index for a non-unique index
Question:
I have a dataframe like this:
| | Name | Value |
|---|---|---|
| 0 | x1 | 0.20 |
| 0 | x1 | 0.40 |
| 0 | x1 | 0.34 |
| 3 | x2 | 0.12 |
| 3 | x2 | 0.13 |
| 4 | x3 | 0.19 |
| 4 | x3 | 0.23 |
df = pd.DataFrame([["x1", 0.2], ["x1", 0.4], ["x1", 0.34], ["x2", 0.12], ["x2", 0.13], ["x3", 0.19], ["x3", 0.239]], index = [0, 0, 0, 3, 3, 4, 4], columns= ["Name", "Value"])
I would like to reset the index to get the following DataFrame:
| | Name | Value |
|---|---|---|
| 0 | x1 | 0.20 |
| 0 | x1 | 0.40 |
| 0 | x1 | 0.34 |
| 1 | x2 | 0.12 |
| 1 | x2 | 0.13 |
| 2 | x3 | 0.19 |
| 2 | x3 | 0.23 |
df = pd.DataFrame([["x1", 0.2], ["x1", 0.4], ["x1", 0.34], ["x2", 0.12], ["x2", 0.13], ["x3", 0.19], ["x3", 0.239]], index = [0, 0, 0, 1, 1, 2, 2], columns= ["Name", "Value"])
As this does not seem possible with the reset_index() function, what would be the best way to achieve this reindexing?
The DataFrame results from using explode() and filtering out NaN values. I want to reset the index after filtering to make it easier to loop over the exploded values later on. Note that I don’t want to use the "Name" column as the index, as indexing by string seems to be extremely slow for larger data sets.
Answers:
Use:
df = df.set_axis(pd.factorize(df['Name'])[0].tolist())
Or:
df.index = pd.factorize(df['Name'])[0]
Name Value
0 x1 0.200
0 x1 0.400
0 x1 0.340
1 x2 0.120
1 x2 0.130
2 x3 0.190
2 x3 0.239
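For reference, here is a minimal, self-contained sketch of the factorize approach applied to the question's sample data. It also shows a variant that factorizes the existing index instead of the "Name" column. That variant is an assumption on my part about what the asker may want, not something the answer above proposes: it keeps the original grouping even if "Name" and the old index do not line up one-to-one. The variable names `by_name` and `by_index` are only illustrative.

```python
import pandas as pd

# Sample data from the question: duplicated index labels with gaps (0, 3, 4).
df = pd.DataFrame(
    [["x1", 0.2], ["x1", 0.4], ["x1", 0.34], ["x2", 0.12],
     ["x2", 0.13], ["x3", 0.19], ["x3", 0.239]],
    index=[0, 0, 0, 3, 3, 4, 4],
    columns=["Name", "Value"],
)

# Variant from the answer: integer codes derived from the "Name" column.
# pd.factorize returns (codes, uniques); codes are numbered in order of
# first appearance.
by_name = df.copy()
by_name.index = pd.factorize(by_name["Name"])[0]

# Alternative (assumption, not from the answer): derive the codes from the
# existing index, so rows that shared an old label share a new one.
by_index = df.copy()
by_index.index = pd.factorize(by_index.index)[0]

print(by_name.index.tolist())
print(by_index.index.tolist())
```

For this sample both variants yield the same index, because each name maps to exactly one old index label. They would differ if the same name appeared under several old labels, or if one old label covered several names.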