What is the use of reset_index() in pandas?

Question:

While reading this article, I came across this statement.

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()

Other than the reset_index() method call, everything else is clear to me.
My question is: what will happen if I don’t call reset_index() in the sequence given below?

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
df_1 = df.merge(order_total)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

I tried to understand this method from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html, but couldn’t understand what it means to reset the index of a dataframe.

Asked By: Saurav Sahu


Answers:

A simplified explanation:
reset_index() takes the current index and places it in a regular column (named ‘index’ when the index has no name, otherwise named after the index). Then it creates a new sequential integer index, starting from 0, for the data set.

import pandas as pd
df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

    0
2  20
3  30
4  40
5  50

df.reset_index()

   index   0
0      2  20
1      3  30
2      4  40
3      5  50
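
As a small sketch of where the column name comes from: the frame above has an unnamed index, so the new column is called 'index'. If the index has a name, that name is used instead. The name "key" below is only an illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

# Unnamed index: the old index lands in a column called "index".
unnamed = df.reset_index()

# Named index ("key" is a made-up name): the column takes that name.
named = df.rename_axis("key").reset_index()

print(list(unnamed.columns))
print(list(named.columns))
```

In both cases the old index values 2, 3, 4, 5 are kept as data, and the rows are renumbered from 0.
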
Answered By: visibleman

I think it is better here to use GroupBy.transform, which returns a new Series of the same size as the original DataFrame, filled with the aggregated values, so the merge is not necessary:

df_1 = pd.DataFrame({
         'A':list('abcdef'),
         'ext price':[5,3,6,9,2,4],
         'order':list('aaabbb')
})


order_total1 = df_1.groupby('order')["ext price"].transform('sum')
df_1["Percent_of_Order"] = df_1["ext price"] / order_total1
print (df_1)
   A  ext price order  Percent_of_Order
0  a          5     a          0.357143
1  b          3     a          0.214286
2  c          6     a          0.428571
3  d          9     b          0.600000
4  e          2     b          0.133333
5  f          4     b          0.266667

My question is what will happen if I don’t call reset_index() considering the sequence?

Before reset_index() the result is a Series. Calling reset_index() converts that Series into a two-column DataFrame: the first column is named after the index and the second after the Series.

order_total = df_1.groupby('order')["ext price"].sum().rename("Order_Total")
print (order_total)
order
a    14
b    15
Name: Order_Total, dtype: int64

print (type(order_total))
<class 'pandas.core.series.Series'>

print (order_total.name)
Order_Total

print (order_total.index.name)
order

print (order_total.reset_index())
  order  Order_Total
0     a           14
1     b           15

The two-column DataFrame is necessary in your code because merge is called with no parameters. That means the on parameter defaults to the intersection of the column names common to both DataFrames, here the order column.
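
A minimal sketch of what this means in practice, using a cut-down version of the sample data above (the variable names merge_failed, merged_reset and merged_index are only for illustration). Without reset_index(), 'order' is the index of the Series rather than a column, so a bare merge finds no common column; as far as I know pandas raises MergeError in that case. Passing the key explicitly is an alternative to resetting the index.

```python
import pandas as pd

df = pd.DataFrame({
    "order": list("aaabbb"),
    "ext price": [5, 3, 6, 9, 2, 4],
})

# Named Series indexed by "order"; no reset_index() yet.
order_total = df.groupby("order")["ext price"].sum().rename("Order_Total")

# Bare merge: "order" is not a column of order_total, so nothing to join on.
try:
    df.merge(order_total)
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True

# Option 1: turn the index into a column, then merge on the common column.
merged_reset = df.merge(order_total.reset_index())

# Option 2: keep the Series as is and join the column against its index.
merged_index = df.merge(order_total, left_on="order", right_index=True)

print(merge_failed)
print(merged_reset)
```

Both options attach the same Order_Total value to every row of the matching order.
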

Answered By: jezrael

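reset_index() replaces the current index with a default integer index starting from 0. If a column had been set as the index, it is moved back into a regular column (or discarded if you pass drop=True).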

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Kitty",
            "Hello Puppy",
            "It is an Helloexample",
            "for stackoverflow",
            "Hello World",
        ],
    }
)
newdf = df.set_index('ID')

print(newdf.reset_index())

Output before reset_index():

                     name
ID                       
1             Hello Kitty
2             Hello Puppy
3   It is an Helloexample
4       for stackoverflow
5             Hello World

Output after reset_index():

   ID                   name
0   1            Hello Kitty
1   2            Hello Puppy
2   3  It is an Helloexample
3   4      for stackoverflow
4   5            Hello World
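
For comparison, a short sketch of the drop parameter on a smaller, made-up frame: by default the old index comes back as a column, while drop=True throws it away and only renumbers the rows.

```python
import pandas as pd

# Small illustrative frame; the values are placeholders.
newdf = pd.DataFrame(
    {"ID": [1, 2, 3], "name": ["x", "y", "z"]}
).set_index("ID")

kept = newdf.reset_index()              # "ID" becomes a column again
dropped = newdf.reset_index(drop=True)  # "ID" is discarded

print(list(kept.columns))
print(list(dropped.columns))
```
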
Answered By: LOrD_ARaGOrN

To answer your question:

My question is what will happen if I don’t call reset_index() considering the sequence?

The result will be indexed by the keys you applied the group-by on, e.g. ‘order’ in your case. With a single key this is an ordinary index; with several keys it becomes a multi-index.
Specific to the article, ‘order’ would then be the index of order_total rather than a column, so the merge done after the group-by statement would have no common column to match on.

Hence, reset_index() is needed to perform the correct merge.
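
A quick sketch of the difference between one and several group-by keys. The 'sku' column and the values are made up for illustration and do not come from the article.

```python
import pandas as pd

df = pd.DataFrame({
    "order": list("aabb"),
    "sku": list("xyxx"),          # hypothetical second key
    "ext price": [5, 3, 6, 9],
})

one_key = df.groupby("order")["ext price"].sum()
two_keys = df.groupby(["order", "sku"])["ext price"].sum()

# One key gives a plain Index, two keys give a MultiIndex.
print(type(one_key.index).__name__)
print(type(two_keys.index).__name__)

# reset_index() turns every index level back into an ordinary column.
flat = two_keys.reset_index()
print(flat)
```
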

Answered By: Akash sharma