Right way to reverse a pandas DataFrame?
Question:
Here is my code:
import pandas as pd
data = pd.DataFrame({'Odd':[1,3,5,6,7,9], 'Even':[0,2,4,6,8,10]})
for i in reversed(data):
print(data['Odd'], data['Even'])
When I run this code, i get the following error:
Traceback (most recent call last):
File "C:Python33libsite-packagespandascoregeneric.py", line 665, in _get_item_cache
return cache[item]
KeyError: 5
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:Users*****Documents******************.py", line 5, in <module>
for i in reversed(data):
File "C:Python33libsite-packagespandascoreframe.py", line 2003, in __getitem__
return self._get_item_cache(key)
File "C:Python33libsite-packagespandascoregeneric.py", line 667, in _get_item_cache
values = self._data.get(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1656, in get
_, block = self._find_block(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1936, in _find_block
self._check_have(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1943, in _check_have
raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: 'no item named 5'
Why am I getting this error?
How can I fix that?
What is the right way to reverse pandas.DataFrame
?
Answers:
data.reindex(index=data.index[::-1])
or simply:
data.iloc[::-1]
will reverse your data frame, if you want to have a for
loop which goes from down to up you may do:
for idx in reversed(data.index):
print(idx, data.loc[idx, 'Even'], data.loc[idx, 'Odd'])
or
for idx in reversed(data.index):
print(idx, data.Even[idx], data.Odd[idx])
You are getting an error because reversed
first calls data.__len__()
which returns 6. Then it tries to call data[j - 1]
for j
in range(6, 0, -1)
, and the first call would be data[5]
; but in pandas dataframe data[5]
means column 5, and there is no column 5 so it will throw an exception. ( see docs )
You can reverse the rows in an even simpler way:
df[::-1]
This works:
for i,r in data[::-1].iterrows():
print(r['Odd'], r['Even'])
None of the existing answers resets the index after reversing the dataframe.
For this, do the following:
data[::-1].reset_index()
Here’s a utility function that also removes the old index column, as per @Tim’s comment:
def reset_my_index(df):
res = df[::-1].reset_index(drop=True)
return(res)
Simply pass your dataframe into the function
One way to do this if dealing with sorted range index is:
data = data.sort_index(ascending=False)
This approach has the benefits of (1) being a single line, (2) not requiring a utility function, and most importantly (3) not actually changing any of the data in the dataframe.
Caveat: this works by sorting the index in descending order and so may not always be appropriate or generalize for any given Dataframe.
What is the right way to reverse a pandas DataFrame?
TL;DR: df[::-1]
This is the best method for reversing a DataFrame, because 1) it is constant run time, i.e. O(1) 2) it’s a single operation, and 3) concise/readable (assuming familiarity with slice notation).
Long Version
I’ve found the ol’ slicing trick df[::-1]
(or the equivalent df.loc[::-1]
1) to be the most concise and idiomatic way of reversing a DataFrame. This mirrors the python list reversal syntax lst[::-1]
and is clear in its intent. With the loc
syntax, you are also able to slice columns if required, so it is a bit more flexible.
Some points to consider while handling the index:
-
"what if I want to reverse the index as well?"
- you’re already done.
df[::-1]
reverses both the index and values.
-
"what if I want to drop the index from the result?"
- you can call
.reset_index(drop=True)
at the end.
-
"what if I want to keep the index untouched (IOW, only reverse the data, not the index)?"
- this is somewhat unconventional because it implies the index isn’t really relevant to the data. Perhaps consider removing it entirely? Although what you’re asking for can technically be achieved using either
df[:] = df[::-1]
which creates an in-place update to df
, or df.loc[::-1].set_index(df.index)
, which returns a copy.
1: df.loc[::-1]
and df.iloc[::-1]
are equivalent since the slicing syntax remains the same, whether you’re reversing by position (iloc
) or label (loc
).
The Proof is in the Pudding
X-axis represents the dataset size. Y-axis represents time taken to reverse. No method scales as well as the slicing trick, it’s all the way at the bottom of the graph. Benchmarking code for reference, plots generated using perfplot.
Comments on other solutions
-
df.reindex(index=df.index[::-1])
is clearly a popular solution, but on first glance, how obvious is it to an unfamiliar reader that this code is "reversing a DataFrame"? Additionally, this is reversing the index, then using that intermediate result to reindex
, so this is essentially a TWO step operation (when it could’ve been just one).
-
df.sort_index(ascending=False)
may work in most cases where you have a simple range index, but this assumes your index was sorted in ascending order and so doesn’t generalize well.
-
PLEASE do not use iterrows
. I see some options suggesting iterating in reverse. Whatever your use case, there is likely a vectorized method available, but if there isn’t then you can use something a little more reasonable such as list comprehensions. See How to iterate over rows in a DataFrame in Pandas for more detail on why iterrows
is an antipattern.
df = df.loc[reversed(df.index)]
This is probably more explicit and less cryptic than negative slicing.
Here is my code:
import pandas as pd
data = pd.DataFrame({'Odd':[1,3,5,6,7,9], 'Even':[0,2,4,6,8,10]})
for i in reversed(data):
print(data['Odd'], data['Even'])
When I run this code, i get the following error:
Traceback (most recent call last):
File "C:Python33libsite-packagespandascoregeneric.py", line 665, in _get_item_cache
return cache[item]
KeyError: 5
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:Users*****Documents******************.py", line 5, in <module>
for i in reversed(data):
File "C:Python33libsite-packagespandascoreframe.py", line 2003, in __getitem__
return self._get_item_cache(key)
File "C:Python33libsite-packagespandascoregeneric.py", line 667, in _get_item_cache
values = self._data.get(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1656, in get
_, block = self._find_block(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1936, in _find_block
self._check_have(item)
File "C:Python33libsite-packagespandascoreinternals.py", line 1943, in _check_have
raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: 'no item named 5'
Why am I getting this error?
How can I fix that?
What is the right way to reverse pandas.DataFrame
?
data.reindex(index=data.index[::-1])
or simply:
data.iloc[::-1]
will reverse your data frame, if you want to have a for
loop which goes from down to up you may do:
for idx in reversed(data.index):
print(idx, data.loc[idx, 'Even'], data.loc[idx, 'Odd'])
or
for idx in reversed(data.index):
print(idx, data.Even[idx], data.Odd[idx])
You are getting an error because reversed
first calls data.__len__()
which returns 6. Then it tries to call data[j - 1]
for j
in range(6, 0, -1)
, and the first call would be data[5]
; but in pandas dataframe data[5]
means column 5, and there is no column 5 so it will throw an exception. ( see docs )
You can reverse the rows in an even simpler way:
df[::-1]
This works:
for i,r in data[::-1].iterrows():
print(r['Odd'], r['Even'])
None of the existing answers resets the index after reversing the dataframe.
For this, do the following:
data[::-1].reset_index()
Here’s a utility function that also removes the old index column, as per @Tim’s comment:
def reset_my_index(df):
res = df[::-1].reset_index(drop=True)
return(res)
Simply pass your dataframe into the function
One way to do this if dealing with sorted range index is:
data = data.sort_index(ascending=False)
This approach has the benefits of (1) being a single line, (2) not requiring a utility function, and most importantly (3) not actually changing any of the data in the dataframe.
Caveat: this works by sorting the index in descending order and so may not always be appropriate or generalize for any given Dataframe.
What is the right way to reverse a pandas DataFrame?
TL;DR: df[::-1]
This is the best method for reversing a DataFrame, because 1) it is constant run time, i.e. O(1) 2) it’s a single operation, and 3) concise/readable (assuming familiarity with slice notation).
Long Version
I’ve found the ol’ slicing trick df[::-1]
(or the equivalent df.loc[::-1]
1) to be the most concise and idiomatic way of reversing a DataFrame. This mirrors the python list reversal syntax lst[::-1]
and is clear in its intent. With the loc
syntax, you are also able to slice columns if required, so it is a bit more flexible.
Some points to consider while handling the index:
-
"what if I want to reverse the index as well?"
- you’re already done.
df[::-1]
reverses both the index and values.
- you’re already done.
-
"what if I want to drop the index from the result?"
- you can call
.reset_index(drop=True)
at the end.
- you can call
-
"what if I want to keep the index untouched (IOW, only reverse the data, not the index)?"
- this is somewhat unconventional because it implies the index isn’t really relevant to the data. Perhaps consider removing it entirely? Although what you’re asking for can technically be achieved using either
df[:] = df[::-1]
which creates an in-place update todf
, ordf.loc[::-1].set_index(df.index)
, which returns a copy.
- this is somewhat unconventional because it implies the index isn’t really relevant to the data. Perhaps consider removing it entirely? Although what you’re asking for can technically be achieved using either
1: df.loc[::-1]
and df.iloc[::-1]
are equivalent since the slicing syntax remains the same, whether you’re reversing by position (iloc
) or label (loc
).
The Proof is in the Pudding
X-axis represents the dataset size. Y-axis represents time taken to reverse. No method scales as well as the slicing trick, it’s all the way at the bottom of the graph. Benchmarking code for reference, plots generated using perfplot.
Comments on other solutions
-
df.reindex(index=df.index[::-1])
is clearly a popular solution, but on first glance, how obvious is it to an unfamiliar reader that this code is "reversing a DataFrame"? Additionally, this is reversing the index, then using that intermediate result toreindex
, so this is essentially a TWO step operation (when it could’ve been just one). -
df.sort_index(ascending=False)
may work in most cases where you have a simple range index, but this assumes your index was sorted in ascending order and so doesn’t generalize well. -
PLEASE do not use
iterrows
. I see some options suggesting iterating in reverse. Whatever your use case, there is likely a vectorized method available, but if there isn’t then you can use something a little more reasonable such as list comprehensions. See How to iterate over rows in a DataFrame in Pandas for more detail on whyiterrows
is an antipattern.
df = df.loc[reversed(df.index)]
This is probably more explicit and less cryptic than negative slicing.