How to delete the current row in a pandas DataFrame during df.iterrows()
Question:
I would like to delete the current row during iteration (using df.iterrows()) if its value in a certain column fails my if condition.
For example:
for index, row in df.iterrows():
    if row['A'] == 0:
        # remove/drop this row from the df
        del df[index]  # I tried this, but it raises an error
This might be a very easy one, but I still can't figure out how to do it.
Your help would be very much appreciated!
Answers:
I don't know if this is pseudocode or not, but you can't delete a row like this; you can drop it:
In [425]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.randn(5), 'b': np.random.randn(5)})
df
Out[425]:
a b
0 -1.348112 0.583603
1 0.174836 1.211774
2 -2.054173 0.148201
3 -0.589193 -0.369813
4 -1.156423 -0.967516
In [426]:
for index, row in df.iterrows():
    if row['a'] > 0:
        df.drop(index, inplace=True)
In [427]:
df
Out[427]:
a b
0 -1.348112 0.583603
2 -2.054173 0.148201
3 -0.589193 -0.369813
4 -1.156423 -0.967516
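The del line in the question fails because del df[key] looks up key among the column labels, not the row labels, so a row label such as 0 or 1 raises a KeyError. A minimal sketch with small hypothetical values (in place of the random data above) contrasting the two:

```python
import pandas as pd

# Hypothetical fixed values standing in for np.random.randn
df = pd.DataFrame({'a': [-1.3, 0.2, -2.1], 'b': [0.6, 1.2, 0.1]})

# del df[key] targets a *column* named key; there is no column labelled 1
try:
    del df[1]
except KeyError:
    print('del df[1] raised KeyError: no column labelled 1')

# DataFrame.drop removes rows by index label (axis=0 is the default)
df.drop(1, inplace=True)
print(df.index.tolist())  # [0, 2]
```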
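If you just want to filter those rows out, you can use boolean indexing:
df[df['a'] <= 0]
This achieves the same result without a loop. Note that it returns a new DataFrame rather than modifying df in place, so assign it back (df = df[df['a'] <= 0]) if you want to keep the result.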
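A minimal sketch comparing the dropping loop with boolean indexing on fixed, hypothetical values so the output is reproducible. It also shows a third variant not taken from the answers: collect the labels during iteration and drop them in one call afterwards, which avoids mutating the frame mid-loop (the pandas user guide's section on iteration advises against modifying an object while iterating over it).

```python
import pandas as pd

# Hypothetical fixed values instead of np.random.randn
df = pd.DataFrame({'a': [-1.3, 0.2, -2.1, -0.6, 1.1],
                   'b': [0.6, 1.2, 0.1, -0.4, -1.0]})

# 1) Drop inside the loop, as in the answer (on a copy, so df stays intact)
looped = df.copy()
for index, row in looped.iterrows():
    if row['a'] > 0:
        looped.drop(index, inplace=True)

# 2) Collect the failing labels first, then drop them in a single call
to_drop = [index for index, row in df.iterrows() if row['a'] > 0]
collected = df.drop(to_drop)

# 3) Boolean indexing: keep only the rows that pass the condition
filtered = df[df['a'] <= 0]

print(looped.index.tolist())     # [0, 2, 3]
print(collected.index.tolist())  # [0, 2, 3]
print(filtered.index.tolist())   # [0, 2, 3]
```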
I tried @EdChum's solution with a custom pandas.DataFrame, but I could not get it working because it raised an error: KeyError: '[78] not found in axis'. If you get the same error, it can be fixed by dropping df.index[index] (the index entry at that position) instead of index on each .iterrows() iteration.
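iterrows() yields each row's index label, which is what the first answer passes to drop; df.index[index] instead treats that label as a position. A minimal sketch with hypothetical data suggests the two agree only while the index is the default 0..n-1 and no row has been removed yet. Once a row is dropped, later positions shift, so with more than one matching row the positional lookup can remove the wrong row:

```python
import pandas as pd

# Hypothetical data with a default index (labels 0..3); rows 0 and 2 match
df = pd.DataFrame({'a': [0, 1, 0, 1]})

# Label-based drop, as in the first answer
by_label = df.copy()
for index, row in by_label.iterrows():
    if row['a'] == 0:
        by_label.drop(index, inplace=True)

# Positional lookup, as in this answer: after label 0 is dropped,
# position 2 refers to label 3, not label 2
by_position = df.copy()
for index, row in by_position.iterrows():
    if row['a'] == 0:
        by_position.drop(by_position.index[index], inplace=True)

print(by_label['a'].tolist())     # [1, 1]
print(by_position['a'].tolist())  # [1, 0]
```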
The DataFrame used was retrieved from investpy, which contains all the equities/stock data indexed on Investing.com, and it is printed with pprint. Anyway, this is the code that got it working:
In [1]:
import investpy
from pprint import pprint
In [2]:
df = investpy.get_equities()
pprint(df.head())
Out [2]:
country name full_name
0 argentina Tenaris Tenaris
1 argentina PETROBRAS ON Petroleo Brasileiro - Petrobras
2 argentina GP Fin Galicia Grupo Financiero Galicia B
3 argentina Ternium Argentina Ternium Argentina Sociedad Anónima
4 argentina Pampa Energía Pampa Energía S.A.
tag isin id currency
0 tenaris?cid=13302 LU0156801721 13302 ARS
1 petrobras-on?cid=13303 BRPETRACNOR9 13303 ARS
2 gp-fin-galicia ARP495251018 13304 ARS
3 siderar ARSIDE010029 13305 ARS
4 pampa-energia ARP432631215 13306 ARS
In [3]:
pprint(df[df['tag'] == 'koninklijke-philips-electronics'])
Out [3]:
country name full_name
78 argentina Koninklijke Philips DRC Koninklijke Philips NV DRC
tag isin id currency
78 koninklijke-philips-electronics ARDEUT110558 30044 ARS
In [4]:
for index, row in df.iterrows():
    if row['tag'] == 'koninklijke-philips-electronics':
        df.drop(df.index[index], inplace=True)
In [5]:
pprint(df[df['tag'] == 'koninklijke-philips-electronics'])
Out [5]:
Empty DataFrame
Columns: [country, name, full_name, tag, isin, id, currency]
Index: []
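If the goal is simply to remove every row with that tag, boolean indexing (as in the first answer) avoids the loop and the label/position question altogether. A minimal sketch, assuming a hypothetical three-row stand-in for investpy.get_equities() that reuses index labels, names, and tags from the output above, so it runs without network access:

```python
import pandas as pd

# Hypothetical stand-in for investpy.get_equities(), keeping labels 0, 3 and 78
df = pd.DataFrame(
    {'name': ['Tenaris', 'Ternium Argentina', 'Koninklijke Philips DRC'],
     'tag': ['tenaris?cid=13302', 'siderar', 'koninklijke-philips-electronics']},
    index=[0, 3, 78],
)

# Label 78 exists, but position 78 does not in a three-row frame
try:
    df.index[78]
except IndexError:
    print('position 78 does not exist; the frame has', len(df), 'rows')

# Keep every row whose tag differs; selection is by mask, not by position
df = df[df['tag'] != 'koninklijke-philips-electronics']
print(df.index.tolist())  # [0, 3]
```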
Hope this helps someone! And thanks anyway for the original answer, @EdChum!