Pandas deleting row with df.drop doesn't work
Question:
I have a DataFrame like this (the first column is the index (786…), the second is Day (25…), and Rainfall amount is empty):
     Day  Rainfall amount (millimetres)
786   25
787   26
788   27
789   28
790   29
791    1
792    2
793    3
794    4
795    5
and I want to delete the row with index 790. I have tried many things with df.drop, but nothing happened.
I hope you can help me.
Answers:
df.drop() returns a new DataFrame and leaves the original unchanged. If you want to apply the change to the current DataFrame, either assign the result back or specify the inplace parameter.
Option 1: assign the result back to df
df = df.drop(790)
Option 2: use the inplace argument
df.drop(790, inplace=True)
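A minimal sketch, assuming a small DataFrame rebuilt to mirror the question (index 786 to 795, a Day column, and an all-NaN rainfall column); the name `dropped` is just for illustration. It shows why a bare df.drop(790) seems to do nothing, and that both options above remove the row:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: index 786..795, Day values, empty rainfall column.
df = pd.DataFrame(
    {
        "Day": [25, 26, 27, 28, 29, 1, 2, 3, 4, 5],
        "Rainfall amount (millimetres)": np.nan,
    },
    index=range(786, 796),
)

# Calling drop() without using its result: df itself is unchanged.
df.drop(790)
print(790 in df.index)  # True

# Option 1: keep the returned copy by assigning it to a name.
dropped = df.drop(790)
print(790 in dropped.index)  # False

# Option 2: modify the frame in place (drop() then returns None).
df.drop(790, inplace=True)
print(790 in df.index)  # False
print(len(df))  # 9
```

Note that drop() matches index labels, so 790 here is the label shown in the first column, not the 790th row.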
As others may be in my shoes, I'll add a bit here. I merged three CSV files, and their headers were mistakenly copied into the DataFrame as data rows. Naturally, I assumed pandas would have an easy way to remove these obviously bad rows, but it isn't working and I'm still a bit perplexed. After calling df.drop(), the length of my DataFrame decreases by 2, as expected (I have two bad header rows), but the bad values are still there, and attempts to make a histogram throw errors due to empty values. Here's the code:
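import pandas as pd

df1 = pd.read_csv('./summedDF_combined.csv', index_col=[0])
print(len(df1['x']))
# Positions (not index labels) of rows whose 'y' value is not numeric
badRows = pd.isnull(pd.to_numeric(df1['y'], errors='coerce')).to_numpy().nonzero()[0]
print("Bad rows:", badRows)
df1.drop(badRows, inplace=True)  # note: drop() matches index labels, not positions
print(len(df1['x']))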
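The likely culprit is a mix-up between positions and labels: .nonzero()[0] returns integer positions, whereas drop() removes rows by index label. With index_col=[0] the labels come from the file's first column, so dropping the positions removes whichever rows happen to carry those labels (which would explain the length shrinking by 2) while the header rows stay. A minimal sketch, assuming a hypothetical frame whose labels differ from its positions, that either converts positions to labels with df.index[...] or avoids positions entirely with a boolean mask:

```python
import pandas as pd

# Hypothetical merged data: labels are 10..13 rather than 0..3,
# and one copied header row ("x", "y") sits in the middle.
df1 = pd.DataFrame(
    {"x": ["1.5", "2.0", "x", "3.5"], "y": ["0.1", "0.2", "y", "0.4"]},
    index=[10, 11, 12, 13],
)

# True where 'y' cannot be parsed as a number (the copied header row).
bad = pd.to_numeric(df1["y"], errors="coerce").isna()

# Positions of the bad rows. Passing these straight to drop() would look for
# label 2: here that raises KeyError, and in a frame whose labels overlap the
# positions it would silently remove the wrong row instead.
bad_positions = bad.to_numpy().nonzero()[0]

# Fix A: translate positions into index labels before dropping.
fixed_a = df1.drop(df1.index[bad_positions])

# Fix B: skip positions altogether and keep only the rows that are not bad.
fixed_b = df1[~bad]

print(fixed_a.index.tolist())  # [10, 11, 13]
print(fixed_b.index.tolist())  # [10, 11, 13]
```

The boolean-mask version is usually the simpler choice, since it never depends on how the index was built.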
I've tried other functions in combination, with no luck. This code shows an empty list for badRows, but the data still will not plot because the bad rows are still in the DataFrame, just re-indexed:
print(len(df1['x']))
df1 = df1.dropna().reset_index(drop=True)
# axis=0 (drop rows) is already the default, so this repeats the line above
df1 = df1.dropna(axis=0).reset_index(drop=True)
badRows = pd.isnull(pd.to_numeric(df1['x'], errors='coerce')).to_numpy().nonzero()[0]
print("Bad rows:", badRows)
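dropna() only removes missing values, and a copied header row is not missing: it is a row of strings such as 'x', which also keeps the whole column at object dtype. One plausible explanation for the plotting error is therefore that the columns are still strings even once no bad rows are reported. A minimal sketch, assuming the data columns are x and y as in the code above, that coerces every column to numbers first (so header strings become NaN) and only then calls dropna():

```python
import pandas as pd

# Hypothetical frame with a copied header row stored as strings.
df1 = pd.DataFrame({"x": ["1.5", "x", "2.5"], "y": ["0.1", "y", "0.3"]})
print(df1.dtypes)  # both columns are object (strings)

# Coerce each column to numbers: unparseable strings such as "x" become NaN.
df1 = df1.apply(pd.to_numeric, errors="coerce")

# Now dropna() can see the header rows and remove them.
df1 = df1.dropna().reset_index(drop=True)

print(df1.dtypes)  # both columns are float64, ready for a histogram
print(df1)
```

Assigning the coerced columns back is the key step; checking for bad rows alone leaves the column dtype untouched.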
I'm stumped, but I have one solution that works for those who merged CSV files and got stuck: go back to the original files and merge them again, taking care to exclude the repeated headers, like so:
head -n 1 anyOneFile.csv > summedDFs.csv && tail -n +2 -q summedBlipDF2*.csv >> summedDFs.csv
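For readers who would rather stay in pandas, here is a minimal sketch of the same merge, assuming all input files share one header line and match a glob pattern like the summedBlipDF2*.csv used above. pd.read_csv consumes each file's header as column names, so pd.concat never sees it as data. The helper name merge_csvs and the demo data are hypothetical:

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(pattern):
    """Read every CSV matching `pattern` and stack them into one DataFrame.

    read_csv parses each file's first line as column names, so the
    headers never end up as data rows.
    """
    paths = sorted(glob.glob(pattern))
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Usage demo with two small throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("summedBlipDF2a.csv", "1.5,0.1\n2.0,0.2\n"),
                       ("summedBlipDF2b.csv", "3.5,0.4\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x,y\n" + rows)

    merged = merge_csvs(os.path.join(tmp, "summedBlipDF2*.csv"))

print(merged)
print(merged.dtypes)  # x and y are float64: no header rows leaked in
```

If the original files carry an index column, pass index_col=[0] to read_csv inside the helper as in the code above.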
Apologies; I know this isn't the Pythonic or pandas way to fix it, and I hope the mods don't feel the need to remove it, as it works for the small subset of people with my problem.