Python, Pandas: Return only those rows which have missing values
Question:
While working in Pandas in Python…
I’m working with a dataset that contains some missing values, and I’d like to return a dataframe which contains only those rows which have missing data. Is there a nice way to do this?
(My current method to do this is an inefficient “look to see what index isn’t in the dataframe without the missing values, then make a df out of those indices.”)
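For comparison, here is a minimal sketch of the approach described above, assuming a small hypothetical frame. It drops the incomplete rows with dropna(), then selects the index labels that were dropped. It works on a unique index, but Index.difference returns the labels sorted and a duplicated label would pull in every row that shares it, which is one reason the boolean-mask answers below are simpler.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: row 0 is missing "b", row 1 is missing "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                   "b": [np.nan, 2.0, 3.0, 4.0]})

# The method from the question: build the frame without missing values...
complete = df.dropna()
# ...find the index labels that are NOT in it...
missing_labels = df.index.difference(complete.index)
# ...and make a frame out of those labels
null_data = df.loc[missing_labels]
print(null_data)
```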
Answers:
You can use any(axis=1) on df.isnull() to check for at least one True per row, then filter with boolean indexing:
null_data = df[df.isnull().any(axis=1)]
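A small sketch of what that line does, using a hypothetical frame: isnull() gives a True/False frame of the same shape, any(axis=1) collapses each row to a single True if any cell in it is missing, and indexing df with that boolean Series keeps only the True rows. None in an object column counts as missing too.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has NaN in "a", row 2 has None in "b"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                   "b": ["x", "y", None, "z"]})

# One boolean per row: True if any column in that row is missing
mask = df.isnull().any(axis=1)

# Boolean indexing keeps the rows where the mask is True
null_data = df[mask]
print(null_data)
```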
If you are looking for a quick way to count the rows that have missing values, you can use either of these:
sum(df.isnull().values.any(axis=1))
df.isnull().any(axis=1).sum()
Both give the total number of rows with at least one missing value.
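A quick check on a hypothetical frame that both expressions count rows, not missing cells: a row with two NaNs still counts once. Counting every missing cell instead would be df.isnull().values.sum().

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 have gaps; row 1 has two NaNs
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan],
                   "b": [np.nan, np.nan, 3.0, 4.0]})

# Built-in sum over the NumPy boolean array (loops in Python)
n_rows_builtin = sum(df.isnull().values.any(axis=1))
# pandas Series.sum, vectorized; True counts as 1
n_rows_pandas = df.isnull().any(axis=1).sum()
# For contrast: total number of missing cells, not rows
n_cells = df.isnull().values.sum()

print(n_rows_builtin, n_rows_pandas, n_cells)
```

On large frames the .sum() form is usually preferable, since the built-in sum iterates over the array element by element in Python.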
I just had this problem. Assuming you want to view the section of the data frame made up of rows with missing values, I used:
df.loc[df.isnull().any(axis=1)]
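A sketch with hypothetical data: with a boolean mask, df.loc[mask] returns the same rows as df[mask], and .loc additionally lets you choose columns in the same step, for example to list only an identifier column for the incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with an "id" column and gaps in rows 0 and 1
df = pd.DataFrame({"id": [10, 11, 12],
                   "a": [1.0, np.nan, 3.0],
                   "b": [np.nan, 2.0, 3.0]})

mask = df.isnull().any(axis=1)

# Same rows as plain boolean indexing
same = df.loc[mask]

# Rows and a column selected together
ids_with_gaps = df.loc[mask, "id"]
print(ids_with_gaps.tolist())
```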
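If you want to see only the rows where one particular column contains NaN, test that column with isnull(). Comparing against the string 'NaN' (== 'NaN') only matches literal text, not real missing values:
data_frame[data_frame.iloc[:, column_number].isnull()]
where column_number is the integer position of the column.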
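A sketch with a hypothetical frame showing why isnull() is needed for a single-column check: a real NaN is a float, not the text 'NaN', so == 'NaN' finds nothing in a numeric column (and NaN never compares equal even to itself), while isnull() finds it. Here col stands in for the column number.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 0 ("a") has a real NaN in row 1
data_frame = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                           "b": ["x", "y", "z"]})

col = 0  # hypothetical: integer position of the column to check

# String comparison: matches nothing in the float column
by_string = data_frame[data_frame.iloc[:, col] == "NaN"]

# isnull() detects the actual missing value
by_isnull = data_frame[data_frame.iloc[:, col].isnull()]

print(len(by_string), list(by_isnull.index))
```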
You can also count the rows that have missing values this way:
sum(df.isnull().any(axis=1))