Creating a new dataframe where a field is blank in the original dataframe

Question

Using Python3 and Pandas. I am admittedly pretty new and I’m having a hard time searching for an answer to this question.

I have a dataframe that contains lots of information and I’m trying to get a dataframe that is just the items where one specific field in the original is blank.

I have queried my database to get a dataframe I am calling full_df which is all information on all items in the database. I want to now create a new dataframe that selects just the items where one field in full_df is blank.

This is what I’ve tried:

no_rate = full_df[(full_df['rate'] == "")]

Which is returning nothing even though I know for a fact that there are loads of items where ‘rate’ is blank. I expected the dataframe no_rate to be populated with all the items where ‘rate’ is blank.

How do I select those items for this new dataframe?

Asked By: heinzdoof

Answer 1

There are a few things to check. First of all, is the data type of your rate column a string (object)? df.dtypes will tell you. If it is numeric instead, the blanks are stored as NaN, and a comparison against "" will never match.
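As a rough sketch of that check, the snippet below looks at the column's dtype and counts the different kinds of "blank" it may hold. The sample rows are made up, and the names full_df and 'rate' simply follow the question. A database NULL usually arrives in pandas as None/NaN rather than as an empty string, which is one likely reason a comparison with "" finds nothing.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the frame returned by the database query.
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d", "e"],
    "rate": ["0.5", "", None, "  ", np.nan],
})

# dtype of the column (object, or a string dtype on newer pandas versions).
rate_dtype = full_df["rate"].dtype

# Count each kind of "blank" separately.
n_empty = (full_df["rate"] == "").sum()                     # exact empty strings
n_missing = full_df["rate"].isna().sum()                    # None / NaN
n_blank_text = full_df["rate"].str.strip().eq("").sum()     # empty or whitespace-only text

print(rate_dtype, n_empty, n_missing, n_blank_text)
```

If n_missing is large and n_empty is zero, the blanks are missing values rather than empty strings, and isna() is the test to use.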

Second, and more to the point, a way to do a conditional select is by using loc.

So, if your rate column looks like this

import pandas as pd

df = pd.DataFrame({'Rate': ['good', 'good', 'bad', 'medium', '', 'bad', '', 'good']})
df

    Rate
0   good
1   good
2   bad
3   medium
4   
5   bad
6   
7   good

then you could write

df.loc[df['Rate']==""]

and get

    Rate
4   
6

which is actually showing you the contents, but since there is nothing in there, it looks like just the row numbers. Let’s add another column to see the results more plainly.

df['Color'] = ['Red', 'Blue', 'Yellow', 'Red', 'Yellow', 'Red', 'Green', 'Blue']
df
    Rate    Color
0   good    Red
1   good    Blue
2   bad     Yellow
3   medium  Red
4           Yellow
5   bad     Red
6           Green
7   good    Blue

and

df.loc[df['Rate'] == ""]

shows

    Rate    Color
4           Yellow
6           Green
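For comparison, plain boolean indexing (the form used in the question) and loc with the same mask select the same rows, so the choice between them is mostly stylistic for a row filter like this. A minimal sketch that rebuilds the sample frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Rate": ["good", "good", "bad", "medium", "", "bad", "", "good"],
    "Color": ["Red", "Blue", "Yellow", "Red", "Yellow", "Red", "Green", "Blue"],
})

# Boolean Series, True where Rate is an empty string.
mask = df["Rate"] == ""

via_brackets = df[mask]   # form used in the question
via_loc = df.loc[mask]    # form used in this answer

# Both selections contain the same rows and values.
same = via_brackets.equals(via_loc)
print(same)
```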

So, what if your rate is actually a number?

import numpy as np

df['Decimal_Rate'] = [.8, .8, .3, .6, np.nan, .2, np.nan, .9]
df
    Rate    Color   Decimal_Rate
0   good    Red     0.8
1   good    Blue    0.8
2   bad     Yellow  0.3
3   medium  Red     0.6
4           Yellow  NaN
5   bad     Red     0.2
6           Green   NaN
7   good    Blue    0.9

If you want to isolate the rows where the numeric column is empty (NaN), you can do it like this:

df.loc[df['Decimal_Rate'].isna()]

which results in

    Rate    Color   Decimal_Rate
4           Yellow  NaN
6           Green   NaN
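If the column might hold a mix of empty strings, whitespace, and missing values (which can happen with data pulled from a database), one option is to combine both tests into a single mask. This is only a sketch under assumptions: the names full_df, 'rate', and no_rate follow the question, and the sample rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the queried data.
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d", "e"],
    "rate": ["0.5", "", None, "   ", np.nan],
})

rate = full_df["rate"]

# True for None/NaN, or for text that is empty once whitespace is stripped.
# astype(str) lets the same expression run on numeric columns as well.
is_blank = rate.isna() | rate.astype(str).str.strip().eq("")

no_rate = full_df.loc[is_blank].copy()   # .copy() so later edits do not touch full_df
has_rate = full_df.loc[~is_blank]

print(no_rate)
```

The inverted mask (~is_blank) gives the complementary frame of items that do have a rate.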

Answered By: Nesha25
