Missing rows in dataframe

Question

I am trying to create a data frame that is a subset of the original, based on specific values in one column, but it keeps excluding some of the rows, specifically codes 59960, 59961 and 59962.

I have also confirmed, using .unique(), that the column contains the identifiers I am filtering for.

Here is my code:

new_df = original_df[(original_df["Course Offering Code"] == 19191)|
(original_df["Course Offering Code"] == 2201.20215)|
(original_df["Course Offering Code"] == 2387.2205)|
(original_df["Course Offering Code"] == 2388.20225)|
(original_df["Course Offering Code"] == 59960.20211)|
(original_df["Course Offering Code"] == 59961.20211)|
(original_df["Course Offering Code"] == 59962.20211)|
(original_df["Course Offering Code"] == 61199.20211)|
(original_df["Course Offering Code"] == 61201.20211)|
(original_df["Course Offering Code"] == 61202.20211)]

Thank you!

Asked By: Kevin



Answer 1

Try it like this instead…

codes = [19191, 2201, 2387, 59960, 59961, 59962, 61199, 61201, 61202]
new_df = original_df[original_df['Course Offering Code'].isin(codes)]
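A minimal, self-contained sketch of the same `isin` idea, using a small hypothetical DataFrame in place of the asker's `original_df`. Note that `isin` still tests exact equality, so the list has to contain the values as they are actually stored: a whole number such as 59960 will not match a stored 59960.20211. Here the list is the full set of codes from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's original_df.
original_df = pd.DataFrame(
    {"Course Offering Code": [19191.0, 2201.20215, 59960.20211, 12345.6789]}
)

# All codes from the question, written exactly as the stored floats.
codes = [
    19191, 2201.20215, 2387.2205, 2388.20225,
    59960.20211, 59961.20211, 59962.20211,
    61199.20211, 61201.20211, 61202.20211,
]

# isin() replaces the long chain of == comparisons joined with |.
new_df = original_df[original_df["Course Offering Code"].isin(codes)]
print(new_df)
```

The row with 12345.6789 is dropped and the three rows whose values appear in `codes` are kept.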

Answered By: BeRT2me

Answer 2

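This is caused by floating-point comparison, which is not exact. Values such as 59960.20211 have no exact binary representation, so if the value stored in the column differs from the literal in your code even in the last bit, == returns False and the row is dropped.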
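The effect can be reproduced without the real data. This is a minimal sketch, assuming the stored value differs from the typed literal by a single unit in the last place (ULP); `np.nextafter` is used only to manufacture such a value, and the one-ULP gap is an illustrative assumption, not a diagnosis of the asker's file.

```python
import numpy as np
import pandas as pd

col = "Course Offering Code"
literal = 59960.20211

# Simulated stored value one ULP above the literal (assumption for illustration).
stored = np.nextafter(literal, np.inf)

df = pd.DataFrame({col: [literal, stored]})

# Exact equality keeps only the bit-identical row.
exact = df[df[col] == literal]
print(len(exact))  # 1

# The full-precision repr shows the difference that rounded display can hide.
print([repr(x) for x in df[col].tolist()])
```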

You will have to either round the values or use a tolerance-based comparison. That said, Course Offering Codes look like plain identifiers and probably do not need to be float64: a code only has to be unique, not numeric. You can therefore convert the Course Offering Code column to str and select on the string values, which avoids these problems.
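A minimal sketch of the three remedies mentioned above. It assumes that five decimal places are meaningful for rounding and that a tolerance of 1e-9 is acceptable (both are assumptions about this data), and, for the string option, that the data comes from a CSV file, simulated here with `io.StringIO`.

```python
import io

import numpy as np
import pandas as pd

col = "Course Offering Code"
codes = [59960.20211, 59961.20211, 59962.20211]

# Float column whose first value is one ULP away from the literal (simulated).
df = pd.DataFrame({col: [np.nextafter(59960.20211, np.inf), 59961.20211, 12345.6789]})

# Option 1: round both sides to 5 decimals before the exact comparison.
rounded = df[df[col].round(5).isin(np.round(codes, 5))]

# Option 2: tolerance-based comparison of every value against every code.
values = df[col].to_numpy()
matches = np.isclose(values[:, None], np.array(codes)[None, :], rtol=0, atol=1e-9)
close = df[matches.any(axis=1)]

# Option 3: read the codes as text from the start so no float parsing happens.
csv = io.StringIO(f"{col}\n59960.20211\n59961.20211\n12345.6789\n")
text_df = pd.read_csv(csv, dtype={col: str})
as_text = text_df[text_df[col].isin(["59960.20211", "59961.20211", "59962.20211"])]
```

Converting an existing float column with `astype(str)` is less reliable than reading the column as text: the string form of a float is the shortest text that round-trips to that exact value, so a value that is already off by one ULP stays off in its string form.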

Answered By: the_ordinary_guy
