How to run a conditional selection of a DataFrame's rows with mixed values

Question:

I am trying to use conditional selection to pick the rows/columns of interest from the following dataset:

import pandas as pd

already_read = [("Il nome della rosa","Umberto Eco", 1980), 
        ("L'amore che ti meriti","Daria Bignardi", 2014), 
        ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864), 
        ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]

index = range(1,5,1)
data = pd.DataFrame(already_read, columns = ["Books'Title", "Authors", "Publishing Year"], index = index)
data

In the following way:

data[(data['Publishing Year'] >= 1850) & (data['Publishing Year'] <= 1950)]

As you can see, the column I have chosen contains mixed data (int and str), and indeed I get this error after running the code:

TypeError: '>=' not supported between instances of 'str' and 'int'

Since I am taking my very first steps with Python, could you please suggest a way to run that code so that the string value is either excluded or read as an integer, possibly by using an *if statement* (or another method)?

Thanks

Asked By: creativity

||

Answers:

One way to go would be to use Series.apply with a custom function. Something like this:

def check_int(x):
    # Only compare real integers; anything else (e.g. the '/' string) is rejected.
    if isinstance(x, int):
        return 1850 <= x <= 1950
    return False

data[data['Publishing Year'].apply(lambda x: check_int(x))]

Here check_int returns False for every value that is not an int and evaluates the range condition only for the ints. Applying it to the column gives:

data['Publishing Year'].apply(lambda x: check_int(x))

1    False
2    False
3     True
4    False
Name: Publishing Year, dtype: bool

Next, we use this boolean pd.Series as a mask to select the matching rows from data:

data[data['Publishing Year'].apply(lambda x: check_int(x))]

             Books'Title             Authors Publishing Year
3  Memorie dal sottsuolo   Fëdor Dostoevskij            1864
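
Since the question also asks about "another method": a possible alternative, shown here only as a sketch, is to convert the column to numbers first with pd.to_numeric(..., errors='coerce'), so that non-numeric entries such as '/' become NaN, and then filter with Series.between. The names years, mask and selected are just illustrative, and the DataFrame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, rebuilt so this snippet is self-contained.
already_read = [("Il nome della rosa", "Umberto Eco", 1980),
                ("L'amore che ti meriti", "Daria Bignardi", 2014),
                ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
                ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]
data = pd.DataFrame(already_read,
                    columns=["Books'Title", "Authors", "Publishing Year"],
                    index=range(1, 5))

# Values that cannot be parsed as numbers (here '/') become NaN
# instead of raising a TypeError later on.
years = pd.to_numeric(data['Publishing Year'], errors='coerce')

# between() includes both endpoints by default; NaN never satisfies it.
mask = years.between(1850, 1950)

selected = data[mask]
print(selected)
```

This leaves the original column untouched and only uses the converted values to build the mask. If the cleaned numbers should be kept, years could be assigned back to the column, at the cost of the column becoming float because of the NaN.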

Answered By: ouroboros1