Python – Return DataFrame rows if a column value is an element of a given array

Question:

I am extracting an HTML table from the web with pandas.
From this result (a list of DataFrame objects), I want to return all DataFrame rows where the cell value is an element of a given array.

So far I am struggling to access only one column's values rather than the whole object.

Structure of the table (the header row is not extracted correctly, so this is the actual output):

0           1              2       3
Date        Name           Number  Text
09.09.2022  Smith Jason    3290    Free Car Wash
12.03.2022  Betty Paulsen  231     10l Gasoline

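
When the real header ends up in the first data row, it can be promoted to column labels after the fact. This is a minimal sketch on a DataFrame rebuilt by hand from the sample above (an assumption: the real page may parse differently). Passing header=0 to pd.read_html, as the accepted answer below does, asks pandas to do the same thing while reading.

```python
import pandas as pd

# Hand-built stand-in for what read_html produced: integer column labels
# 0..3 and the real header stuck in the first data row.
raw = pd.DataFrame(
    [
        ["Date", "Name", "Number", "Text"],
        ["09.09.2022", "Smith Jason", "3290", "Free Car Wash"],
        ["12.03.2022", "Betty Paulsen", "231", "10l Gasoline"],
    ]
)

# Drop the header row from the data and renumber the remaining rows.
df = raw.iloc[1:].reset_index(drop=True)
# Use the values of the first row as the column labels.
df.columns = raw.iloc[0].tolist()

print(df)
```
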
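import pandas as pd
import numpy as np

url = 'https://some_website.com'

# read_html returns a list of DataFrames, one per <table> on the page
df = pd.read_html(url)

arr_Nr = ['3290', '9273']

# Scans the column labelled 1 of the first table and reports a match
def correct_number():
    for el in df[0][1]:
        if el in arr_Nr:
            return True

# Iterates over the list itself, so each el is a whole DataFrame,
# not a cell value
def get_winner():
    for el in df:
        if el in arr_Nr:
            return el

print(get_winner())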

With the function

correct_number()

I can detect that there is a winner, but I cannot get the details when I call

get_winner()
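
A sketch of getting the full matching rows (the "details") with a boolean mask instead of a loop, assuming the header row has already been dropped, the values are strings, and the numbers sit in the column labelled 2 as in the sample table. The helper name get_winners is hypothetical.

```python
import pandas as pd

# Sample rows with the integer column labels read_html produced
# (assumption: header row already removed, values kept as strings).
df = pd.DataFrame(
    [
        ["09.09.2022", "Smith Jason", "3290", "Free Car Wash"],
        ["12.03.2022", "Betty Paulsen", "231", "10l Gasoline"],
    ]
)

arr_Nr = ["3290", "9273"]

def get_winners(frame, numbers, column=2):
    """Hypothetical helper: return the full rows whose `column` value is in `numbers`."""
    # isin builds a boolean Series; indexing with it keeps only matching rows.
    return frame[frame[column].isin(numbers)]

# Equivalent of correct_number(): is there at least one match?
has_winner = df[2].isin(arr_Nr).any()

winners = get_winners(df, arr_Nr)
print(has_winner)
print(winners)
```

Because the mask selects whole rows, every column (date, name, prize text) of each winner stays available.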

EDIT

I think I am now one step closer: read_html() returns a list of DataFrame objects. In my example there is only one table, so by accessing it via df = dfs[0] I should get the correct DataFrame object.

But when I try the following, the code does not work as expected: no filter is applied and the whole table is returned:

df2 = df[df.Number == '3290']
print(df2)
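
One thing worth checking with a filter like this is the column's dtype. This sketch assumes pandas parsed the Number column as integers (int64); in that case comparing it with the string '3290' matches no rows, while comparing with the integer 3290 does.

```python
import pandas as pd

# Assumption: pandas inferred the Number column as int64.
df = pd.DataFrame({"Number": [3290, 231], "Name": ["Smith Jason", "Betty Paulsen"]})

print(df.dtypes)  # shows what pandas actually inferred for each column

# An int64 column compared with a string: no element is equal.
by_string = df[df.Number == "3290"]

# Same comparison with an integer: the matching row is kept.
by_int = df[df.Number == 3290]

print(len(by_string), len(by_int))
```
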

Asked By: senior_freshman


Answers:

Okay, I finally figured it out:

pd.read_html() returns a list of DataFrame objects. In my example there is only one table, so I first had to take that DataFrame out of the list.
Before I could compare the values, I also had to convert them to integers: pandas seemed to have extracted them as strings, so they did not compare properly with the numbers in my array.
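
If a page held several tables with the same columns (an assumption; the page here has only one), taking dfs_list[0] would ignore the rest. A sketch that filters every DataFrame in the list and stacks the matches; the second table's rows are placeholders made up for illustration.

```python
import pandas as pd

# Stand-in for the list pd.read_html() returns (one DataFrame per <table>).
# The first frame reuses the question's sample rows; the second is a
# made-up placeholder table.
dfs_list = [
    pd.DataFrame({"Losnummer": [3290, 231], "Name": ["Smith Jason", "Betty Paulsen"]}),
    pd.DataFrame({"Losnummer": [843, 17], "Name": ["Placeholder A", "Placeholder B"]}),
]

winner_nrs = [3290, 843]

# Filter each table separately, then combine the matching rows into one frame.
matches = [frame[frame["Losnummer"].astype(int).isin(winner_nrs)] for frame in dfs_list]
winners = pd.concat(matches, ignore_index=True)
print(winners)
```
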

In the end, the code is much more elegant than I expected:

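import pandas as pd

url = 'https://mywebsite.com/winners-2022'

# read_html returns a list of DataFrames; header=0 uses the first row as column labels
dfs_list = pd.read_html(url, header=0, flavor='bs4')
df = dfs_list[0]  # the page contains only one table

winner_nrs = [3290, 843]

# Cast the Losnummer values to int and keep only rows whose number is in winner_nrs
result = df[df.Losnummer.astype(int).isin(winner_nrs)]
print(result)
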
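To try the filter without fetching the page, here is a sketch on an in-memory DataFrame standing in for dfs_list[0]. The values come from the question's sample table, with the number column named Losnummer as in the answer. It also shows pd.to_numeric(errors="coerce") as a more forgiving alternative to astype(int): astype(int) raises on a non-numeric cell, while to_numeric turns it into NaN, which never matches.

```python
import pandas as pd

# In-memory stand-in for dfs_list[0], built from the question's sample rows.
df = pd.DataFrame(
    {
        "Date": ["09.09.2022", "12.03.2022"],
        "Name": ["Smith Jason", "Betty Paulsen"],
        "Losnummer": ["3290", "231"],
        "Text": ["Free Car Wash", "10l Gasoline"],
    }
)

winner_nrs = [3290, 843]

# The answer's filter: cast to int, then keep rows whose number is listed.
result = df[df.Losnummer.astype(int).isin(winner_nrs)]

# Variant for messier data: non-numeric cells become NaN instead of raising.
numbers = pd.to_numeric(df["Losnummer"], errors="coerce")
result_coerced = df[numbers.isin(winner_nrs)]

# Each matching row keeps all of its details.
for row in result.itertuples(index=False):
    print(row.Date, row.Name, row.Text)
```
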
Answered By: senior_freshman