Python – Return DataFrame Objects if column value is an Element of given Array
Question:
I am extracting an HTML table from the web with pandas.
From the result (a list of DataFrame objects) I want to return every row whose cell value is an element of a given array.
So far I am struggling to access just the one column value rather than the whole object.
Layout of the table (the header row is not extracted correctly, so this is the real output):
0 | 1 | 2 | 3 |
---|---|---|---|
Date | Name | Number | Text |
09.09.2022 | Smith Jason | 3290 | Free Car Wash |
12.03.2022 | Betty Paulsen | 231 | 10l Gasoline |
import pandas as pd
import numpy as np

url = f'https://some_website.com'
df = pd.read_html(url)
arr_Nr = ['3290', '9273']

def correct_number():
    for el in df[0][1]:
        if (el in arr_Nr):
            return True

def get_winner():
    for el in df:
        if (el in arr_Nr):
            return el

print(get_winner())
With the function correct_number() I can output that there is a winner, but I cannot get the details when I try to access them with get_winner().
EDIT
I think I am now one step closer: the function read_html() returns a list of DataFrame objects. In my example there is only one table, so by accessing it via df = dfs[0] I should get the correct DataFrame object.
But when I try the following, the code doesn't work as expected: no filter is applied and the table is returned in full:
df2 = df[df.Number == '3290']
print(df2)
Answers:
Okay, I finally figured it out:
Pandas returned a list of DataFrame objects. In my example there is only one table, so I first had to pick that table (the DataFrame object) out of the list.
Before I could compare the values, I converted them to integers. Pandas seemed to extract them as strings, so they could not be matched against my array properly.
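The type mismatch can be reproduced without scraping anything. The following is only a minimal sketch: it uses a small hand-built DataFrame as a stand-in for the scraped table, assumes the numbers arrive as strings (as they seemed to on the real page), and borrows the column name Number and the sample values from the question.

```python
import pandas as pd

# Hand-built stand-in for the scraped table. Storing the numbers as strings
# is an assumption that mirrors what the real page seemed to produce.
df = pd.DataFrame({
    "Name": ["Smith Jason", "Betty Paulsen"],
    "Number": ["3290", "231"],
})
winner_nrs = [3290, 9273]

# Strings compared against integers: no row matches.
mismatched = df[df["Number"].isin(winner_nrs)]

# Same type on both sides: the winning row is found.
matched = df[df["Number"].astype(int).isin(winner_nrs)]

print(len(mismatched))
print(matched)
```

Converting the array to strings instead of converting the column to integers would work just as well; what matters is that both sides of isin() share one type.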
In the end the code looks way more elegant than I expected:
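import pandas as pd

url = 'https://mywebsite.com/winners-2022'
# read_html() returns a list of DataFrames, one per table found on the page
dfs_list = pd.read_html(url, header=0, flavor='bs4')
df = dfs_list[0]
winner_nrs = [3290, 843]
# Losnummer is the number column of the real table; cast it to int so it
# compares equal to the integers in winner_nrs
result = df[df.Losnummer.astype(int).isin(winner_nrs)]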
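If the table has already been read without header=0, the column names sit in the first data row, as in the output shown in the question. One possible way to handle that afterwards is sketched below on a hand-built DataFrame that stands in for dfs_list[0]. The names raw and numbers are made up for this illustration, and pd.to_numeric(errors="coerce") is swapped in for astype(int) so that a stray text cell becomes NaN instead of raising an error.

```python
import pandas as pd

# Stand-in for dfs_list[0] when read_html() was called without header=0:
# integer column labels, real headers in the first row (as in the question).
raw = pd.DataFrame([
    ["Date", "Name", "Number", "Text"],
    ["09.09.2022", "Smith Jason", "3290", "Free Car Wash"],
    ["12.03.2022", "Betty Paulsen", "231", "10l Gasoline"],
])

# Promote the first row to column names and drop it from the data.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = raw.iloc[0].tolist()

# Convert the number column; non-numeric cells become NaN instead of raising.
numbers = pd.to_numeric(df["Number"], errors="coerce")

winner_nrs = [3290, 843]
result = df[numbers.isin(winner_nrs)]
print(result)
```

Here result holds the complete matching row, so the name and text of the winner are available and not just the fact that a winner exists.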