Get elements from BeautifulSoup
Question:
I need to get only x and y from this soup’s return. How can I do that? I tried using split() and del() on the string, but it didn’t work. Here’s my code:
import urllib.request
from bs4 import BeautifulSoup
uri = 'https://www.comprasparaguai.com.br/notebook-apple-macbook-pro-2022-apple-m2-memoria-8gb-ssd-256gb-133_42393/'
page = urllib.request.urlopen(uri)
soup = BeautifulSoup(page, 'html.parser')
historico = soup.find(class_='chart-container')
print(historico)
My return:
['<div class="chart-container">\n<canvas data-historico="[{\'y\': 1409.0', " 'x': '07/2022'}", " {'y': 1235.0", " 'x': '08/2022'}", " {'y': 1150.0", " 'x': '09/2022'}", " {'y': 1187.0", " 'x': '10/2022'}", " {'y': 1187.0", ' \'x\': \'10/2022\'}]" id="grafico-modelo"></canvas>\n</div>']
How can I get x and y so I can build a graph?
Answers:
To get the data as a pandas DataFrame (reusing historico from your code), you can do:
import pandas as pd
from ast import literal_eval
data = literal_eval(historico.canvas["data-historico"])
# uncomment this to print all data
# print(data)
df = pd.DataFrame(data)
print(df)
Prints:
        y        x
0  1409.0  07/2022
1  1235.0  08/2022
2  1150.0  09/2022
3  1187.0  10/2022
4  1187.0  10/2022
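If you only want x and y as two plain lists, a DataFrame is not strictly required. Here is a minimal sketch, assuming the attribute value looks like the one in the question; the string raw is a hard-coded sample standing in for historico.canvas["data-historico"], and the names points, xs and ys are illustrative:

```python
from ast import literal_eval

# Sample attribute value copied from the output in the question.
# In the real script this would come from historico.canvas["data-historico"].
raw = (
    "[{'y': 1409.0, 'x': '07/2022'}, {'y': 1235.0, 'x': '08/2022'}, "
    "{'y': 1150.0, 'x': '09/2022'}, {'y': 1187.0, 'x': '10/2022'}, "
    "{'y': 1187.0, 'x': '10/2022'}]"
)

# literal_eval turns the string into a list of dicts without executing code.
points = literal_eval(raw)

# Split the list of dicts into one list per axis.
xs = [point['x'] for point in points]
ys = [point['y'] for point in points]

print(xs)
print(ys)
```

Both lists keep the order of the original data, so xs[i] and ys[i] always belong to the same point.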
You can get the element using a CSS selector, read its data-historico attribute, then parse the list with ast.literal_eval(). Like this:
from ast import literal_eval
import urllib.request
from bs4 import BeautifulSoup
uri = 'https://www.comprasparaguai.com.br/notebook-apple-macbook-pro-2022-apple-m2-memoria-8gb-ssd-256gb-133_42393/'
page = urllib.request.urlopen(uri)
soup = BeautifulSoup(page, 'html.parser')
historico = soup.select('.chart-container>canvas')
data = literal_eval(historico[0]['data-historico'])
print(data)
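In case you wonder why ast.literal_eval() is used here rather than json.loads(): the attribute value uses single quotes, which is valid Python literal syntax but not valid JSON. A small sketch showing the difference, using a shortened hard-coded sample (raw and json_ok are illustrative names):

```python
import json
from ast import literal_eval

# Shortened sample of the data-historico attribute value (single-quoted keys).
raw = "[{'y': 1409.0, 'x': '07/2022'}]"

# JSON requires double-quoted strings, so json.loads rejects this input.
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval accepts Python literals, including single-quoted strings.
data = literal_eval(raw)

print(json_ok)
print(data)
```

If the site ever switches the attribute to real JSON with values such as true or null, literal_eval would fail on those and json.loads would be the better fit.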
If you want to make a graph, then you’ll probably need a DataFrame. All you’d need to do in that case is use pandas.DataFrame(). For example:
import pandas
# ... same code as above ...
historico_df = pandas.DataFrame(data)
print(historico_df)
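Before plotting, it may help to turn the 'MM/YYYY' labels into real dates so the x-axis sorts chronologically, and to drop the repeated 10/2022 entry that shows up in the sample output. This is only a sketch with the standard library, assuming every x value follows the 'MM/YYYY' pattern; data is hard-coded here, and unique, dates and prices are illustrative names:

```python
from datetime import datetime

# Parsed list as shown in the question's output (hard-coded for the sketch).
data = [
    {'y': 1409.0, 'x': '07/2022'},
    {'y': 1235.0, 'x': '08/2022'},
    {'y': 1150.0, 'x': '09/2022'},
    {'y': 1187.0, 'x': '10/2022'},
    {'y': 1187.0, 'x': '10/2022'},
]

# Drop exact duplicate points while keeping the original order.
unique = []
for point in data:
    if point not in unique:
        unique.append(point)

# Parse 'MM/YYYY' into datetime objects (the day defaults to 1).
dates = [datetime.strptime(point['x'], '%m/%Y') for point in unique]
prices = [point['y'] for point in unique]

for date, price in zip(dates, prices):
    print(date.strftime('%Y-%m'), price)
```

The dates and prices lists can then be passed to whatever plotting library you use, or placed in a DataFrame as shown above.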