get elements from BeautifulSoup

Question

I need to get only the x and y values from this soup's result. How can I do that? I tried using split() and del on the string, but it didn't work. Here's my code:

import urllib.request
from bs4 import BeautifulSoup

uri = 'https://www.comprasparaguai.com.br/notebook-apple-macbook-pro-2022-apple-m2-memoria-8gb-ssd-256gb-133_42393/'
page = urllib.request.urlopen(uri)
soup = BeautifulSoup(page, 'html.parser')

historico = soup.find(class_='chart-container')
print(historico)

My output:

<div class="chart-container">
<canvas data-historico="[{'y': 1409.0, 'x': '07/2022'}, {'y': 1235.0, 'x': '08/2022'}, {'y': 1150.0, 'x': '09/2022'}, {'y': 1187.0, 'x': '10/2022'}, {'y': 1187.0, 'x': '10/2022'}]" id="grafico-modelo"></canvas>
</div>

How can I get x and y so I can build a graph?

Asked By: zhyk

Answer 1

To get the data as a pandas DataFrame, parse the canvas's data-historico attribute with ast.literal_eval() (continuing from the question's code):

import pandas as pd
from ast import literal_eval

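# historico is the <div class="chart-container"> found in the question's code;
# its <canvas> child stores the chart points as a Python-style list literal
data = literal_eval(historico.canvas["data-historico"])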

# uncomment this to print all data
# print(data)

df = pd.DataFrame(data)
print(df)

Prints:

        y        x
0  1409.0  07/2022
1  1235.0  08/2022
2  1150.0  09/2022
3  1187.0  10/2022
4  1187.0  10/2022

Answered By: Andrej Kesely
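Both answers rely on ast.literal_eval() because the data-historico value is a Python-style literal: its strings use single quotes, so json.loads() rejects it, and splitting the raw string on commas (as tried in the question) breaks each dict apart. The minimal sketch below assumes the attribute value is exactly the string shown in the question's output (hard-coded here so it runs offline) and turns it into two plain lists, x and y, without pandas.

```python
import ast
import json

# Assumption: attribute value copied from the question's output, hard-coded for offline use
raw = ("[{'y': 1409.0, 'x': '07/2022'}, {'y': 1235.0, 'x': '08/2022'}, "
       "{'y': 1150.0, 'x': '09/2022'}, {'y': 1187.0, 'x': '10/2022'}, "
       "{'y': 1187.0, 'x': '10/2022'}]")

# json.loads only accepts double-quoted strings, so it fails on this value
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)

# ast.literal_eval safely evaluates Python literals (lists, dicts, str, float)
data = ast.literal_eval(raw)

# Split the list of dicts into two parallel lists, ready for a plotting library
xs = [point["x"] for point in data]
ys = [point["y"] for point in data]
print(xs)
print(ys)
```

Unlike eval(), literal_eval() refuses anything other than literals, so it is a reasonable choice for parsing text that comes from a third-party page.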

Answer 2

You can select the <canvas> element with a CSS selector, read its data-historico attribute, and then parse the list with ast.literal_eval(), like this:

from ast import literal_eval
import urllib.request
from bs4 import BeautifulSoup

uri = 'https://www.comprasparaguai.com.br/notebook-apple-macbook-pro-2022-apple-m2-memoria-8gb-ssd-256gb-133_42393/'
page = urllib.request.urlopen(uri)
soup = BeautifulSoup(page, 'html.parser')

historico = soup.select('.chart-container>canvas')
data = literal_eval(historico[0]['data-historico'])
print(data)

If you want to make a graph, you'll probably want a DataFrame. All you need to do in that case is pass the parsed list to pandas.DataFrame(). For example:

import pandas

# ... same code as above ...

historico_df = pandas.DataFrame(data)
print(historico_df)
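The selector step can also be checked without requesting the live page. This is a sketch that assumes a hypothetical local copy of the markup (trimmed to two points); it uses select_one(), which returns the first match or None, so a missing chart produces a clear error instead of an IndexError on historico[0].

```python
from ast import literal_eval

from bs4 import BeautifulSoup

# Hypothetical local copy of the chart markup, trimmed to two points for testing
html = """<div class="chart-container">
<canvas data-historico="[{'y': 1409.0, 'x': '07/2022'}, {'y': 1235.0, 'x': '08/2022'}]" id="grafico-modelo"></canvas>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching element, or None if nothing matches
canvas = soup.select_one(".chart-container > canvas")
if canvas is None:
    raise ValueError("chart canvas not found; the page layout may have changed")

# .get() returns None instead of raising KeyError when the attribute is missing
raw = canvas.get("data-historico")
data = literal_eval(raw)
print(data)
```

In the real script, pass the page returned by urllib.request.urlopen() to BeautifulSoup instead of the html string. Since the canvas also has an id, soup.find("canvas", id="grafico-modelo") should locate the same element.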

Answered By: Michael M.
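Before graphing, the x column may need some cleanup: it holds 'MM/YYYY' strings, which pandas treats as text, and the scraped history repeats the 10/2022 point. The sketch below assumes the parsed list has the shape shown in the question (hard-coded here) and that the repeated row is not meaningful; it converts x to timestamps and drops exact duplicates.

```python
import pandas as pd

# Assumption: parsed data-historico value with the shape shown in the question
data = [{'y': 1409.0, 'x': '07/2022'}, {'y': 1235.0, 'x': '08/2022'},
        {'y': 1150.0, 'x': '09/2022'}, {'y': 1187.0, 'x': '10/2022'},
        {'y': 1187.0, 'x': '10/2022'}]

df = pd.DataFrame(data)

# Convert 'MM/YYYY' strings to timestamps (pandas uses the first day of each month)
df["x"] = pd.to_datetime(df["x"], format="%m/%Y")

# Drop rows that are identical in every column, then order by date
df = df.drop_duplicates().sort_values("x").reset_index(drop=True)
print(df)
```

With matplotlib installed, df.plot(x="x", y="y") should then draw the price history as a line chart with a proper date axis.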
