Scraping data off Morningstar – Portfolio Screen


I am trying to scrape data from this link — basically all the data I can get, but particularly the Fixed Income Style Table and the Exposure, Bond Breakdown table.

Here is my code:

import requests
from selenium import webdriver
import pandas as pd
link = ''

headers = {
    'apikey': 'lstzFDEOhfFNMLikKa0am9mgEKLBl49T',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'
}

payload = {
    'premiumNum': '1000',
    'freeNum': '1000',
    'languageId': 'en',
    'locale': 'en',
    'clientId': 'MDC',
    #'benchmarkId': 'mstarorcat',
    'benchmarkId': 'category',
    'component': 'sal-components-mip-holdings',
    'version': '3.59.1'
}

with requests.Session() as s:
    # Send the headers too; otherwise the apikey never reaches the server
    resp = s.get(link, headers=headers, params=payload)
    resp.raise_for_status()  # fail early on an HTTP error status
    container = resp.json()

The code above is what I use to scrape the holdings data at the bottom of the page. However, I am having trouble figuring out what the 'component' field in my payload should be. I have even tried 'sal-components-fixed-income-exposure-analysis', but to no avail.

Asked By: research51711



What you are doing is not web scraping, but an API request. There’s probably a way to get the data you want through the API but you might have to discover it from their docs:

But I can provide you with a code snippet for actually scraping the data from this page:

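from io import StringIO
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
import pandas as pd

url = ''

options = webdriver.ChromeOptions()

with webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                      options=options) as driver:
    driver.get(url)  # open the page in the browser
    sleep(5)  # assumed delay so the page can finish loading; adjust as needed
    html = driver.page_source

# Wrapping in StringIO avoids the literal-HTML deprecation warning in newer pandas
tables = pd.read_html(StringIO(html)) #this will require lxml module

Here, tables is a list of DataFrames, one for each table found in the fully loaded page.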

To install the lxml module, just run pip install lxml
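
Since the list can contain many tables, you will probably want to pick out the ones you care about. Here is a minimal sketch that filters the list by column label. The helper name find_tables and the sample column names are made up for illustration; the real page may label its tables differently, so check the actual columns first.

```python
import pandas as pd


def find_tables(tables, keyword):
    """Return the DataFrames whose column labels contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    matches = []
    for df in tables:
        # Join all column labels into one string so MultiIndex headers work too
        labels = " ".join(str(col) for col in df.columns).lower()
        if keyword in labels:
            matches.append(df)
    return matches


# Stand-in for the list returned by pd.read_html; these columns are hypothetical
tables = [
    pd.DataFrame({"Holding": ["A", "B"], "Weight %": [60.0, 40.0]}),
    pd.DataFrame({"Bond Type": ["Government", "Corporate"], "Fund %": [55.0, 45.0]}),
]

bond_tables = find_tables(tables, "bond")
print(len(bond_tables))
```

If nothing matches, printing df.columns for each item in the list is a quick way to see what the page actually gave you.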

PS: I tried getting the HTML with a plain request, but it returns a different page. It looks like you have to open the page in a browser and wait until it is fully loaded to get the correct source HTML.
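
If a fixed delay turns out to be flaky, one option is to poll the page source until a table shows up. This is only a sketch: wait_for_html is a hypothetical helper (not part of Selenium), and the "<table" marker assumes the data you want is rendered as regular HTML tables.

```python
import time


def wait_for_html(get_html, marker="<table", timeout=30.0, poll=0.5):
    """Call get_html() repeatedly until marker appears in the result.

    Raises TimeoutError if the marker has not appeared after timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found after {timeout} s")
        time.sleep(poll)


# Stand-in for a page that only renders its table on the third look
pages = iter(["<html></html>", "<html><div></div></html>",
              "<html><table></table></html>"])
result = wait_for_html(lambda: next(pages), poll=0.01)
print("<table" in result)
```

With a real driver you would call it as html = wait_for_html(lambda: driver.page_source) after driver.get(url). Selenium also ships its own WebDriverWait class for this kind of explicit wait, which may be the better fit if you need to wait for a specific element.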

Answered By: Arthur Querido