HTTP Error 403: Forbidden when reading HTML

Question:

I would like to read the following html,

import pandas as pd

daily_info = pd.read_html('https://www.investing.com/earnings-calendar/', flavor='html5lib')

print(daily_info)

Unfortunately, this error appears:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Is there any way to fix it?

Asked By: JamesHudson81


Answers:

Pretend to be a browser by sending browser-like request headers, then pass the downloaded HTML to pandas:

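from io import StringIO

import pandas as pd
import requests

url = 'https://www.investing.com/earnings-calendar/'

# Browser-like headers; the site appears to reject requests that lack them
header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)
r.raise_for_status()  # fail early if the site still refuses the request

# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
dfs = pd.read_html(StringIO(r.text))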

Result:

In [201]: len(dfs)
Out[201]: 7

In [202]: dfs[0]
Out[202]:
    0   1   2   3
0 NaN NaN NaN NaN

In [203]: dfs[1]
Out[203]:
                 Unnamed: 0                                      Company    EPS /  Forecast Revenue /  Forecast.1 Market Cap  Time  
0    Monday, April 24, 2017                                          NaN    NaN         NaN     NaN           NaN        NaN   NaN
1                       NaN                                 Acadia (AKR)     --      / 0.11      --          / --      2.63B   NaN
2                       NaN                                  Agree (ADC)     --      / 0.39      --          / --      1.34B   NaN
3                       NaN                                   Alcoa (AA)     --      / 0.53      --          / --      5.84B   NaN
4                       NaN                        American Campus (ACC)     --      / 0.27      --          / --      6.62B   NaN
5                       NaN                   Ameriprise Financial (AMP)     --      / 2.52      --          / --     19.76B   NaN
6                       NaN                          Avacta Group (AVTG)     --        / --   1.26M          / --     47.53M   NaN
7                       NaN                         Bank of Hawaii (BOH)    1.2      / 1.08  165.8M          / --      3.48B   NaN
8                       NaN                         Bank of Marin (BMRC)   0.74       / 0.8      --          / --    422.29M   NaN
9                       NaN                                Banner (BANR)     --      / 0.68      --          / --      1.82B   NaN
10                      NaN                           Barrick Gold (ABX)     --       / 0.2      --          / --     22.44B   NaN
11                      NaN                           Barrick Gold (ABX)     --      / 0.28      --          / --     30.28B   NaN
12                      NaN               Berkshire Hills Bancorp (BHLB)     --      / 0.54      --          / --      1.25B   NaN
13                      NaN   Brookfield Canada Office Properties (BOXC)     --        / --      --          / --        NaN   NaN

...
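
The traceback in the question comes from urllib, which pandas uses to download a URL passed directly to read_html. urllib identifies itself as "Python-urllib/<version>" by default, and that is presumably what the site rejects. If you would rather not depend on requests, the same idea can be sketched with the standard library alone. The helper names build_request and fetch_html below are hypothetical, made up for this illustration, and whether the site accepts the request still depends on its current anti-bot rules:

```python
import urllib.request

URL = "https://www.investing.com/earnings-calendar/"

# Same browser-like User-Agent string as in the answer above
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page as text; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# The User-Agent urllib sends when no header is set explicitly
default_ua = dict(urllib.request.build_opener().addheaders)["User-agent"]
```

The returned text can then be parsed the same way as in the answer, for example pd.read_html(io.StringIO(fetch_html(URL))). Printing default_ua shows the identifier that an unmodified pd.read_html(url) call would send.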