Download/Get Data from .ods file on a website (using Python)

Question:

I have a website with the URL https://www.statistik.at/statistiken/volkswirtschaft-und-oeffentliche-finanzen/preise-und-preisindizes/verbraucherpreisindex-vpi/hvpi

On this website there is a link with the URL "https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fwww.statistik.at%2Ffileadmin%2Fpages%2F214%2F2_Verbraucherpreisindizes_ab_1990.ods"

Clicking the link opens an .ods file in my browser.
What I would like to do is download the file with Python and work with its contents.

I tried for example:

import wget
import pandas as pd
import requests
from pandas_ods_reader import read_ods

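# url is the viewer link shown above
url = "https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fwww.statistik.at%2Ffileadmin%2Fpages%2F214%2F2_Verbraucherpreisindizes_ab_1990.ods"

with open('F2_Verbraucherpreisindizes_ab_1990.ods', 'wb') as out_file:
    content = requests.get(url, stream=True).content
    out_file.write(content)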

file_url = url
file_data = requests.get(file_url).content
with open("F2_Verbraucherpreisindizes_ab_1990.ods", "wb") as file:
    file.write(file_data)
df = read_ods('F2_Verbraucherpreisindizes_ab_1990.ods', 1)

Here I get the error:

...
KeyError: '.ods'

I also tried different approaches using wget, requests and bs4 (BeautifulSoup).

The main problem is that I always seem to download an HTML document, namely:

<!DOCTYPE html><html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge" /><meta name="robots" content="noindex" /><style type="text/css"> body { margin:0; overflow:hidden; background-color:#fff; background-repeat:no-repeat;} #wacframe { width:100%; height:100%; position:absolute; top:0; left:0; } </style><![if gte IE 8]><style type="text/css"> .load_center img{margin:5px;} #load_img{width:100%;height:100%;position:absolute;text-align:center;} #load_img img{position:relative;} .load_center{position:absolute;left:0;right:0;bottom:50%;} .load_header { font-family: calibri, tahoma, verdana, arial, sans serif; font-size: 18pt; color: #444444; line-height: 150% } .load_text { font-family: calibri, tahoma, verdana, arial, sans serif; font-size: 10pt; color: #444444; } </style><![endif]></head><body width="100%" height="100%" onload="OnLoad()" ><![if gte IE 8]><div id="load_img"><div class="load_center"><div class="load_header">We&#39;re fetching your file...</div><div class="load_text">Please wait a moment while we retrieve your file from its home on the internet</div><img align="absmiddle" src=""/></div></div><![endif]><form method="post" action="./view.aspx?src=https%3a%2f%2fwww.statistik.at%2ffileadmin%2fpages%2f214%2f2_Verbraucherpreisindizes_ab_1990.ods" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="" />
<script type="text/javascript"> var _iframeUrl = 'https:u002fu002fPNL1-excel.officeapps.live.comu002fxu002f_layoutsu002fxlviewerinternal.aspx?ui=enu00252DUSu0026rs=enu00252DUSu0026WOPISrc=httpu00253Au00252Fu00252Fpnl1u00252Dviewu00252Dwopiu00252Ewopiu00252Eonlineu00252Eofficeu00252Enetu00253A808u00252Fohu00252Fwopiu00252Ffilesu00252Fu002540u00252FwFileIdu00253FwFileIdu00253Dhttpsu0025253Au0025252Fu0025252Fwwwu0025252Estatistiku0025252Eatu0025253A443u0025252Ffileadminu0025252Fpagesu0025252F214u0025252F2u0025255FVerbraucherpreisindizesu0025255Fabu0025255F1990u0025252Eodsu0026access_token_ttl=0u0026hid=e87397ba-31d7-4354-a862-b6d7bf1939a3'; var _windowTitle = '2_Verbraucherpreisindizes_ab_1990.ods'; var _favIconUrl = 'https://c1-view-15.cdn.office.net:443/op/s/161562341022_Resources/FavIcon_Excel.ico'; var _shouldDoRedirect = false; var _failureRedirectUrl = ''; var _accessToken = '1'; function OnLoad() { if (_shouldDoRedirect) { window.location = _failureRedirectUrl; return; } document.title = _windowTitle; var link = document.createElement("link"); link.type = "image/vnd.microsoft.icon"; link.rel = "shortcut icon"; link.href = _favIconUrl; document.getElementsByTagName('head')[0].appendChild(link); var img = document.getElementById('load_img'); if (img) img.style.display = 'none'; var iframe = document.createElement('iframe'); iframe.src = ''; iframe.frameBorder = 0; iframe.id = 'wacframe'; iframe.name = 'wacframe'; iframe.title = 'Office on the web Frame'; iframe.setAttribute('allowfullscreen', 'true'); document.body.appendChild(iframe); var form2 = document.createElement('form'); form2.action = _iframeUrl; form2.method = 'post'; form2.target = 'wacframe'; form2.id = 'form2'; var input = document.createElement('input'); input.type = 'hidden'; input.name = 'access_token'; input.value = _accessToken; form2.appendChild(input); document.body.appendChild(form2); form2.submit(); } </script></form></body></html> 

But I am not able to get the .ods file.
Extra info: the .ods file is a spreadsheet that opens read-only in the Excel web viewer.

Thanks in advance for any help/tips!

Asked By: Paul Strasser


Answers:

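The link on the page points to the Office online viewer, which is why you get an HTML document back. The file itself is at the address passed in the viewer link's src parameter (URL-decoded), so request that address directly. To read the file with pandas, you need to install odfpy: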

pip install odfpy

Then you can use the pandas read_excel function to read the file:

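import pandas as pd
import requests

# Direct link to the file (the decoded "src" parameter of the viewer link)
url = 'https://www.statistik.at/fileadmin/pages/214/2_Verbraucherpreisindizes_ab_1990.ods'
file_name = 'F2_Verbraucherpreisindizes_ab_1990.ods'

response = requests.get(url)
response.raise_for_status()  # fail on an HTTP error instead of saving an error page

with open(file_name, 'wb') as out_file:
    out_file.write(response.content)

# engine='odf' makes pandas read the file with the odfpy package installed above
df = pd.read_excel(file_name, engine='odf')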
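
The direct file address does not have to be copied by hand: it is the percent-encoded src query parameter of the viewer link from the question. The following is a minimal sketch using only the standard library; the helper name direct_url_from_viewer is made up for illustration, and it assumes the viewer link always carries a src parameter.

```python
from urllib.parse import parse_qs, urlsplit


def direct_url_from_viewer(viewer_url: str) -> str:
    """Return the decoded 'src' query parameter of an Office viewer link.

    Hypothetical helper: raises KeyError if the link has no 'src' parameter.
    """
    query = parse_qs(urlsplit(viewer_url).query)  # parse_qs percent-decodes values
    return query["src"][0]


viewer_url = (
    "https://view.officeapps.live.com/op/view.aspx"
    "?src=https%3A%2F%2Fwww.statistik.at%2Ffileadmin%2Fpages%2F214"
    "%2F2_Verbraucherpreisindizes_ab_1990.ods"
)

print(direct_url_from_viewer(viewer_url))
# https://www.statistik.at/fileadmin/pages/214/2_Verbraucherpreisindizes_ab_1990.ods
```

The returned string can then be passed to requests.get in place of a hard-coded address.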


Answered By: Sergey K

Here’s another way, using stream-read-ods (full disclosure: written by me):

import httpx
import pandas as pd
from stream_read_ods import stream_read_ods, simple_table

url = 'https://www.statistik.at/fileadmin/pages/214/2_Verbraucherpreisindizes_ab_1990.ods'

def get():
    with httpx.stream('GET', url) as r:
        yield from r.iter_bytes()

for sheet_name, sheet_rows in stream_read_ods(get()):
    columns, rows = simple_table(sheet_rows, skip_rows=1)
    df = pd.DataFrame(rows, columns=columns)
    print(df)
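
Whichever reader is used, it may help to confirm that the downloaded bytes really are a spreadsheet and not the viewer's HTML page, since that mix-up is what the question ran into. An .ods file is a ZIP package that contains an entry named mimetype holding the OpenDocument spreadsheet media type. The sketch below checks for that; the function name looks_like_ods is an assumption for illustration and is not part of stream-read-ods, pandas or odfpy.

```python
import io
import zipfile

# Media type stored in the "mimetype" entry of an OpenDocument spreadsheet
ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def looks_like_ods(data: bytes) -> bool:
    """Return True if data appears to be an OpenDocument spreadsheet package."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False  # e.g. an HTML page returned by the viewer link
    with zipfile.ZipFile(buffer) as archive:
        try:
            return archive.read("mimetype").strip() == ODS_MIMETYPE
        except KeyError:
            return False  # a ZIP archive, but without a "mimetype" entry
```

For example, calling looks_like_ods(response.content) on a requests response before saving it would flag the HTML document shown in the question.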

Answered By: Michal Charemza