'application/octet-stream' instead of application/csv?

Question:

I am quite new to Python. I want to confirm that the dataset (URL in the code below) is indeed a CSV file. However, when I check the response headers, I get ‘application/octet-stream’ instead of ‘application/csv’.

I assume that I defined something in the wrong way when reading in the data, but I don’t know what.

import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
d1 = requests.get(url)

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(d1.content)

## data type via headers  # PROBLEM
headerDict = d1.headers

# accessing the Content-Type header
if "Content-Type" in headerDict:
    print("Content-Type:")
    print(headerDict['Content-Type'])
Asked By: Katharina Böhm

Answers:

"I assume that I defined something in the wrong way when reading in the data"

No, you didn’t. The Content-Type header is supposed to indicate what the response body is, but there is nothing you can do to force the server to set that to a value you expect. Some servers are just badly configured and don’t play along.

application/octet-stream is the most generic content type of them all – it gives you no more info than "it’s a bunch of bytes, have fun".

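What’s more, there isn’t necessarily One True Type for each kind of content, only more-or-less widely agreed-upon conventions. For CSV, the type registered with IANA (in RFC 4180) is text/csv, so that is what a well-configured server would most likely send, not application/csv.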

So if you’re sure what the content is, feel free to ignore the Content-Type header.

import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
response = requests.get(url)

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f: 
    f.write(response.content)

Writing the file in binary mode is a good idea in the absence of any further information, because it retains the original bytes exactly as they were received.
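
Since the header cannot confirm that the download is CSV, one option is to look at the bytes themselves. The following is only a rough sketch: looks_like_csv is a hypothetical helper (not part of requests or the csv module), and the 4096-byte sample size, the candidate delimiters and the "same number of columns in the first few rows" rule are all assumptions chosen for illustration.

```python
import csv


def looks_like_csv(raw: bytes, encoding: str = "utf-8", max_rows: int = 5) -> bool:
    """Heuristic check: do the first lines parse as delimited text
    with a stable column count? (hypothetical helper)"""
    # Decode only a small prefix; undecodable bytes are replaced, not fatal.
    text = raw[:4096].decode(encoding, errors="replace")
    lines = text.splitlines()
    if len(raw) > 4096:
        lines = lines[:-1]  # the last line may have been cut mid-row
    sample = "\n".join(lines[:max_rows])

    try:
        # Let the standard library guess the delimiter from the sample.
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
    except csv.Error:
        return False

    # Every non-empty row should have the same number of columns (> 1).
    widths = {len(row) for row in csv.reader(sample.splitlines(), dialect) if row}
    return len(widths) == 1 and widths.pop() > 1
```

With the download code above, this could be called as looks_like_csv(response.content). It is a plausibility check, not proof: text that merely happens to contain commas consistently will pass as well.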


To convert those bytes to a string, they need to be decoded using a specific encoding. Since the Content-Type gave no indication here (it could have said Content-Type: text/csv; charset=XYZ), the best first assumption for data from the Internet is UTF-8:

import csv

filePath = 'data/data_notebook-1_covid-new.csv'
# newline='' is what the csv module documentation recommends for file objects
with open(filePath, encoding='utf-8', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)
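
For servers that do send a charset parameter, the header can be split into media type and charset with the standard library. This is a minimal sketch; split_content_type and pick_encoding are hypothetical helper names, and falling back to UTF-8 is the same assumption as above, not something the server promised.

```python
from email.message import Message


def split_content_type(header_value: str):
    """Return (media_type, charset or None) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both accessors return lower-cased values; charset is None if absent.
    return msg.get_content_type(), msg.get_content_charset()


def pick_encoding(header_value: str, default: str = "utf-8") -> str:
    """Use the declared charset if there is one, otherwise fall back to a guess."""
    _, charset = split_content_type(header_value)
    return charset or default


print(split_content_type("text/csv; charset=ISO-8859-1"))
print(pick_encoding("application/octet-stream"))
```

For the response in the question, pick_encoding(response.headers.get("Content-Type", "")) would simply fall back to the UTF-8 guess, because application/octet-stream carries no charset.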

Should that turn out to be wrong (i.e. there are decoding errors or garbled characters), you can try a different encoding until you find one that works. That would not be possible if you had written the file in text mode in the beginning, as any data corruption from wrong decoding would have made it into the file.
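
Trying encodings one after another could look roughly like this. It is a sketch only: decode_with_fallback is a hypothetical helper, and the candidate list (UTF-8, then cp1252, then latin-1) is an assumption that suits Western European data; other sources may need different candidates.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
    raise ValueError("none of the candidate encodings could decode the data")


# The original bytes are still on disk because the file was written in binary mode:
# with open('data/data_notebook-1_covid-new.csv', 'rb') as f:
#     text, used = decode_with_fallback(f.read())
```

A clean decode does not prove the guess was right: latin-1 accepts any byte sequence, so it always "works" and is only useful as a last resort. Checking the result for garbled characters is still necessary.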

Answered By: Tomalak