Python basics – request data from API and write to a file

Question:

I am trying to use the “requests” package to retrieve info from GitHub, as the Requests doc page explains:

import requests
r = requests.get('https://api.github.com/events')

And this:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

I have to say I don’t understand the second code block.

  • filename – in what form do I provide the path to the file if created? where will it be saved if not?
  • ‘wb’ – what is this variable? (shouldn’t second parameter be ‘mode’?)
  • following two lines probably iterate over data retrieved with request and write to the file

The Python docs explanation is also not helping much.

EDIT: What I am trying to do:

  • use Requests to connect to an API (Github and later Facebook GraphAPI)
  • retrieve data into a variable
  • write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Asked By: Alexander Starbuck

Answers:

filename is a string giving the path where you want to save the file. It accepts either a relative or an absolute path, so you can just use filename = 'example.html'

'wb' is the file mode: 'w' means write and 'b' means bytes (binary).

The for loop goes over the entire returned content in chunks (in case it is too large to hold in memory at once) and writes each chunk until there are none left. This is useful for large files, but for a single web page you could just do:

# just 'w' because we are writing text (r.text), not bytes.
with open(filename, 'w') as fd:
    fd.write(r.text)
Answered By: CasualDemon

Filename

When using open, a relative path is resolved against your current working directory. So open('file.txt', 'w') creates a new file named file.txt in whatever directory you run the script from, which is not necessarily the folder the script lives in. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named 'file.txt' already exists, its contents will be completely overwritten.
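
A quick way to check where a relative path will actually land is to compare it with the current working directory. The sketch below uses only the standard library; file.txt is just the example name from above, and nothing is created on disk.

```python
import os

# A relative path is resolved against the current working directory,
# i.e. the directory the interpreter was started from.
relative_name = "file.txt"
resolved = os.path.abspath(relative_name)

print("working directory:", os.getcwd())
print("file would be created at:", resolved)
```

Running this from different directories should print a different target path each time, even though the script itself has not moved.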

Mode

The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). In Python 3 the mode has to match the data: iter_content yields bytes, so the chunked loop needs 'wb', while plain 'w' works if you write the decoded text (r.text) instead. Read more on the modes in the docs for open.
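
To see how the mode interacts with the type of data being written in Python 3, here is a small sketch. The helper try_write and the scratch file demo.txt are hypothetical names used only for this illustration.

```python
import os
import tempfile


def try_write(path, mode, payload):
    """Return True if payload can be written to path using the given mode."""
    try:
        with open(path, mode) as fd:
            fd.write(payload)
        return True
    except TypeError:
        # Raised when str is written to a binary file or bytes to a text file.
        return False


# Hypothetical scratch file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

print(try_write(demo_path, "w", "some text"))     # str with text mode
print(try_write(demo_path, "wb", b"some bytes"))  # bytes with binary mode
print(try_write(demo_path, "w", b"some bytes"))   # bytes with text mode
print(try_write(demo_path, "wb", "some text"))    # str with binary mode
```

On Python 3 the first two calls succeed and the last two fail with a TypeError, which is why r.text pairs with 'w' and r.content (or chunks from iter_content) pairs with 'wb'.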

The Loop

This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
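
For the large-file case, chunked writing only saves memory if the request is made with stream=True; otherwise requests downloads the whole body before iter_content is ever called. A minimal sketch, assuming the requests package is installed; save_stream and download are hypothetical helper names, and the chunk size and timeout are arbitrary choices:

```python
def save_stream(response, filename, chunk_size=8192):
    """Write a response body to filename chunk by chunk; return bytes written."""
    written = 0
    with open(filename, "wb") as fd:  # iter_content yields bytes, hence 'wb'
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


def download(url, filename):
    """Fetch url with streaming enabled and save the body to filename."""
    import requests  # third-party; imported here so save_stream has no dependencies

    # stream=True defers downloading the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        return save_stream(response, filename)
```

Something like download('https://api.github.com/events', 'events.json') would then save the response without holding it all in memory; the output file name is only an example.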

Conclusion

The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, if you are only accessing small web pages containing text, the code can be made easier to read and understand:

import requests
r = requests.get('https://api.github.com/events')

with open('events.txt','w') as fd:
    fd.write(r.text)
Answered By: TheSchwa

I have a response in JSON format. I would like to write it to a JSON file.

with open('/dbfs/tmp/response.json','w') as fd:
    fd.write(r.text)
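
An alternative to writing r.text directly is to parse the response first and let the standard json module write it, which at least confirms the body is valid JSON before it reaches the file. A minimal sketch, assuming r is a requests response whose body parses with r.json(); save_json and load_json are hypothetical helper names:

```python
import json


def save_json(data, filename):
    """Serialize already-parsed JSON data (dicts, lists, ...) to filename."""
    with open(filename, "w", encoding="utf-8") as fd:
        json.dump(data, fd)


def load_json(filename):
    """Read the file back and return the parsed data."""
    with open(filename, encoding="utf-8") as fd:
        return json.load(fd)
```

With these helpers the write becomes save_json(r.json(), '/dbfs/tmp/response.json'). This only changes how the file is produced and is not claimed to fix how it is read afterwards.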

Then I want to read this data into a DataFrame, but it is read as a corrupt record.

How do I read it into a DataFrame cleanly?
df = spark.read.format('org.apache.spark.sql.json').load("/tmp/response.json")

Answered By: Opravin