How to download and write a file from GitHub using Requests

Question:

Let’s say there’s a file that lives in a GitHub repo at:

https://github.com/someguy/brilliant/blob/master/somefile.txt

I’m trying to use requests to fetch this file and write its content to disk in the current working directory, where it can be used later. Right now, I’m using the following code:

import requests
from os import getcwd

url = "https://github.com/someguy/brilliant/blob/master/somefile.txt"
directory = getcwd()
filename = directory + 'somefile.txt'
r = requests.get(url)

f = open(filename,'w')
f.write(r.content)

Undoubtedly ugly, and more importantly, not working. Instead of the expected text, I get:

<!DOCTYPE html>
<!--

Hello future GitHubber! I bet you're here to remove those nasty inline styles,
DRY up these templates and make 'em nice and re-usable, right?

Please, don't. https://github.com/styleguide/templates/2.0

-->
<html>
  <head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <title>Page not found &middot; GitHub</title>
    <style type="text/css" media="screen">
      body {
        background: #f1f1f1;
        font-family: "HelveticaNeue", Helvetica, Arial, sans-serif;
        text-rendering: optimizeLegibility;
        margin: 0; }

      .container { margin: 50px auto 40px auto; width: 600px; text-align: center; }

      a { color: #4183c4; text-decoration: none; }
      a:visited { color: #4183c4 }
      a:hover { text-decoration: none; }

      h1 { letter-spacing: -1px; line-height: 60px; font-size: 60px; font-weight: 100; margin: 0px; text-shadow: 0 1px 0 #fff; }
      p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

      ul { list-style: none; margin: 25px 0; padding: 0; }
      li { display: table-cell; font-weight: bold; width: 1%; }
      #error-suggestions { font-size: 14px; }
      #next-steps { margin: 25px 0 50px 0;}
      #next-steps li { display: block; width: 100%; text-align: center; padding: 5px 0; font-weight: normal; color: rgba(0, 0, 0, 0.5); }
      #next-steps a { font-weight: bold; }
      .divider { border-top: 1px solid #d5d5d5; border-bottom: 1px solid #fafafa;}

      #parallax_wrapper {
        position: relative;
        z-index: 0;
      }
      #parallax_field {
        overflow: hidden;
        position: absolute;
        left: 0;
        top: 0;
        height: 370px;
        width: 100%;
      }

etc etc.

Content from GitHub, but not the content of the file. What am I doing wrong?

Asked By: Fomite


Answers:

The content of the file in question is included in the returned data. You are getting the full GitHub view of that file, not just the contents.

If you want to download just the file, you need to use the Raw link at the top of the page, which will be (for your example):

https://raw.githubusercontent.com/someguy/brilliant/master/somefile.txt

Note the change in domain name, and the blob/ part of the path is gone.
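
If you only have the blob link, the rewrite described above can be sketched as a small helper. This is just an illustration: blob_to_raw is a made-up name, and it assumes the branch name contains no slashes and that the URL has the usual user/repo/blob/branch/path layout.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url):
    """Turn a github.com "blob" page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: user / repo / "blob" / branch / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com blob URL: %r" % url)
    del segments[2]  # drop the "blob" segment
    new_path = "/" + "/".join(segments)
    return urlunsplit(("https", "raw.githubusercontent.com", new_path, "", ""))


print(blob_to_raw("https://github.com/someguy/brilliant/blob/master/somefile.txt"))
# https://raw.githubusercontent.com/someguy/brilliant/master/somefile.txt
```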

To demonstrate this with the requests GitHub repository itself:

>>> import requests
>>> r = requests.get('https://github.com/kennethreitz/requests/blob/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r = requests.get('https://raw.githubusercontent.com/kennethreitz/requests/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/plain; charset=utf-8'
>>> print(r.text)
Requests: HTTP for Humans
=========================


.. image:: https://travis-ci.org/kennethreitz/requests.png?branch=master
[... etc. ...]
Answered By: Martijn Pieters

You need to request the raw version of the file, from https://raw.githubusercontent.com.

See the difference:

https://raw.githubusercontent.com/django/django/master/setup.py vs. https://github.com/django/django/blob/master/setup.py

Also, you should probably add a / between your directory and the filename:

>>> getcwd()+'foo.txt'
'/Users/burhanfoo.txt'
>>> import os
>>> os.path.join(getcwd(),'foo.txt')
'/Users/burhan/foo.txt'
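
Putting the raw URL and the path fix together, a possible rework of the script from the question might look like the following. It is a sketch, not the only way to do it: destination and download are hypothetical helper names, the 30 second timeout is an arbitrary choice, and the URL in the trailing comment is the placeholder from the question, not a real file.

```python
import os

import requests


def destination(filename, directory=None):
    """Join the file name onto a directory (the current one by default)."""
    return os.path.join(directory or os.getcwd(), filename)


def download(url, filename, directory=None):
    """Fetch url and write the response body to disk; return the path written."""
    r = requests.get(url, timeout=30)
    # A 404 raises here instead of silently saving GitHub's error page
    r.raise_for_status()
    path = destination(filename, directory)
    # r.content is bytes, so the file is opened in binary mode
    with open(path, "wb") as f:
        f.write(r.content)
    return path


# Example call (placeholder URL taken from the question):
# download("https://raw.githubusercontent.com/someguy/brilliant/master/somefile.txt",
#          "somefile.txt")
```

The with block closes the file even if the write fails, which the original open()/write() pair did not do.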
Answered By: Burhan Khalid

Just as an update, https://raw.github.com was migrated to https://raw.githubusercontent.com. So the general format is:

url = "https://raw.githubusercontent.com/user/repo/branch/[subfolders]/file"

E.g. https://raw.githubusercontent.com/earnestt1234/seedir/master/setup.py. Still use requests.get(url) as in Martijn’s answer.
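
If you build these URLs in code, the format above could be wrapped like this. raw_url is a name made up for the example; the percent-encoding via quote() is an extra precaution for paths with spaces, not something the format itself requires.

```python
from urllib.parse import quote


def raw_url(user, repo, branch, path):
    """Assemble a raw.githubusercontent.com URL from its parts."""
    tail = "/".join([user, repo, branch, path.lstrip("/")])
    # quote() percent-encodes spaces and similar characters but keeps "/" intact
    return "https://raw.githubusercontent.com/" + quote(tail)


print(raw_url("earnestt1234", "seedir", "master", "setup.py"))
# https://raw.githubusercontent.com/earnestt1234/seedir/master/setup.py
```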

Answered By: Tom

Adding a working example ready for copy+paste:

import requests
from requests.structures import CaseInsensitiveDict

url = "https://raw.githubusercontent.com/organization/repo/branch/folder/file"

# If repo is private - we need to add a token in header:
headers = CaseInsensitiveDict()
headers["Authorization"] = "token TOKEN"

resp = requests.get(url, headers=headers)
print(resp.status_code)

(*) If repo is not private – remove the headers part.
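
To avoid editing the script for public vs. private repos, the headers can be built conditionally. A rough sketch, assuming the token is kept in an environment variable; GITHUB_TOKEN is just a name picked for this example, and github_headers is a hypothetical helper. A plain dict is used here, which requests also accepts for headers.

```python
import os


def github_headers(token=None):
    """Return request headers; add Authorization only when a token is given."""
    headers = {}
    if token:
        headers["Authorization"] = "token " + token
    return headers


# GITHUB_TOKEN is an assumed variable name, not something requests looks for itself
headers = github_headers(os.environ.get("GITHUB_TOKEN"))
# resp = requests.get(url, headers=headers)
```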

Bonus: check out this curl <-> Python-requests online converter.

Answered By: RtmY