I want to know the URL pip has used to download a package (.whl file)

Question:

I need a list of the URLs that pip uses to download packages from the internet, preferably derived from a list of Python packages in a requirements.txt file.

Do you know a quick and easy solution to this problem? Maybe there is a tool out there or a feature of pip that I’m unaware of?

I need to know the download URLs without installing the packages. I can download the package .whl files, though.

I have tried using pip download -r requirements.txt, which downloads files to my computer from PyPI. I can see the URLs over the network using Wireshark, but I don't know how to programmatically get the URL used to download each file.

Asked By: Jack


Answers:

You could use the pip show <package-name> command in the command prompt. This displays the URL where the package can be found under Home-page.

For example:

> pip show gym
Name: gym
Version: 0.26.1
Summary: Gym: A universal API for reinforcement learning environments
Home-page: https://www.gymlibrary.dev/
Author: Gym Community
Author-email: [email protected]
License: MIT
Location: C:\path\to\packages
Requires: cloudpickle, gym-notices, numpy
Required-by:
Answered By: Jack Bosco

I have found that I can get the download URL of a specific release version by manually parsing the JSON at https://pypi.org/pypi/<package-name>/json.
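
For illustration, a minimal sketch of that lookup with requests ('black' and the pin '22.10.0' are example values, not from my actual solution):

import requests

# Query PyPI's JSON API for one package; 'black' / '22.10.0' are example pins.
data = requests.get("https://pypi.org/pypi/black/json").json()
for f in data["releases"]["22.10.0"]:  # one entry per release file (.whl, sdist)
    print(f["filename"], f["url"], f["digests"]["sha256"])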

It is a shame pip does not implement this functionality by default, especially for those who want to programmatically check the integrity of their downloads before installation.

The solution I found requires the version to be pinned with '==' (this wouldn't be a big deal if I parsed a lock file instead). It also doesn't deal with 'extras', etc. I'm looking at a way to use poetry instead (parsing a requirements.txt file generated by poetry seems to be easier than dealing with all the edge cases found in a user-defined requirements.txt file).
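
For reference, a pinned line in a generated requirements.txt can carry environment markers and hashes beyond name==version, which is what the regex below has to skip past (the digest is shortened to a placeholder here):

black==22.10.0 ; python_version >= "3.7" --hash=sha256:<digest>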

My solution:

import re
import requests


def get_packages_from_lock_file(lock_file):
    # regexp example text to match: black==22.1.0
    # A regex is necessary because requirements.txt lines can also contain
    # hashes and python-version requirements.
    packages = re.findall(r'([A-Za-z0-9._-]+)==(\d+\.\d+\.\d+)', lock_file)
    # example return: [
    #     ('black', '22.10.0'),
    #     ('click', '8.1.3'),
    #     ('colorama', '0.4.6'),
    # ]
    return packages


# pip show only shows URLs of installed packages, and we don't want to install
# a package just to see its URL. pip has no way of doing this currently, so I
# am manually parsing the PyPI package JSON for the URLs.
def show_package_url(package):
    name, version = package
    response = requests.get(url=f"https://pypi.org/pypi/{name}/json")
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
    package_json = response.json()
    # A release usually contains a .whl file for pip installation and a
    # tar.gz file for direct use.
    # TODO: do we need to download both files or allow the user an option to choose?
    package_json_version = package_json["releases"][version]
    return [file["url"] for file in package_json_version]

# requirements_file holds the text of the requirements.txt / lock file
packages = get_packages_from_lock_file(requirements_file)
for package in packages:
    package_urls = show_package_url(package)
    for url in package_urls:
        job.add_url(url)  # 'job' belongs to my surrounding download code
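
Building on the integrity point above, a hedged sketch of the check itself (hashlib is standard library; verify_download and its arguments are hypothetical names, with the expected digest taken from file["digests"]["sha256"] in the JSON):

import hashlib

# Hypothetical helper: compare a downloaded file against the sha256 digest
# reported by PyPI's JSON API.
def verify_download(path, expected_sha256):
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest() == expected_sha256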
Answered By: Jack

This Python code automatically parses the PyPI project pages (HTML) for all of the packages mentioned in a local requirements.txt file. The result is a list named urls.

Note: This will not work unless you have a copy of requirements.txt in the same directory from which you execute this code.

import requests as req
from bs4 import BeautifulSoup as parser

f = open("requirements.txt", 'r').read().split('\n')
requirements = []
for line in f:
    if '==' in line:
        # keep everything before the '==' as the package name
        name = line.split('==', 1)[0]
        requirements.append("https://pypi.org/project/" + name)

urls = []
for r in requirements:
    p = parser(req.get(r).content, "html.parser")
    # take the href of the first link inside the file card on the project page
    url = p.find("div", class_="card file__card").find("a")["href"]
    urls.append(url)

print(urls)
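
Note: find returns None if PyPI ever changes the page layout, so a guarded variant of the scraping step (a sketch under the same page-structure assumption) could look like:

card = p.find("div", class_="card file__card")
if card is not None and card.find("a") is not None:
    urls.append(card.find("a")["href"])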
Answered By: Jack Bosco