I want to know the url pip has used to download a package (.whl file)
Question:
I need a list of the URLs that pip uses to download packages from the internet, preferably generated from the list of Python packages in a requirements.txt file.
Do you know a quick and easy solution to this problem? Maybe there is a tool out there or a feature of pip that I’m unaware of?
I need to know the download URLs without installing the packages. Downloading the package .whl files is fine, though.
I have tried pip download -r requirements.txt, which downloads the files to my computer from PyPI. I can see the URLs on the network using Wireshark, but I don’t know how to get the URL used for each download programmatically.
Answers:
You could use the pip show <package-name> command in the command prompt. This will display the URL where the package can be found under Home-page.
For example:
> pip show gym
Name: gym
Version: 0.26.1
Summary: Gym: A universal API for reinforcement learning environments
Home-page: https://www.gymlibrary.dev/
Author: Gym Community
Author-email: [email protected]
License: MIT
Location: C:\path\to\packages
Requires: cloudpickle, gym-notices, numpy
Required-by:
I have found that I can get the download URL of a specific release version by manually parsing the JSON at https://pypi.org/pypi/<package-name>/json .
It is a shame pip does not implement this functionality by default, especially for those who want to programmatically check the integrity of their downloads before installation.
The solution I found requires the version to be pinned with ‘==’ (this wouldn’t be a big deal if I parsed a lock file instead). It also doesn’t deal with ‘extras’ etc. I’m looking for a way to use Poetry instead (parsing a requirements.txt file generated by Poetry seems easier than dealing with all the edge cases found in a user-defined requirements.txt file).
My solution:
import re

import requests


def get_packages_from_lock_file(lock_file):
    # Example text to match: black==22.1.0
    # A regex is necessary because requirements.txt can contain hashes and Python version requirements.
    # lock_file is the text content of the file, not its path.
    packages = re.findall(r'([-_a-zA-Z0-9]+)==(\d+\.\d+\.\d+)', lock_file)
    # Example return: [
    #     ('black', '22.10.0'),
    #     ('click', '8.1.3'),
    #     ('colorama', '0.4.6'),
    # ]
    return packages


# pip show only shows URLs of installed packages. We don't want to install a package to see its URL.
# pip has no way of doing this currently, so I am manually parsing the PyPI package JSON for the URL.
def show_package_url(package):
    name = package[0]
    version = package[1]
    response = requests.get(url=f"https://pypi.org/pypi/{name}/json")
    # requests does not raise on HTTP error codes by itself; this raises requests.HTTPError.
    response.raise_for_status()
    package_json = response.json()
    # A package version usually seems to contain a .whl file for pip installation and a tar.gz file for direct use.
    # TODO: do we need to download both files or allow the user an option to choose?
    package_json_version = package_json["releases"][version]
    return [file["url"] for file in package_json_version]


# requirements_file holds the text of the requirements/lock file.
packages = get_packages_from_lock_file(requirements_file)
for package in packages:
    package_urls = show_package_url(package)
    for url in package_urls:
        # job is not defined in this snippet; replace this call with your own handling of the URL.
        job.add_url(url)
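Since the motivation above is checking download integrity, it may help that each file entry in the PyPI JSON response carries a digests mapping next to its url. The following is a minimal sketch, assuming the packagetype and digests["sha256"] fields are present as PyPI currently serves them. The helper names wheel_urls_with_hashes and sha256_matches are hypothetical and not part of pip or of the solution above.

```python
import hashlib


def wheel_urls_with_hashes(release_files):
    """Return (url, sha256) pairs for the wheel files of one release.

    release_files is the list found at package_json["releases"][version].
    """
    return [
        (f["url"], f["digests"]["sha256"])
        for f in release_files
        if f.get("packagetype") == "bdist_wheel"
    ]


def sha256_matches(path, expected_sha256):
    """Hash a downloaded file in chunks and compare it with the expected digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Filtering on packagetype is one possible answer to the TODO about choosing between the .whl and the tar.gz file; dropping the filter returns both.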
This Python code will automatically parse the PyPI project pages for all of the packages mentioned in a local requirements.txt file. The result is a list named urls.
Note: This will not work unless you have a copy of requirements.txt in the directory from which you execute this code.
import requests as req
from bs4 import BeautifulSoup as parser

# Read the requirements file line by line.
with open("requirements.txt", "r") as fh:
    lines = fh.read().split("\n")

# Build the PyPI project page URL for every pinned (name==version) requirement.
requirements = []
for line in lines:
    if "==" in line:
        requirements.append("https://pypi.org/project/" + line.split("==")[0])

# Take the first download link from the first file card on each project page.
urls = []
for r in requirements:
    p = parser(req.get(r).content, "html.parser")
    url = p.find("div", class_="card file__card").find("a")["href"]
    urls.append(url)
print(urls)
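The parsing in the answers above assumes plain name==version lines. As a rough sketch of a more tolerant parser that uses only the standard library, the function below skips comment and --hash lines and ignores extras and environment markers when extracting the pin. The name parse_pinned is hypothetical, and the sketch handles only == pins with simplified comment stripping. For the full requirement syntax, the packaging library is the usual tool.

```python
import re

# name, optional [extras], "==", then the version up to whitespace, ';' or a backslash
_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;\\]+)"
)


def parse_pinned(requirements_text):
    """Return (name, version) tuples for every line pinned with '=='."""
    pins = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments (simplified rule)
        match = _PIN.match(line)
        if match:
            pins.append((match.group(1), match.group(2)))
    return pins
```

Lines that are not pinned with ==, such as click>=8.0, are skipped rather than guessed at, so the caller can decide how to resolve them.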