Extract the 10-K filing URL for a company using its CIK number in Python

Question

I am working on a project to find the URL of the latest 10-K filing for a company using its CIK number. Here is my code:

import requests
from bs4 import BeautifulSoup

# Apple's CIK is 0000320193
cik_number = "0000320193"
url = f"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_number}&type=10-K&dateb=&owner=exclude&count=40"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the link to the latest 10-K filing
link = soup.find('a', {'id': 'documentsbutton'})
filing_url = link['href']

print(filing_url)

I am getting an HTTP 403 error. Please help.

Thanks

Asked By: Sushmitha Krishnan


Answer 1

I was able to get a 200 response with your same snippet once I added request headers. You are most likely missing the headers:

import requests
from bs4 import BeautifulSoup

# Apple's CIK is 0000320193
cik_number = "0000320193"
url = f'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_number}&type=10-K&dateb=&owner=exclude&count=40'
# Add browser-like headers; the default python-requests User-Agent gets a 403 from SEC.gov
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
response = requests.get(url, headers=headers)
print(response)  # expect <Response [200]>
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
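To go from the 200 response to the actual filing URL the question asks for, here is a minimal sketch that reuses the browser-like User-Agent above, takes the first `documentsbutton` link, and resolves its site-relative `href` into an absolute URL with `urljoin`. The helper names `fetch_browse_page` and `latest_10k_index_url` are hypothetical. Picking the first link assumes EDGAR lists filings newest first, and the `type=10-K` filter may also match amendments such as 10-K/A, so the first row is not guaranteed to be an original 10-K.

```python
from typing import Optional
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"
# Browser-like User-Agent taken from the answer above
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
}


def fetch_browse_page(cik: str) -> str:
    """Download the EDGAR company page that lists a CIK's 10-K filings."""
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": "10-K",
        "dateb": "",
        "owner": "exclude",
        "count": "40",
    }
    response = requests.get(BROWSE_URL, params=params, headers=HEADERS, timeout=30)
    # Fail loudly on 403 and other HTTP errors instead of parsing an error page
    response.raise_for_status()
    return response.text


def latest_10k_index_url(html: str, base_url: str = "https://www.sec.gov") -> Optional[str]:
    """Return the absolute URL of the first 'Documents' link, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match; assumes the page lists filings newest first
    link = soup.find("a", id="documentsbutton")
    if link is None:
        return None
    # The href is site-relative ("/Archives/..."), so resolve it against sec.gov
    return urljoin(base_url, link["href"])


# Example usage (makes a network request):
# print(latest_10k_index_url(fetch_browse_page("0000320193")))
```

The returned URL points to the filing's index page, which lists the 10-K document and its exhibits. Returning `None` instead of indexing `link['href']` directly avoids the `TypeError` the original snippet would raise when no filing is found.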

NOTE:
You can read more about why we need to add a User-Agent to our headers here. Basically, you need to make sure the request looks like it is coming from a browser, so just pass the extra headers parameter to requests.get().
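SEC.gov's own fair-access guidance takes a slightly different angle from browser spoofing: it asks automated tools to declare a User-Agent containing a company or personal name and a contact email, and to stay at or below 10 requests per second. The sketch below follows that guidance and, as an alternative technique to scraping the HTML page, reads the JSON submissions endpoint on data.sec.gov using only the standard library. The `USER_AGENT` value is a placeholder, the helper names are hypothetical, and taking the first "10-K" entry assumes the `recent` filing lists are ordered newest first.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical identity: replace with your own name/company and contact email
USER_AGENT = "Sample Company Name admin@example.com"


def fetch_submissions(cik: str) -> dict:
    """Download the EDGAR submissions JSON for a CIK (padded to 10 digits)."""
    url = f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def latest_10k_url(submissions: dict) -> Optional[str]:
    """Return the URL of the newest 10-K's primary document, or None."""
    recent = submissions["filings"]["recent"]
    cik = int(submissions["cik"])  # archive paths use the CIK without leading zeros
    # The "recent" entries are parallel lists: index i describes one filing
    rows = zip(recent["form"], recent["accessionNumber"], recent["primaryDocument"])
    for form, accession, document in rows:
        if form == "10-K":  # exact match skips amendments such as "10-K/A"
            folder = accession.replace("-", "")
            return f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{document}"
    return None


# Example usage (makes a network request):
# print(latest_10k_url(fetch_submissions("0000320193")))
```

Unlike the HTML approach, this links straight to the primary 10-K document rather than the filing index page, and it does not depend on the page's `documentsbutton` markup staying the same.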

Answered By: Kulasangar
