Web scrape table with large amounts of data

Question:

I am looking to web scrape a table consisting of 4000+ rows from the following website:

https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings

Ideally, I'd like someone to show how to use the Nasdaq API, if possible. I believe the way I'd normally web scrape (using BeautifulSoup) would be very inefficient for this task.

Thanks!

Asked By: kiestuthridge23

||

Answers:

The table is paginated: each page triggers a new XHR call that returns 15 more records (offset by the records already loaded). We can manipulate the URL to our advantage and request, say, 7,000 records at once with an offset of 0 (there are roughly 4,000 entries in total):

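import requests
import pandas as pd

# Browser-like headers; the API may not respond to the default requests User-Agent.
headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

# Show every column, untruncated, when printing the DataFrame.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# limit=7000 exceeds the total row count, so a single request returns everything.
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=7000&offset=0&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
r = requests.get(url, headers=headers, timeout=30)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page

# The table rows are nested inside the JSON payload; flatten them into a DataFrame.
df = pd.json_normalize(r.json()['data']['holdingsTransactions']['table']['rows'])
print(df)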
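
If the endpoint ever caps `limit` below the total row count (an assumption; the answer only shows that a large limit worked at the time), the same `limit`/`offset` parameters can be used to page through the table instead. The following is a minimal sketch of that idea. The names `build_url` and `fetch_all`, and the `fetch_page` callable, are hypothetical helpers introduced here, not part of any Nasdaq client.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.nasdaq.com/api/company/AAPL/institutional-holdings"


def build_url(limit: int, offset: int) -> str:
    """Build the holdings URL with the same query parameters used in the answer."""
    params = {
        "limit": limit,
        "offset": offset,
        "type": "TOTAL",
        "sortColumn": "marketValue",
        "sortOrder": "DESC",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_all(
    fetch_page: Callable[[int, int], Optional[List[Dict]]],
    page_size: int = 1000,
    max_pages: int = 100,
) -> List[Dict]:
    """Collect rows page by page until the source returns an empty page.

    fetch_page(limit, offset) is a hypothetical callable that returns the
    list of row dicts for one page (or None / [] when nothing is left).
    """
    rows: List[Dict] = []
    offset = 0
    for _ in range(max_pages):  # hard stop in case the source never runs dry
        batch = fetch_page(page_size, offset)
        if not batch:
            break
        rows.extend(batch)
        # Advance by what was actually returned, so a server-side cap
        # smaller than page_size does not skip any rows.
        offset += len(batch)
    return rows
```

To use it against the live API, `fetch_page` would wrap `requests.get(build_url(limit, offset), headers=headers)` and return `r.json()['data']['holdingsTransactions']['table']['rows']`; the combined list can then be passed to `pd.json_normalize`. The output below is from the single-request script above, not from this sketch.
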

Result:

      ownerName                        date        sharesHeld     sharesChange  sharesChangePCT  marketValue   url
0     VANGUARD GROUP INC               06/30/2022  1,277,319,054  7,323,304     0.577%           $191,214,662  /market-activity/institutional-portfolio/vanguard-group-inc-61322
1     BLACKROCK INC.                   06/30/2022  1,028,688,317  1,055,430     0.103%           $153,994,641  /market-activity/institutional-portfolio/blackrock-inc-711679
2     BERKSHIRE HATHAWAY INC           06/30/2022  894,802,319    3,878,909     0.435%           $133,951,907  /market-activity/institutional-portfolio/berkshire-hathaway-inc-54239
3     STATE STREET CORP                06/30/2022  598,178,524    -15,673,750   -2.553%          $89,547,325   /market-activity/institutional-portfolio/state-street-corp-6697
4     FMR LLC                          09/30/2022  350,900,116    6,582,142     1.912%           $52,529,747   /market-activity/institutional-portfolio/fmr-llc-12407
...   ...                              ...         ...            ...           ...              ...           ...
4397  VERSOR INVESTMENTS LP            09/30/2022  0              -5,171        Sold Out                       /market-activity/institutional-portfolio/versor-investments-lp-1015149
4398  WALLEYE CAPITAL LLC              06/30/2022  0              -44,561       Sold Out                       /market-activity/institutional-portfolio/walleye-capital-llc-1069483
4399  WALLEYE TRADING LLC              06/30/2022  0              -65,383       Sold Out                       /market-activity/institutional-portfolio/walleye-trading-llc-733607
4400  WARATAH CAPITAL ADVISORS LTD.    09/30/2022  0              -31,149       Sold Out                       /market-activity/institutional-portfolio/waratah-capital-advisors-ltd-901912
4401  WINSLOW CAPITAL MANAGEMENT, LLC  06/30/2022  0              -2,386        Sold Out                       /market-activity/institutional-portfolio/winslow-capital-management-llc-64122

4402 rows × 7 columns
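
Note that every column in the result arrives as a string ('1,277,319,054', '$191,214,662', '-2.553%', 'Sold Out'), so sorting or summing will not work until the values are converted. Below is a small sketch of one way to do that in plain Python. `parse_number` is a hypothetical helper, and treating 'Sold Out' and blank cells as missing values is an assumption about how you want to handle them.

```python
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Convert a formatted cell such as '1,277,319,054', '$191,214,662'
    or '-2.553%' to a float.

    Returns None for blank cells and for non-numeric labels such as
    'Sold Out' (an assumption: they are treated as missing values).
    """
    # Strip thousands separators, the currency sign and the percent sign.
    cleaned = text.strip().replace(",", "").replace("$", "").replace("%", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None
```

With the DataFrame from the answer, this could be applied per column, for example `df['sharesHeld'] = df['sharesHeld'].map(parse_number)`. Percent values keep their scale (e.g. -2.553 rather than -0.02553).
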
Answered By: Barry the Platipus