Get list of the components of NASDAQ-100
Question:
I am trying to programmatically fetch the list of companies included in the NASDAQ-100. I tried scraping the Nasdaq-100-Index-Components page using Beautiful Soup (bs4), but so far without much success.
How can I get this list (tickers and company names)?
import json

import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update(
    {
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "User-Agent": "Java-http-client/",
    }
)
r = s.get("https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index")
soup = BeautifulSoup(r.content, "html.parser")
res = json.loads([x for x in soup.find("script", {"type": "application/json"})][0])
This only returns a very limited list and I suspect that this naive scraping doesn’t really get all the data.
Answers:
Since the data is generated dynamically, open Chrome DevTools, go to the Network tab, and refresh the page. By searching through the captured requests, you can find the link that returns the company list as JSON data.
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
}
res = requests.get("https://api.nasdaq.com/api/quote/list-type/nasdaq100", headers=headers)
main_data = res.json()["data"]["data"]["rows"]
for row in main_data:
    print(row["companyName"])
Output:
Activision Blizzard, Inc. Common Stock
Adobe Inc. Common Stock
Advanced Micro Devices, Inc. Common Stock
Align Technology, Inc. Common Stock
...
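Each row in the API response also carries the ticker, so tickers and names can be paired together. A minimal sketch, run here against a small sample mimicking the structure of `res.json()['data']['data']['rows']` (the field names follow the code above; the sample data itself is illustrative, since the live endpoint requires network access):

```python
# Sample rows with the same shape as the API's 'rows' payload (illustrative data).
sample_rows = [
    {"symbol": "ADBE", "companyName": "Adobe Inc. Common Stock"},
    {"symbol": "AMD", "companyName": "Advanced Micro Devices, Inc. Common Stock"},
]

def rows_to_pairs(rows):
    """Return a list of (ticker, company name) tuples from the API rows."""
    return [(row["symbol"], row["companyName"]) for row in rows]

pairs = rows_to_pairs(sample_rows)
print(pairs[0])  # ('ADBE', 'Adobe Inc. Common Stock')
```

With the real response, pass `main_data` in place of `sample_rows`.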
Using QQQ ETF
To obtain an official list of Nasdaq 100 symbols as constituents of the QQQ ETF:
import pandas as pd

def list_qqq_holdings() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    # Source: https://www.invesco.com/us/financial-products/etfs/holdings?ticker=QQQ
    url = 'https://www.invesco.com/us/financial-products/etfs/holdings/main/holdings/0?audienceType=Investor&action=download&ticker=QQQ'
    return pd.read_csv(url, index_col='Holding Ticker')
Using Wikipedia
To obtain an unofficial list of Nasdaq 100 symbols, pandas.read_html can be used. A parser such as lxml, or bs4 together with html5lib, is also required, as it is used internally by pandas.
Note that the list on Wikipedia can be outdated.
import pandas as pd

def list_wikipedia_nasdaq100() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    url = 'https://en.m.wikipedia.org/wiki/Nasdaq-100'
    return pd.read_html(url, attrs={'id': "constituents"}, index_col='Ticker')[0]
>>> df = list_wikipedia_nasdaq100()
>>> df.head()
Company ... GICS Sub-Industry
Ticker ...
ATVI Activision Blizzard ... Interactive Home Entertainment
ADBE Adobe Inc. ... Application Software
ADP ADP ... Data Processing & Outsourced Services
ABNB Airbnb ... Internet & Direct Marketing Retail
ALGN Align Technology ... Health Care Supplies
[5 rows x 3 columns]
>>> symbols = df.index.to_list()
>>> symbols[:5]
['ATVI', 'ADBE', 'ADP', 'ABNB', 'ALGN']
Using Slickcharts
import pandas as pd
import requests

def list_slickcharts_nasdaq100() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    url = 'https://www.slickcharts.com/nasdaq100'
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0'  # The default user-agent fails.
    response = requests.get(url, headers={'User-Agent': user_agent})
    return pd.read_html(response.text, match='Symbol', index_col='Symbol')[0]
These were tested with Pandas 1.5.3.
The results can be cached for a period of time, e.g. 8 hours, in memory and/or on disk, to avoid excessive repeated calls to the source.
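A minimal sketch of such an on-disk cache with a time-to-live (the helper name `cached_fetch`, the JSON file format, and the stand-in fetcher are illustrative assumptions, not part of the answers above):

```python
import json
import time
from pathlib import Path

def cached_fetch(fetch, cache_file, max_age_s=8 * 3600):
    """Return the cached data if it is fresher than max_age_s seconds,
    otherwise call fetch(), store its JSON-serializable result, and return it."""
    path = Path(cache_file)
    if path.exists() and (time.time() - path.stat().st_mtime) < max_age_s:
        return json.loads(path.read_text())
    data = fetch()
    path.write_text(json.dumps(data))
    return data

# Stand-in fetcher; in practice this would wrap one of the functions above,
# e.g. lambda: list_wikipedia_nasdaq100().index.to_list().
symbols = cached_fetch(lambda: ["ATVI", "ADBE"], "nasdaq100_cache.json")
```

Subsequent calls within the time-to-live read from the file instead of hitting the source again.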
A similar answer for the S&P 500 is here.