Combining multiple sets of data into one JSON file from API calls
Question:
I need two sets of data from this website:
https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings
It includes both the "Active Positions" and the "New and Sold Out Positions" tables, but the code I have only puts one of them into a JSON file:
import requests
import pandas as pd
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
'accept': 'application/json, text/plain, */*',
'origin': 'https://www.nasdaq.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['newSoldOutPositions']['rows'])
df.to_json('AAPL_institutional_positions.json')
This gives the following JSON output:
{
"positions":{
"0":"New Positions",
"1":"Sold Out Positions"
},
"holders":{
"0":"99",
"1":"90"
},
"shares":{
"0":"37,374,118",
"1":"4,637,465"
}
}
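The nested `{"0": ..., "1": ...}` layout comes from pandas rather than from the API: `DataFrame.to_json()` defaults to `orient='columns'`, which writes `{column: {index: value}}`. Passing `orient='records'` writes one object per row instead, which matches the API's `rows` list. A minimal offline sketch, using sample rows copied from the output above in place of a live request:

```python
import json

import pandas as pd

# Sample rows copied from the "New and Sold Out Positions" output above
rows = [
    {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
    {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
]

df = pd.json_normalize(rows)

# Default orient for a DataFrame is 'columns': {column: {index: value}}
by_columns = json.loads(df.to_json())
# orient='records' gives a list with one object per row
by_records = json.loads(df.to_json(orient="records"))

print(by_columns["positions"])  # {'0': 'New Positions', '1': 'Sold Out Positions'}
print(by_records[0])
```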
For the other table, I use this code (all I have changed is "newSoldOutPositions" to "activePositions"):
import requests
import pandas as pd
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
'accept': 'application/json, text/plain, */*',
'origin': 'https://www.nasdaq.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['activePositions']['rows'])
df.to_json('AAPL_institutional_positions.json')
It gives this JSON output:
{
"positions":{
"0":"Increased Positions",
"1":"Decreased Positions",
"2":"Held Positions",
"3":"Total Institutional Shares"
},
"holders":{
"0":"1,780",
"1":"2,339",
"2":"283",
"3":"4,402"
},
"shares":{
"0":"239,170,203",
"1":"209,017,331",
"2":"8,965,339,255",
"3":"9,413,526,789"
}
}
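Both snippets request the same URL and write to the same file name, so one response already holds both tables, and running the second script overwrites the file produced by the first. If you want to keep pandas, one option is to normalize each table separately and nest the row records under the table name before writing once. A sketch under stated assumptions: `save_tables` and `TABLES` are hypothetical names, and `payload` is a hand-built stand-in for `r.json()` that only mimics the `['data'][table]['rows']` shape the snippets index into:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical list of table keys under payload["data"]
TABLES = ("activePositions", "newSoldOutPositions")


def save_tables(payload, path, tables=TABLES):
    """Normalize each table from one API payload and write them all to one JSON file."""
    combined = {}
    for name in tables:
        df = pd.json_normalize(payload["data"][name]["rows"])
        # Store each table as a list of row objects, keyed by table name
        combined[name] = df.to_dict(orient="records")
    with open(path, "w") as f:
        json.dump(combined, f, indent=2)
    return combined


# Hypothetical stand-in for r.json(), trimmed to a few rows from the outputs above
payload = {
    "data": {
        "activePositions": {"rows": [
            {"positions": "Increased Positions", "holders": "1,780", "shares": "239,170,203"},
            {"positions": "Held Positions", "holders": "283", "shares": "8,965,339,255"},
        ]},
        "newSoldOutPositions": {"rows": [
            {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
            {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
        ]},
    }
}

path = os.path.join(tempfile.mkdtemp(), "AAPL_institutional_positions.json")
save_tables(payload, path)
with open(path) as f:
    print(list(json.load(f)))  # ['activePositions', 'newSoldOutPositions']
```

With the live API you would pass `r.json()` instead of the sample `payload`, so only one request is made.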
So my question is: how can I combine the scraping to grab both sets of data and output them all in one JSON file?
Thanks
Answers:
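If you only want JSON data, there is no need to use pandas. A single request already returns both tables, so you can collect them into one dictionary and write it to one file: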
import json
import requests

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
r.raise_for_status()  # fail early on a non-2xx response
data = r.json()['data']  # parse the response once and reuse it

# Collect both tables under their own keys
nasdaq_dict = {
    'activePositions': data['activePositions']['rows'],
    'newSoldOutPositions': data['newSoldOutPositions']['rows'],
}
print(nasdaq_dict)

# Write both tables to a single JSON file
with open('AAPL_institutional_positions.json', 'w') as f:
    json.dump(nasdaq_dict, f, indent=2)
Result in terminal:
{'activePositions': [{'positions': 'Increased Positions', 'holders': '1,795', 'shares': '200,069,709'}, {'positions': 'Decreased Positions', 'holders': '2,314', 'shares': '228,105,026'}, {'positions': 'Held Positions', 'holders': '308', 'shares': '8,976,744,094'}, {'positions': 'Total Institutional Shares', 'holders': '4,417', 'shares': '9,404,918,829'}], 'newSoldOutPositions': [{'positions': 'New Positions', 'holders': '121', 'shares': '55,857,143'}, {'positions': 'Sold Out Positions', 'holders': '73', 'shares': '8,851,038'}]}
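Note that the API returns the counts as strings with thousands separators (for example `'1,795'`), so the JSON file will contain strings, not numbers. If you want numeric values, you can convert them before writing. A minimal sketch, assuming every `holders` and `shares` value uses commas as thousands separators; `parse_count` and `numeric_rows` are hypothetical helpers, and the sample data is copied from the terminal output above:

```python
import json


def parse_count(value: str) -> int:
    """Convert a count like '1,795' to 1795 (assumes comma thousands separators)."""
    return int(value.replace(",", ""))


def numeric_rows(rows):
    """Return copies of the rows with 'holders' and 'shares' converted to int."""
    return [
        {**row, "holders": parse_count(row["holders"]), "shares": parse_count(row["shares"])}
        for row in rows
    ]


# Sample data copied from the terminal output above
nasdaq_dict = {
    "activePositions": [
        {"positions": "Increased Positions", "holders": "1,795", "shares": "200,069,709"},
    ],
    "newSoldOutPositions": [
        {"positions": "New Positions", "holders": "121", "shares": "55,857,143"},
        {"positions": "Sold Out Positions", "holders": "73", "shares": "8,851,038"},
    ],
}

converted = {name: numeric_rows(rows) for name, rows in nasdaq_dict.items()}
print(json.dumps(converted["newSoldOutPositions"][0]))
# {"positions": "New Positions", "holders": 121, "shares": 55857143}
```

The converted dictionary can then be passed to `json.dump` in place of `nasdaq_dict`.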