How do I make Python simulate clicking the "search" button on an ASPX website?
Question:
I want to get the shareholding in CCASS and the % of the total number of issued shares for every stock as of 03/04/2017. Here is the link.
http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk
Here is my code.
import requests
from bs4 import BeautifulSoup
url = "http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk"
data = {
    "sortBy": "",
    "alertMsg": "",
    "ddlShareholdingDay": "03",
    "ddlShareholdingMonth": "04",
    "ddlShareholdingYear": "2017",
}
req = requests.post(url, data)
soup = BeautifulSoup(req.content, 'html.parser')
print(soup)
The output shows the page's default data, that is, the data displayed before the “search” button is clicked.
I think the problem is in the form data: I don’t know how to make Python tell the server that I clicked the “search” button.
Here is the form data.
[Image: form data]
The remaining form fields are __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION.
By the way, I don’t know what btnSearch.x and btnSearch.y are; their values change every time I click “search”.
Thank you very much.
Answers:
You can try this code. It fetches the page with a GET request and parses the third table:
import requests
from bs4 import BeautifulSoup

url = "http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# One list of cell texts per row of the third table on the page.
all_rows = [[td.text for td in tr.find_all('td')] for tr in soup.find_all('table')[2].find_all('tr')]
# Strip the line breaks embedded in the cell text.
stock_info = [[sub_item.replace('\r\n', '') for sub_item in item] for item in all_rows]

# Skip the first two rows (presumably header rows).
for stock in stock_info[2:]:
    print("Stock code {}".format(stock[0]))
    print("Stock Name {}".format(stock[1]))
    print("Shareholding in CCASS {}".format(stock[2]))
    print("Shares Percentage {}".format(stock[3]))
    print("---------------------------------- \n")
Sample of output:
Stock code 1
Stock Name CK HUTCHISON HOLDINGS LIMITED
Shareholding in CCASS 11,746,298
Shares Percentage 0.30%
----------------------------------
Stock code 2
Stock Name CLP HOLDINGS LIMITED
Shareholding in CCASS 3,160,800
Shares Percentage 0.11%
----------------------------------
Stock code 3
Stock Name HONG KONG AND CHINA GAS COMPANY LIMITED, THE
Shareholding in CCASS 17,183,763
Shares Percentage 0.11%
----------------------------------
Stock code 4
Stock Name WHARF (HOLDINGS) LIMITED, THE
Shareholding in CCASS 2,828,000
Shares Percentage 0.09%
----------------------------------
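To reproduce the click itself rather than read the default page, one possible approach is to load the form first, copy its hidden fields (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION) and post them back together with the date fields. btnSearch.x and btnSearch.y are the pixel coordinates of the click inside an image button, which is why they change on every click. The sketch below is untested against the live site and rests on assumptions: that the server only needs the two coordinates to be present, that the field names from the question are still current, and that sortBy and alertMsg can stay empty. The names HiddenInputParser, build_search_payload and search are made up for this example.

```python
from html.parser import HTMLParser

URL = "http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk"


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def build_search_payload(page_html, day, month, year):
    """Build the POST body a click on the search button might send."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    payload = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    # Fields listed in the question; kept empty unless the page set them.
    payload.setdefault("sortBy", "")
    payload.setdefault("alertMsg", "")
    payload.update({
        "ddlShareholdingDay": day,
        "ddlShareholdingMonth": month,
        "ddlShareholdingYear": year,
        # Click position inside the image button (assumed: any value works).
        "btnSearch.x": "10",
        "btnSearch.y": "10",
    })
    return payload


def search(day, month, year):
    """Fetch the form, then post it back as if the button had been clicked."""
    # Imported here so the helpers above run without third-party packages.
    import requests

    with requests.Session() as session:
        page = session.get(URL)
        payload = build_search_payload(page.text, day, month, year)
        return session.post(URL, data=payload).text
```

Calling search("03", "04", "2017") should then return the HTML for that date, which can be handed to BeautifulSoup in the same way as above. The hidden fields have to come from a fresh GET each time, because ASP.NET validates them against the page it served.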
Here is my code to build the list of dates to scrape (360 days at most). It skips weekends and market holidays, including the day trading was suspended because of typhoon signal No. 8.
import datetime

day_list = []
for i in range(360):
    # Start from yesterday and walk back one day at a time.
    trade_day = datetime.date.today() - datetime.timedelta(days=i + 1)
    # Keep Monday (0) through Friday (4) only.
    if trade_day.weekday() < 5:
        day_list.append(str(trade_day))

# Dates in 2017 on which the market was closed.
holidays = ['2017-08-23', '2017-10-02', '2017-10-05', '2017-05-30', '2017-05-01', '2017-05-03', '2017-04-17', '2017-04-14', '2017-04-04', '2017-01-30', '2017-01-31', '2017-01-02']
day_list = [x for x in day_list if x not in holidays]
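Each entry in day_list is a 'YYYY-MM-DD' string, while the form data in the question sends the day, month and year as three separate zero-padded values. A small helper along these lines could bridge the two; date_to_form_fields is a name invented for this sketch, and the field names are taken from the question.

```python
import datetime


def date_to_form_fields(day_string):
    """Split a 'YYYY-MM-DD' string into the three drop-down values."""
    day = datetime.datetime.strptime(day_string, "%Y-%m-%d").date()
    return {
        "ddlShareholdingDay": "{:02d}".format(day.day),
        "ddlShareholdingMonth": "{:02d}".format(day.month),
        "ddlShareholdingYear": str(day.year),
    }


print(date_to_form_fields("2017-04-03"))
# {'ddlShareholdingDay': '03', 'ddlShareholdingMonth': '04', 'ddlShareholdingYear': '2017'}
```

Looping over day_list and merging the returned dictionary into the POST data gives one request per trading day.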