Python: Getting a table in CSV from a website without a table class
Question:
I’m a newbie seeking help.
I tried the following, without success:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.canada.ca/en/immigration-refugees-citizenship/corporate/mandate/policies-operational-instructions-agreements/ministerial-instructions/express-entry-rounds.html"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
data = []
# Verifying tables and their classes
print('Classes of each table:')
for table in soup.find_all('table'):
    print(table.get('class'))
Result:
['table']
None
Can anyone help me with how to get this data?
Thank you so much.
Answers:
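The data you see on the page is loaded from an external URL (a JSON file), so the table rows are not part of the HTML that requests.get returns. To load the data, you can request that JSON file directly, as in the following example: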
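import requests
import pandas as pd
# JSON file that the page loads its table data from
url = "https://www.canada.ca/content/dam/ircc/documents/json/ee_rounds_123_en.json"
data = requests.get(url).json()
# Each entry in "rounds" is one draw
df = pd.DataFrame(data["rounds"])
# Drop the columns that are not needed in the output
df = df.drop(columns=["drawNumberURL", "DrawText1", "mitext"])
# to_markdown() needs the optional "tabulate" package
print(df.head(10).to_markdown(index=False))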
Prints:
| drawNumber | drawDate | drawDateFull | drawName | drawSize | drawCRS | drawText2 | drawDateTime | drawCutOff | drawDistributionAsOn | dd1 | dd2 | dd3 | dd4 | dd5 | dd6 | dd7 | dd8 | dd9 | dd10 | dd11 | dd12 | dd13 | dd14 | dd15 | dd16 | dd17 | dd18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 231 | 2022-09-14 | September 14, 2022 | No Program Specified | 3,250 | 510 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | September 14, 2022 at 13:29:26 UTC | January 08, 2022 at 10:24:52 UTC | September 12, 2022 | 408 | 6,228 | 63,860 | 5,845 | 9,505 | 19,156 | 16,541 | 12,813 | 58,019 | 12,245 | 12,635 | 9,767 | 11,186 | 12,186 | 68,857 | 35,833 | 5,068 | 238,273 |
| 230 | 2022-08-31 | August 31, 2022 | No Program Specified | 2,750 | 516 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | August 31, 2022 at 13:55:23 UTC | April 16, 2022 at 18:24:41 UTC | August 29, 2022 | 466 | 7,224 | 63,270 | 5,554 | 9,242 | 19,033 | 16,476 | 12,965 | 58,141 | 12,287 | 12,758 | 9,796 | 11,105 | 12,195 | 68,974 | 36,001 | 5,120 | 239,196 |
| 229 | 2022-08-17 | August 17, 2022 | No Program Specified | 2,250 | 525 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | August 17, 2022 at 13:43:47 UTC | December 28, 2021 at 11:03:15 UTC | August 15, 2022 | 538 | 8,221 | 62,753 | 5,435 | 9,129 | 18,831 | 16,465 | 12,893 | 58,113 | 12,200 | 12,721 | 9,801 | 11,138 | 12,253 | 68,440 | 35,745 | 5,137 | 238,947 |
| 228 | 2022-08-03 | August 3, 2022 | No Program Specified | 2,000 | 533 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | August 03, 2022 at 15:16:24 UTC | January 06, 2022 at 14:29:50 UTC | August 2, 2022 | 640 | 8,975 | 62,330 | 5,343 | 9,044 | 18,747 | 16,413 | 12,783 | 57,987 | 12,101 | 12,705 | 9,747 | 11,117 | 12,317 | 68,325 | 35,522 | 5,145 | 238,924 |
| 227 | 2022-07-20 | July 20, 2022 | No Program Specified | 1,750 | 542 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | July 20, 2022 at 16:32:49 UTC | December 30, 2021 at 15:29:35 UTC | July 18, 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 226 | 2022-07-06 | July 6, 2022 | No Program Specified | 1,500 | 557 | Federal Skilled Worker, Canadian Experience Class, Federal Skilled Trades and Provincial Nominee Program | July 6, 2022 at 14:34:34 UTC | November 13, 2021 at 02:20:46 UTC | July 11, 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 225 | 2022-06-22 | June 22, 2022 | Provincial Nominee Program | 636 | 752 | Provincial Nominee Program | June 22, 2022 at 14:13:57 UTC | April 19, 2022 at 13:45:45 UTC | June 20, 2022 | 664 | 8,017 | 55,917 | 4,246 | 7,845 | 16,969 | 15,123 | 11,734 | 53,094 | 10,951 | 11,621 | 8,800 | 10,325 | 11,397 | 64,478 | 33,585 | 4,919 | 220,674 |
| 224 | 2022-06-08 | June 8, 2022 | Provincial Nominee Program | 932 | 796 | Provincial Nominee Program | June 08, 2022 at 14:03:28 UTC | October 18, 2021 at 17:13:17 UTC | June 6, 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 223 | 2022-05-25 | May 25, 2022 | Provincial Nominee Program | 590 | 741 | Provincial Nominee Program | May 25, 2022 at 13:21:23 UTC | February 02, 2022 at 12:29:53 UTC | May 23, 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 222 | 2022-05-11 | May 11, 2022 | Provincial Nominee Program | 545 | 753 | Provincial Nominee Program | May 11, 2022 at 14:08:07 UTC | December 15, 2021 at 20:32:57 UTC | May 9, 2022 | 635 | 7,193 | 52,684 | 3,749 | 7,237 | 16,027 | 14,466 | 11,205 | 50,811 | 10,484 | 11,030 | 8,393 | 9,945 | 10,959 | 62,341 | 32,590 | 4,839 | 211,093 |
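Since the question asks for CSV, here is a minimal sketch of one way to go from the "rounds" list to CSV text. It runs on a small hand-made sample instead of the live JSON, so it works offline. The sample dict, the rounds_to_csv helper and its numeric_columns parameter are hypothetical names introduced for illustration. It also assumes that the numeric fields arrive as strings with thousands separators, as the printed values such as 3,250 suggest.

```python
import pandas as pd

# Hypothetical sample shaped like the "rounds" list in the JSON file.
# Values are copied from the first two rows printed above and are
# assumed to be strings, as the "3,250" formatting suggests.
sample = {
    "rounds": [
        {"drawNumber": "231", "drawDate": "2022-09-14", "drawSize": "3,250", "drawCRS": "510"},
        {"drawNumber": "230", "drawDate": "2022-08-31", "drawSize": "2,750", "drawCRS": "516"},
    ]
}


def rounds_to_csv(payload, numeric_columns=("drawNumber", "drawSize", "drawCRS")):
    """Return the rounds as CSV text, with thousands separators removed."""
    df = pd.DataFrame(payload["rounds"])
    for column in numeric_columns:
        # "3,250" -> 3250; assumes digit strings with optional commas
        df[column] = df[column].astype(str).str.replace(",", "", regex=False).astype(int)
    # With no path given, to_csv returns the CSV as a string
    return df.to_csv(index=False)


csv_text = rounds_to_csv(sample)
print(csv_text)
```

With the real data, the same DataFrame from the answer could be written straight to disk with df.to_csv("ee_rounds.csv", index=False), where the file name is only an example. Stripping the commas first keeps the numbers usable as integers when the CSV is read back.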