Failing to grab the posting dates from seemingly encrypted content
Question:
I’m trying to scrape the titles and posting dates of different jobs from this webpage. The content of that page seems to be dynamic and is loaded from an endpoint. I can parse the titles from the JSON response, but I fail to grab the posting dates.
I’ve tried with:
import requests
from pprint import pprint

link = 'https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    for item in res.json()['data']['items']:
        print(item)
Current output:
[12030636, 2611302, 23, -1, '1:1~42.3492~-71.0768', 'Executive Assistant']
[12017824, 2609705, 23, -1, '1:2~42.3943~-71.218', 'Staff Accountant - Accounts Receivable (TEMP)']
[11638522, 2526012, 23, -1, '2:3~42.2093~-70.9963', 'Bookkeeper']
[11626278, 2524450, 23, -1, '1:1~42.3492~-71.0768', 'Top Consulting Company seeking Accounting Associate']
[11353351, 2456092, 23, -1, '1:1~42.3492~-71.0768', 'ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely']
[11348351, 2455214, 23, -1, '1:4~42.3647~-71.1042', 'Bookeeper needed part-time']
Expected output:
Oct 7 Executive Assistant
Oct 7 Staff Accountant - Accounts Receivable (TEMP)
Oct 6 Bookkeeper
Oct 6 Top Consulting Company seeking Accounting Associate
Oct 5 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
Oct 5 Bookeeper needed part-time
How can I achieve the desired output?
Answers:
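The second element of each item appears to be an offset in seconds from data["data"]["decode"]["minPostedDate"], so adding the two gives the posting time as a Unix timestamp. Try: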
import requests
from datetime import datetime

link = "https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

data = requests.get(link, headers=headers).json()

# Base Unix timestamp that the per-item offsets are added to
min_posted_date = data["data"]["decode"]["minPostedDate"]

for i in data["data"]["items"]:
    # i[1] is the offset in seconds; i[-1] is the title
    t = datetime.fromtimestamp(min_posted_date + i[1])
    print(t, i[-1])
Prints:
2022-10-06 19:58:32 Tax Assistant
2022-10-06 18:37:05 Executive Assistant
2022-10-06 18:10:28 Staff Accountant - Accounts Receivable (TEMP)
2022-10-05 18:55:35 Bookkeeper
2022-10-05 18:29:33 Top Consulting Company seeking Accounting Associate
2022-10-04 23:30:15 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
2022-10-04 23:15:37 Bookeeper needed part-time
2022-10-04 21:08:42 According 65 hrs
2022-10-04 17:50:35 Accounts Payable Specialist
2022-10-04 14:50:34 Compliance Assistant- Symphony
2022-10-04 11:57:52 Bookkeeping Assistant
...
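To get the short "Oct 7" style asked for in the question, the same timestamp can be formatted by month and day. The following is only a minimal offline sketch: the helper name format_postings and the base timestamp 1662422400 are assumptions made for illustration (the real base value comes from data["data"]["decode"]["minPostedDate"]), and the two sample rows are copied from the question's output. It converts in UTC so the result is reproducible; the day shown on the site may differ depending on the timezone used.

```python
from datetime import datetime, timezone


def format_postings(min_posted_date, items, tz=timezone.utc):
    """Return 'Mon D Title' lines; format_postings is a hypothetical helper."""
    lines = []
    for item in items:
        # item[1] is treated as an offset in seconds from min_posted_date
        posted = datetime.fromtimestamp(min_posted_date + item[1], tz=tz)
        # Use posted.day rather than %d to avoid a leading zero
        lines.append(f"{posted.strftime('%b')} {posted.day} {item[-1]}")
    return lines


# Assumed base timestamp (2022-09-06 00:00:00 UTC), for illustration only
sample_min_posted_date = 1662422400

# Two rows copied from the question's output
sample_items = [
    [12030636, 2611302, 23, -1, "1:1~42.3492~-71.0768", "Executive Assistant"],
    [11638522, 2526012, 23, -1, "2:3~42.2093~-70.9963", "Bookkeeper"],
]

for line in format_postings(sample_min_posted_date, sample_items):
    print(line)
```

With the live response, pass data["data"]["decode"]["minPostedDate"] and data["data"]["items"] instead of the sample values, and pass tz=None if local time is preferred.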