Fail to grab the posting dates from encrypted content or so

Question

I’m trying to scrape the titles and posting dates of different jobs from this webpage. The content of that page seems to be dynamic and loaded using an endpoint. I can parse titles from json response but fail to grab the posting dates.

I’ve tried with:

import requests
from pprint import pprint

link = 'https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    for item in res.json()['data']['items']:
        print(item)

Current output:

[12030636, 2611302, 23, -1, '1:1~42.3492~-71.0768', 'Executive Assistant']
[12017824, 2609705, 23, -1, '1:2~42.3943~-71.218', 'Staff Accountant - Accounts Receivable (TEMP)']
[11638522, 2526012, 23, -1, '2:3~42.2093~-70.9963', 'Bookkeeper']
[11626278, 2524450, 23, -1, '1:1~42.3492~-71.0768', 'Top Consulting Company seeking Accounting Associate']
[11353351, 2456092, 23, -1, '1:1~42.3492~-71.0768', 'ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely']
[11348351, 2455214, 23, -1, '1:4~42.3647~-71.1042', 'Bookeeper needed part-time']

Expected output:

Oct 7 Executive Assistant
Oct 7 Staff Accountant - Accounts Receivable (TEMP)
Oct 6 Bookkeeper
Oct 6 Top Consulting Company seeking Accounting Associate
Oct 5 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
Oct 5 Bookeeper needed part-time

How can I achieve the desired output?

Asked By: robots.txt

||

Source

Answer 1

Try:

import requests
from datetime import datetime

link = "https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

data = requests.get(link, headers=headers).json()
min_posted_date = data["data"]["decode"]["minPostedDate"]

for i in data["data"]["items"]:
    t = datetime.fromtimestamp(min_posted_date + i[1])
    print(t, i[-1])

Prints:

2022-10-06 19:58:32 Tax Assistant
2022-10-06 18:37:05 Executive Assistant
2022-10-06 18:10:28 Staff Accountant - Accounts Receivable (TEMP)
2022-10-05 18:55:35 Bookkeeper
2022-10-05 18:29:33 Top Consulting Company seeking Accounting Associate
2022-10-04 23:30:15 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
2022-10-04 23:15:37 Bookeeper needed part-time
2022-10-04 21:08:42 According 65 hrs
2022-10-04 17:50:35 Accounts Payable Specialist
2022-10-04 14:50:34 Compliance Assistant- Symphony
2022-10-04 11:57:52 Bookkeeping Assistant

...

Answered By: Andrej Kesely

Fail to grab the posting dates from encrypted content or so

Question:

Answers: