Sending an ASP.net POST with Python's Requests

Question:

I’m scraping an old ASP.net website using Python’s requests module.

I’ve spent 5+ hours trying to figure out how to simulate this POST request to no avail. Doing it the way I do it below, I essentially get a message saying “No item matches this item reference.”

Any help would be deeply appreciated. Here’s the request and my code; a few things are modified for the sake of brevity and/or privacy:

My own code:

import requests

# Scraping the item number from the website, I have confirmed this is working.

# Then use the newly acquired item number to request the data.
item_url = "http://www.example.com/EN/items/Pages/yourrates.aspx?vr=" + item_number[0]
viewstate = r'/wEPD...' # Truncated for brevity.

# Create the appropriate request and payload.
payload = {"vr": int(item_number[0])}

item_request_body = {
        "__SPSCEditMenu": "true",
        "MSOWebPartPage_PostbackSource": "",
        "MSOTlPn_SelectedWpId": "",
        "MSOTlPn_View": 0,
        "MSOTlPn_ShowSettings": "False",
        "MSOGallery_SelectedLibrary": "",
        "MSOGallery_FilterString": "",
        "MSOTlPn_Button": "none",
        "__EVENTTARGET": "",
        "__EVENTARGUMENT": "",
        "MSOAuthoringConsole_FormContext": "",
        "MSOAC_EditDuringWorkflow": "",
        "MSOSPWebPartManager_DisplayModeName": "Browse",
        "MSOWebPartPage_Shared": "",
        "MSOLayout_LayoutChanges": "",
        "MSOLayout_InDesignMode": "",
        "MSOSPWebPartManager_OldDisplayModeName": "Browse",
        "MSOSPWebPartManager_StartWebPartEditingName": "false",
        "__VIEWSTATE": viewstate,
        "keywords": "Search our site",
        "__CALLBACKID": "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22",
        "__CALLBACKPARAM": "startvr"
    }

# Write the appropriate headers for the property information.
item_request_headers = {
    "Host": home_site,
    "Connection": "keep-alive",
    "Content-Length": len(encoded_valuation_request),
    "Cache-Control": "max-age=0",
    "Origin": home_site,
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button",
    "Accept": "*/*",
    "Referer": valuation_url,
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

response = requests.post(url=item_url, params=payload, data=item_request_body, headers=item_request_headers)
print(response.text)

What Chrome is telling me the request looks like:

Remote Address:202.55.96.131:80
Request URL:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
Request Method:POST
Status Code:200 OK

Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:21501
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button
Host:www.site.com
Origin:www.site.com
Referer:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36

Query String Parameters
vr:123456789

Form Data
__SPSCEditMenu:true
MSOWebPartPage_PostbackSource:
MSOTlPn_SelectedWpId:
MSOTlPn_View:0
MSOTlPn_ShowSettings:False
MSOGallery_SelectedLibrary:
MSOGallery_FilterString:
MSOTlPn_Button:none
__EVENTTARGET:
__EVENTARGUMENT:
MSOAuthoringConsole_FormContext:
MSOAC_EditDuringWorkflow:
MSOSPWebPartManager_DisplayModeName:Browse
MSOWebPartPage_Shared:
MSOLayout_LayoutChanges:
MSOLayout_InDesignMode:
MSOSPWebPartManager_OldDisplayModeName:Browse
MSOSPWebPartManager_StartWebPartEditingName:false
__VIEWSTATE:/wEPD...(Omitted for length)
keywords:Search our site
__CALLBACKID:ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22
__CALLBACKPARAM:startvr
Asked By: David K.


Answers:

You have too many request parameters, and should not set the content-type, content-length, host, origin, or connection headers; leave those to requests to set.
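As a quick offline check of that point, the sketch below prepares a POST without sending it and prints the headers that requests fills in on its own. The URL and the single form field are placeholders for illustration, not the real values from the question.

```python
import requests

# Hypothetical URL and form field, used only for illustration.
req = requests.Request(
    "POST",
    "http://www.example.com/page.aspx",
    data={"__CALLBACKPARAM": "startvr"},
)
prepared = req.prepare()  # builds the request; nothing is sent

# requests derives both headers from the data argument.
print(prepared.headers["Content-Type"])
print(prepared.headers["Content-Length"])
```

Because the length is computed from the encoded body, a hand-written Content-Length can easily disagree with what is actually sent.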

You are also doubling up the URL parameters; either add the vr parameter to the URL manually or use params, but not both.
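The doubling is easy to see by preparing the request without sending it. This is a small sketch using the example URL and item number from the question; the variable names doubled and single are made up for the comparison.

```python
import requests

# Same query parameter supplied twice: once in the URL, once via params.
doubled = requests.Request(
    "POST",
    "http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789",
    params={"vr": 123456789},
).prepare()

# Supplied once, via params only.
single = requests.Request(
    "POST",
    "http://www.example.com/EN/items/Pages/yourrates.aspx",
    params={"vr": 123456789},
).prepare()

print(doubled.url)  # vr appears twice in the query string
print(single.url)   # vr appears once
```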

It may well be that some of the parameters in the POST body are generated by the ASP application and tied to a session. I’d use a GET request with a Session object to load the valuation_url, then parse the form in that page to extract the __CALLBACKID parameter. The requests Session will then store any cookies the server sets and reuse those:

import requests
from bs4 import BeautifulSoup

item_request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}
payload = {"vr": int(item_number[0])}

# Session() takes no arguments; set the default headers on the instance instead
session = requests.Session()
session.headers.update(item_request_headers)

# Get form page
form_response = session.get(valuation_url, params=payload)

# parse form page; BeautifulSoup could do this for example
soup = BeautifulSoup(form_response.content, "html.parser")
callbackid = soup.select('input[name=__CALLBACKID]')[0]['value']

item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": callbackid,
    "__CALLBACKPARAM": "startvr"
}

item_url = 'http://www.example.com/EN/items/Pages/yourrates.aspx'

response = session.post(url=item_url, params=payload, data=item_request_body,
                        headers={'Referer': form_response.url})

The session handles the headers (setting a user agent and the accept parameters); only on the POST do we add a Referer header as well.
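The code above still posts a hard-coded viewstate. ASP.NET Web Forms pages normally render __VIEWSTATE as a hidden input, so it can be read from the same form page rather than copied from the browser. The following is a minimal sketch using only the standard library; the HiddenInputCollector class and the sample HTML are hypothetical stand-ins, and a real page may need BeautifulSoup or extra handling.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Made-up sample markup standing in for form_response.text.
sample_html = """
<form method="post" action="yourrates.aspx?vr=123456789">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDexample" />
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
  <input type="text" name="keywords" value="Search our site" />
</form>
"""

collector = HiddenInputCollector()
collector.feed(sample_html)
print(collector.fields)
```

The collected dictionary could then be merged into the POST body before sending, for example with item_request_body.update(collector.fields).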

Answered By: Martijn Pieters

Relevant to the question title, but not exactly the poster’s situation — I’d like to add one more useful tip to Martijn’s answer that includes some general requests library advice for POST requests.

Examining a request payload on a browser (e.g. Chrome’s Network tab in Developer Tools) may show that there are multiples of certain keys/fields within the payload.

Chrome’s Request Payload Example:

...
"ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "AcceptedModified",
"ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "InvoiceFullyDisputed",
"ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "DisputedItemsClosed",
...

Copying the browser’s request to match this exactly in the payload/data argument passed to requests will NOT work (or at least will not get you the results you expect; you’ll still likely get a 200 status code response). A Python dictionary can hold each key only once, so only the value for the LAST occurrence of the key/field is sent.

Requests Data/Payload that will NOT work (or at least not get you the results you expect):

payload = {
   ...
   "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "AcceptedModified",
   "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "InvoiceFullyDisputed",
   "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "DisputedItemsClosed",
   ...
}
r = session.post(url, headers=headers, data=payload)
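The values are lost before requests is involved at all, since a dict literal with a repeated key keeps only the last value. A quick check with the same three entries (and nothing else in the payload):

```python
# A dict literal with a repeated key silently keeps only the last value.
payload = {
    "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "AcceptedModified",
    "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "InvoiceFullyDisputed",
    "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": "DisputedItemsClosed",
}

print(len(payload))  # number of keys that survived
print(payload)
```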

Instead, you must put these multiple keys’/fields’ values into a list:

Requests Data/Payload that WILL work (or get you the expected results):

payload = {
   ...
   "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus": ["AcceptedModified", "InvoiceFullyDisputed", "DisputedItemsClosed"],
   ...
}
r = session.post(url, headers=headers, data=payload)

…it took me a few hours to realize this, looking deeply into ASP.NET website mechanics and thinking there was something I needed to understand there. Nope. So, just trying to save someone some time, hopefully.

Thank you to this Stack Overflow question for helping me come to this realization.

NOTE: You can check exactly what your sent payload looks like by looking at r.request.body on the response object (r, in this case) after your request. This is how I realized mine was missing some information (i.e. the multiple fields/keys).
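The same body can also be inspected without contacting a server, because r.request is a PreparedRequest and one can be built directly. A small sketch, assuming a placeholder URL and a payload holding only the list-valued field:

```python
import requests

key = "ctl00$cphMain$ctlInvoiceStatuses$lbInvoiceStatus"
payload = {key: ["AcceptedModified", "InvoiceFullyDisputed", "DisputedItemsClosed"]}

# Hypothetical URL; prepare() builds the request without sending it.
prepared = requests.Request(
    "POST", "http://www.example.com/invoices.aspx", data=payload
).prepare()

# The body holds one key=value pair per list item, joined with "&".
pairs = prepared.body.split("&")
for pair in pairs:
    print(pair)
```

Each printed pair repeats the URL-encoded key with a different value, which is the shape the browser's form data had.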

Answered By: amota