Why does my request return "Must provide query string" when scraping a GraphQL endpoint?
Question:
Here’s my current code:
DCId="RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl"
slugcat="vegetables-1-a0d03d59"
url="https://www.sayurbox.com/graphql/v1?deduplicate=1"
payload={"operationName":"getCartItemCount",
"variables":{"deliveryConfigId":DCId},
"query":"query getCartItemCount($deliveryConfigId: ID!) {\n cart(deliveryConfigId: $deliveryConfigId) {\n id\n count\n __typename\n }\n}"},{"operationName":"getProducts",
"variables":{"deliveryConfigId":DCId,
"sortBy":"related_product",
"isInstantDelivery":False,
"slug":slugcat,
"first":12,
"abTestFeatures":[]},
"query":"query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) {\n productsByCategoryOrSubcategoryAndDeliveryConfig(\n deliveryConfigId: $deliveryConfigId\n sortBy: $sortBy\n slug: $slug\n after: $after\n first: $first\n isInstantDelivery: $isInstantDelivery\n abTestFeatures: $abTestFeatures\n ) {\n edges {\n node {\n ...ProductInfoFragment\n __typename\n }\n __typename\n }\n pageInfo {\n hasNextPage\n endCursor\n __typename\n }\n productBuilder\n __typename\n }\n}\n\nfragment ProductInfoFragment on Product {\n id\n uuid\n deliveryConfigId\n displayName\n priceRanges\n priceMin\n priceMax\n actualPriceMin\n actualPriceMax\n slug\n label\n isInstant\n isInstantOnly\n nextDayAvailability\n heroImage\n promo\n discount\n isDiscount\n variantType\n imageIds\n isStockAvailable\n defaultVariantSkuCode\n quantitySoldFormatted\n promotion {\n quota\n isShown\n campaignId\n __typename\n }\n productVariants {\n productVariant {\n id\n skuCode\n variantName\n maxQty\n isDiscount\n stockAvailable\n promotion {\n quota\n campaignId\n isShown\n __typename\n }\n __typename\n }\n pageInfo {\n hasPreviousPage\n hasNextPage\n __typename\n }\n __typename\n }\n __typename\n}"}
response=requests.get(url,headers=headers,json=payload)
response.json()
The response returns
[{'errors': [{'message': 'Must provide query string.',
'extensions': {'timestamp': 1675842901472}}]},
{'errors': [{'message': 'Must provide query string.',
'extensions': {'timestamp': 1675842901472}}]}]
I am not sure where I went wrong, as I’ve copied the payload and headers exactly. Can someone help?
Answers:
GET requests generally shouldn’t have a payload. I think these are just query parameters you’re trying to supply. Try passing the payload through the params argument instead of json. See https://www.w3schools.com/python/ref_requests_get.asp
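If you do go the GET route, note that a nested dict such as variables cannot go into a query string as-is; it would have to be serialized to a JSON string first. Here is a rough sketch of what such a URL could look like, using only the standard library. The helper name graphql_get_url and the EXAMPLE_ID value are made up for illustration, and whether this particular endpoint accepts GraphQL operations over GET at all is an assumption that has not been verified.

```python
import json
from urllib.parse import urlencode


def graphql_get_url(endpoint, query, variables=None, operation_name=None):
    """Build a GET URL that carries a GraphQL operation in its query string.

    Hypothetical helper for illustration; the server may not support GET.
    """
    params = {"query": query}
    if variables is not None:
        # Nested structures must be turned into a JSON string first.
        params["variables"] = json.dumps(variables)
    if operation_name is not None:
        params["operationName"] = operation_name
    # The endpoint in the question already has a query string (?deduplicate=1).
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


url = graphql_get_url(
    "https://www.sayurbox.com/graphql/v1?deduplicate=1",
    "query getCartItemCount($deliveryConfigId: ID!) "
    "{ cart(deliveryConfigId: $deliveryConfigId) { id count } }",
    variables={"deliveryConfigId": "EXAMPLE_ID"},  # placeholder value
    operation_name="getCartItemCount",
)
print(url)
```

Passing the same three keys through the params argument of requests.get should produce an equivalent URL, as long as variables is JSON-encoded beforehand.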
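First, the request should be a POST, not a GET. With a GET, the server presumably looks for the query in the URL rather than in the JSON body, which would explain the "Must provide query string." message. Second, I think you don’t want the "getCartItemCount" operation but rather "getProducts".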
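import requests

# headers: the request headers copied from the browser, as in the question (not shown here)
DCId = 'RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl'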
slugcat = 'vegetables-1-a0d03d59'
url = 'https://www.sayurbox.com/graphql/v1?deduplicate=1'
payload = {
'operationName': 'getProducts',
'variables': {
'deliveryConfigId': DCId,
'sortBy': 'related_product',
'isInstantDelivery': False,
'slug': slugcat,
'first': 12,
'abTestFeatures': ['category-page-subcategory-section-v5#####control']
},
'query': 'query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) { productsByCategoryOrSubcategoryAndDeliveryConfig( deliveryConfigId: $deliveryConfigId sortBy: $sortBy slug: $slug after: $after first: $first isInstantDelivery: $isInstantDelivery abTestFeatures: $abTestFeatures ) { edges { node { ...ProductInfoFragment __typename } __typename } pageInfo { hasNextPage endCursor __typename } productBuilder __typename }}fragment ProductInfoFragment on Product { id uuid deliveryConfigId displayName priceRanges priceMin priceMax actualPriceMin actualPriceMax slug label isInstant isInstantOnly nextDayAvailability heroImage promo discount isDiscount variantType imageIds isStockAvailable defaultVariantSkuCode quantitySoldFormatted promotion { quota isShown campaignId __typename } productVariants { productVariant { id skuCode variantName maxQty isDiscount stockAvailable promotion { quota campaignId isShown __typename } __typename } pageInfo { hasPreviousPage hasNextPage __typename } __typename } __typename}'}
response = requests.post(url, headers=headers, json=payload)
data = response.json()
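Since this API reports failures inside the JSON body (as the error in the question shows), it may be worth checking for an errors key before indexing into data['data']. A minimal sketch, where graphql_data is a hypothetical helper name and the sample input is the error body from the question:

```python
def graphql_data(body):
    """Return the 'data' part of a GraphQL response body, or raise on errors."""
    errors = body.get("errors")
    if errors:
        # Collect every reported message into one readable exception.
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise RuntimeError(f"GraphQL request failed: {messages}")
    return body["data"]


# One element of the error response shown in the question.
failed = {"errors": [{"message": "Must provide query string.",
                      "extensions": {"timestamp": 1675842901472}}]}
try:
    graphql_data(failed)
except RuntimeError as exc:
    print(exc)  # GraphQL request failed: Must provide query string.
```

With the POST request above you would call graphql_data(response.json()) and continue with the returned dict.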
Output (with Pandas):
import pandas as pd
df = pd.json_normalize([node['node'] for node in data['data']['productsByCategoryOrSubcategoryAndDeliveryConfig']['edges']])
>>> df
id uuid ... productVariants.pageInfo.__typename productVariants.__typename
0 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 479c7805-3b26-4bb9-93b9-5689a2d3bb9d ... PageInfo productVariantsConnection
1 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... ba7154a1-e784-451d-88e0-10ede13d55b3 ... PageInfo productVariantsConnection
2 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 5e023650-50fa-4adc-800d-be14cac7f1eb ... PageInfo productVariantsConnection
3 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... eec5c6fa-70b9-45d8-a316-6820d1ed68c3 ... PageInfo productVariantsConnection
4 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... ee1a0910-f021-48e4-a8d0-ab54f4358bde ... PageInfo productVariantsConnection
5 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 17dccf7a-0763-4c34-a537-7b746bdba683 ... PageInfo productVariantsConnection
6 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 90bbee6d-184e-4d8b-8702-77b660883a00 ... PageInfo productVariantsConnection
7 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... f7e51319-0dd3-4c21-9bba-bc8e3f71db94 ... PageInfo productVariantsConnection
8 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 9f889a62-9302-48db-a972-cff035440ee4 ... PageInfo productVariantsConnection
9 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... dd58f053-238f-45f6-b937-687c1e1db3b0 ... PageInfo productVariantsConnection
10 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... 05c37b4e-cf0f-4cf5-a9a8-20ea00029063 ... PageInfo productVariantsConnection
11 UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk... e559850a-2344-4bb4-be70-932214aace91 ... PageInfo productVariantsConnection
[12 rows x 30 columns]
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 12 non-null object
1 uuid 12 non-null object
2 deliveryConfigId 12 non-null object
3 displayName 12 non-null object
4 priceRanges 12 non-null object
5 priceMin 12 non-null int64
6 priceMax 12 non-null int64
7 actualPriceMin 12 non-null int64
8 actualPriceMax 12 non-null int64
9 slug 12 non-null object
10 label 0 non-null object
11 isInstant 12 non-null bool
12 isInstantOnly 12 non-null bool
13 nextDayAvailability 12 non-null bool
14 heroImage 12 non-null object
15 promo 12 non-null object
16 discount 12 non-null object
17 isDiscount 12 non-null bool
18 variantType 12 non-null object
19 imageIds 12 non-null object
20 isStockAvailable 12 non-null bool
21 defaultVariantSkuCode 12 non-null object
22 quantitySoldFormatted 12 non-null object
23 promotion 0 non-null object
24 __typename 12 non-null object
25 productVariants.productVariant 12 non-null object
26 productVariants.pageInfo.hasPreviousPage 12 non-null bool
27 productVariants.pageInfo.hasNextPage 12 non-null bool
28 productVariants.pageInfo.__typename 12 non-null object
29 productVariants.__typename 12 non-null object
dtypes: bool(7), int64(4), object(19)
memory usage: 2.4+ KB
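The query above returns only the first 12 products. It also declares an $after variable and asks for pageInfo { hasNextPage endCursor }, so further pages can probably be fetched by passing the previous endCursor as after. This is an assumption based on the usual cursor-pagination pattern and has not been checked against this server. A sketch of such a loop, where fetch_page is a hypothetical callable that sends the getProducts request and returns the decoded JSON, and fake_fetch_page is a stand-in so the example runs without network access:

```python
def iter_products(fetch_page, first=12):
    """Yield product nodes page by page until hasNextPage is False.

    fetch_page(after, first) is a hypothetical callable returning the decoded
    JSON body of one getProducts response.
    """
    after = None  # no cursor for the first page
    while True:
        body = fetch_page(after, first)
        connection = body["data"]["productsByCategoryOrSubcategoryAndDeliveryConfig"]
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        after = page_info["endCursor"]


def fake_fetch_page(after, first):
    """Stand-in for the real request: two canned pages keyed by cursor."""
    pages = {
        None: (["a", "b"], True, "cursor-1"),
        "cursor-1": (["c"], False, None),
    }
    names, has_next, cursor = pages[after]
    return {"data": {"productsByCategoryOrSubcategoryAndDeliveryConfig": {
        "edges": [{"node": {"displayName": name}} for name in names],
        "pageInfo": {"hasNextPage": has_next, "endCursor": cursor},
    }}}


names = [node["displayName"] for node in iter_products(fake_fetch_page)]
print(names)  # ['a', 'b', 'c']
```

In real use, fetch_page would set payload['variables']['after'] and payload['variables']['first'], call requests.post as above, and return response.json(); the collected nodes can then go to pd.json_normalize in one call.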