Why does my request return "Must provide query string." when scraping?

Question:

I am trying to scrape https://www.sayurbox.com/category/vegetables-1-a0d03d59?selectedCategoryType=ops&touch_point=screen_CATEGORY_sembako-1-e6a33b51&section_source=shop_list_slider_navigation_category_vegetables-1-a0d03d59

Here’s my current code:

DCId="RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl"
slugcat="vegetables-1-a0d03d59"
url="https://www.sayurbox.com/graphql/v1?deduplicate=1"

payload={"operationName":"getCartItemCount",
           "variables":{"deliveryConfigId":DCId},
           "query":"query getCartItemCount($deliveryConfigId: ID!) {\n  cart(deliveryConfigId: $deliveryConfigId) {\n    id\n    count\n    __typename\n  }\n}"},{"operationName":"getProducts",
            "variables":{"deliveryConfigId":DCId,
                         "sortBy":"related_product",
                         "isInstantDelivery":False,
                         "slug":slugcat,
                         "first":12,
                         "abTestFeatures":[]},
            "query":"query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) {\n  productsByCategoryOrSubcategoryAndDeliveryConfig(\n    deliveryConfigId: $deliveryConfigId\n    sortBy: $sortBy\n    slug: $slug\n    after: $after\n    first: $first\n    isInstantDelivery: $isInstantDelivery\n    abTestFeatures: $abTestFeatures\n  ) {\n    edges {\n      node {\n        ...ProductInfoFragment\n        __typename\n      }\n      __typename\n    }\n    pageInfo {\n      hasNextPage\n      endCursor\n      __typename\n    }\n    productBuilder\n    __typename\n  }\n}\n\nfragment ProductInfoFragment on Product {\n  id\n  uuid\n  deliveryConfigId\n  displayName\n  priceRanges\n  priceMin\n  priceMax\n  actualPriceMin\n  actualPriceMax\n  slug\n  label\n  isInstant\n  isInstantOnly\n  nextDayAvailability\n  heroImage\n  promo\n  discount\n  isDiscount\n  variantType\n  imageIds\n  isStockAvailable\n  defaultVariantSkuCode\n  quantitySoldFormatted\n  promotion {\n    quota\n    isShown\n    campaignId\n    __typename\n  }\n  productVariants {\n    productVariant {\n      id\n      skuCode\n      variantName\n      maxQty\n      isDiscount\n      stockAvailable\n      promotion {\n        quota\n        campaignId\n        isShown\n        __typename\n      }\n      __typename\n    }\n    pageInfo {\n      hasPreviousPage\n      hasNextPage\n      __typename\n    }\n    __typename\n  }\n  __typename\n}"}

response=requests.get(url,headers=headers,json=payload)
response.json()

The response returns

[{'errors': [{'message': 'Must provide query string.',
    'extensions': {'timestamp': 1675842901472}}]},
 {'errors': [{'message': 'Must provide query string.',
    'extensions': {'timestamp': 1675842901472}}]}]

I am not sure where I went wrong, as I’ve copied the payload and headers exactly. Can someone help?

Asked By: Hal

||

Answers:

GET requests generally shouldn’t carry a body. I think these are really query parameters that you’re trying to supply, so try passing the payload through the params argument instead of json. See https://www.w3schools.com/python/ref_requests_get.asp
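If the GET route is pursued, note that handing the nested payload straight to params is unlikely to work as-is: GraphQL servers that accept GET usually expect query, operationName and a JSON-encoded variables string as separate URL parameters. Whether this particular endpoint accepts GET at all is an assumption. The sketch below only builds the URL with the standard library and sends nothing; build_get_url is a made-up helper name and the one-line query is a stand-in for the full query text from the question.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs


def build_get_url(endpoint, query, variables, operation_name):
    """Return endpoint with the GraphQL request encoded as URL parameters."""
    params = {
        "query": query,
        "operationName": operation_name,
        # variables is a nested dict, so it is serialised to a single JSON string
        "variables": json.dumps(variables, separators=(",", ":")),
    }
    # The endpoint in the question already has ?deduplicate=1, so append with &
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode(params)


url = build_get_url(
    "https://www.sayurbox.com/graphql/v1?deduplicate=1",
    "query getProducts { __typename }",  # stand-in, not the real query
    {"slug": "vegetables-1-a0d03d59", "first": 12},
    "getProducts",
)

# Decode the URL again to inspect what a server would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["operationName"])
print(json.loads(decoded["variables"][0]))
```

With requests, the same three values could be passed as a flat dict to params, as long as variables is run through json.dumps first.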

Answered By: Joseph

First, the request should be a POST, not a GET. Second, you probably don’t need the "getCartItemCount" operation at all, only "getProducts".

import requests

# headers: the same dict of request headers used in the question (not shown there)
DCId = 'RGVsaXZlcnlDb25maWc6VGh1cnNkYXksIDA5IEZlYnJ1YXJ5IDIwMjN8SkswMXxTRDI5fGZhbHNl'
slugcat = 'vegetables-1-a0d03d59'
url = 'https://www.sayurbox.com/graphql/v1?deduplicate=1'

payload = {
    'operationName': 'getProducts',
    'variables': {
        'deliveryConfigId': DCId,
        'sortBy': 'related_product',
        'isInstantDelivery': False,
        'slug': slugcat,
        'first': 12,
        'abTestFeatures': ['category-page-subcategory-section-v5#####control']
    },
    'query': 'query getProducts($deliveryConfigId: ID!, $sortBy: CatalogueSortType!, $slug: String!, $after: String, $first: Int, $isInstantDelivery: Boolean, $abTestFeatures: [String!]) { productsByCategoryOrSubcategoryAndDeliveryConfig( deliveryConfigId: $deliveryConfigId sortBy: $sortBy slug: $slug after: $after first: $first isInstantDelivery: $isInstantDelivery abTestFeatures: $abTestFeatures ) { edges { node { ...ProductInfoFragment __typename } __typename } pageInfo { hasNextPage endCursor __typename } productBuilder __typename }}fragment ProductInfoFragment on Product { id uuid deliveryConfigId displayName priceRanges priceMin priceMax actualPriceMin actualPriceMax slug label isInstant isInstantOnly nextDayAvailability heroImage promo discount isDiscount variantType imageIds isStockAvailable defaultVariantSkuCode quantitySoldFormatted promotion { quota isShown campaignId __typename } productVariants { productVariant { id skuCode variantName maxQty isDiscount stockAvailable promotion { quota campaignId isShown __typename } __typename } pageInfo { hasPreviousPage hasNextPage __typename } __typename } __typename}'}

# Send the query in the JSON body of a POST request
response = requests.post(url, headers=headers, json=payload)
data = response.json()
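Before normalising the result, it can help to check the decoded body for an errors entry, since a GraphQL server may report a failure such as "Must provide query string." inside the response body. A minimal sketch, assuming the body is either one result object or a list of them (as in the batched response shown in the question); GraphQLError and unwrap_data are hypothetical names, not part of requests:

```python
class GraphQLError(RuntimeError):
    """Raised when a GraphQL response body carries an 'errors' entry."""


def unwrap_data(body):
    """Return the 'data' part of a decoded GraphQL response, or raise."""
    # A batched request comes back as a list with one result per operation.
    results = body if isinstance(body, list) else [body]
    messages = [
        error.get("message", "unknown error")
        for result in results
        for error in result.get("errors") or []
    ]
    if messages:
        raise GraphQLError("; ".join(messages))
    data = [result.get("data") for result in results]
    return data if isinstance(body, list) else data[0]


# The error body from the question, reproduced as a plain Python object.
failed = [
    {"errors": [{"message": "Must provide query string.",
                 "extensions": {"timestamp": 1675842901472}}]},
    {"errors": [{"message": "Must provide query string.",
                 "extensions": {"timestamp": 1675842901472}}]},
]
try:
    unwrap_data(failed)
except GraphQLError as exc:
    print("request failed:", exc)
```

In the code above this could be called as products = unwrap_data(response.json()), after which products['productsByCategoryOrSubcategoryAndDeliveryConfig']['edges'] holds the product nodes.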

Output (with Pandas):

import pandas as pd

df = pd.json_normalize([node['node'] for node in data['data']['productsByCategoryOrSubcategoryAndDeliveryConfig']['edges']])
>>> df
                                                   id                                  uuid  ... productVariants.pageInfo.__typename productVariants.__typename
0   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  479c7805-3b26-4bb9-93b9-5689a2d3bb9d  ...                            PageInfo  productVariantsConnection
1   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ba7154a1-e784-451d-88e0-10ede13d55b3  ...                            PageInfo  productVariantsConnection
2   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  5e023650-50fa-4adc-800d-be14cac7f1eb  ...                            PageInfo  productVariantsConnection
3   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  eec5c6fa-70b9-45d8-a316-6820d1ed68c3  ...                            PageInfo  productVariantsConnection
4   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  ee1a0910-f021-48e4-a8d0-ab54f4358bde  ...                            PageInfo  productVariantsConnection
5   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  17dccf7a-0763-4c34-a537-7b746bdba683  ...                            PageInfo  productVariantsConnection
6   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  90bbee6d-184e-4d8b-8702-77b660883a00  ...                            PageInfo  productVariantsConnection
7   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  f7e51319-0dd3-4c21-9bba-bc8e3f71db94  ...                            PageInfo  productVariantsConnection
8   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  9f889a62-9302-48db-a972-cff035440ee4  ...                            PageInfo  productVariantsConnection
9   UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  dd58f053-238f-45f6-b937-687c1e1db3b0  ...                            PageInfo  productVariantsConnection
10  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  05c37b4e-cf0f-4cf5-a9a8-20ea00029063  ...                            PageInfo  productVariantsConnection
11  UHJvZHVjdDpSR1ZzYVhabGNubERiMjVtYVdjNlZHaDFjbk...  e559850a-2344-4bb4-be70-932214aace91  ...                            PageInfo  productVariantsConnection

[12 rows x 30 columns]
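The 12 rows match 'first': 12 in the variables. The query also selects pageInfo { hasNextPage endCursor } and declares an $after variable, which suggests cursor-based pagination. A rough sketch of walking all pages under that assumption; fetch_page is a hypothetical callable that sends one request for the given cursor and returns the productsByCategoryOrSubcategoryAndDeliveryConfig object, and it is replaced by invented fake data here so the loop runs without network access:

```python
def iter_nodes(fetch_page, max_pages=50):
    """Yield every node, following endCursor until hasNextPage is false."""
    cursor = None  # no 'after' value on the first request
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        page = fetch_page(cursor)
        for edge in page["edges"]:
            yield edge["node"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        cursor = page["pageInfo"]["endCursor"]


# Fake two-page result standing in for the real endpoint (invented data).
FAKE_PAGES = {
    None: {"edges": [{"node": {"slug": "a"}}, {"node": {"slug": "b"}}],
           "pageInfo": {"hasNextPage": True, "endCursor": "cursor-1"}},
    "cursor-1": {"edges": [{"node": {"slug": "c"}}],
                 "pageInfo": {"hasNextPage": False, "endCursor": None}},
}

nodes = list(iter_nodes(FAKE_PAGES.get))
print([node["slug"] for node in nodes])
```

With requests, fetch_page could set payload['variables']['after'] = cursor, POST the payload and return data['data']['productsByCategoryOrSubcategoryAndDeliveryConfig']; the collected nodes can then be passed to pd.json_normalize in one go.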


>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype 
---  ------                                    --------------  ----- 
 0   id                                        12 non-null     object
 1   uuid                                      12 non-null     object
 2   deliveryConfigId                          12 non-null     object
 3   displayName                               12 non-null     object
 4   priceRanges                               12 non-null     object
 5   priceMin                                  12 non-null     int64 
 6   priceMax                                  12 non-null     int64 
 7   actualPriceMin                            12 non-null     int64 
 8   actualPriceMax                            12 non-null     int64 
 9   slug                                      12 non-null     object
 10  label                                     0 non-null      object
 11  isInstant                                 12 non-null     bool  
 12  isInstantOnly                             12 non-null     bool  
 13  nextDayAvailability                       12 non-null     bool  
 14  heroImage                                 12 non-null     object
 15  promo                                     12 non-null     object
 16  discount                                  12 non-null     object
 17  isDiscount                                12 non-null     bool  
 18  variantType                               12 non-null     object
 19  imageIds                                  12 non-null     object
 20  isStockAvailable                          12 non-null     bool  
 21  defaultVariantSkuCode                     12 non-null     object
 22  quantitySoldFormatted                     12 non-null     object
 23  promotion                                 0 non-null      object
 24  __typename                                12 non-null     object
 25  productVariants.productVariant            12 non-null     object
 26  productVariants.pageInfo.hasPreviousPage  12 non-null     bool  
 27  productVariants.pageInfo.hasNextPage      12 non-null     bool  
 28  productVariants.pageInfo.__typename       12 non-null     object
 29  productVariants.__typename                12 non-null     object
dtypes: bool(7), int64(4), object(19)
memory usage: 2.4+ KB

Answered By: Corralien