How to use Python via AWS Lambda to send millions of events (large quantities of data) to Splunk

Question:

I am testing a Python script that pulls data from an API and sends the data to Splunk. The script is working fine, but my issue is that I will need to send millions of events daily from the API to Splunk. In my local testing, I am only able to send a few thousand events per hour. I eventually need to port this into Lambda for scheduled automation.

I know about Python's multiprocessing module, but my concern is that even if I get that logic up and running, at best I will be able to send tens of thousands of events an hour, and Lambda will time out long before I have sent the full range of data. I'm hoping someone has encountered this challenge before and can suggest some options for me to consider. Thank you!

Code:

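import requests
from splunk_data_sender import SplunkSender  # assumed: the splunk-data-sender package

splunk_conf = {<config stuff>}

def splunk(splunk_payload, splunk_conf):
    # A new sender is created and a one-event list is sent on every call
    sender = SplunkSender(**splunk_conf)
    payloads = [splunk_payload]
    splunk_res = sender.send_data(payloads)

# Page through the API 10,000 records at a time
for offset in range(0, 9000000, 10000):
    page = requests.get(f'{base_url}/<api>?limit=10000&offset={offset}', headers=headers).json()
    for x in page['data']:
        # Each event goes to Splunk individually
        splunk(x, splunk_conf)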

The script works, but as far as I understand the options available to me, the sheer volume of data will be the limiting factor.
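To put rough numbers on the problem, here is a back-of-the-envelope sketch. The total of 9,000,000 events comes from the range in the script above, and 900 seconds is Lambda's maximum timeout. The figure of 3,000 events per hour is only an assumed stand-in for "a few thousand per hour", not a measurement.

```python
TOTAL_EVENTS = 9_000_000      # upper bound of the range() in the script above
EVENTS_PER_HOUR = 3_000       # assumption: stand-in for "a few thousand per hour"
LAMBDA_MAX_TIMEOUT_S = 900    # AWS Lambda's maximum timeout (15 minutes)

# Time needed at the observed rate, and how far one Lambda run would get
hours_needed = TOTAL_EVENTS / EVENTS_PER_HOUR
events_per_invocation = EVENTS_PER_HOUR * LAMBDA_MAX_TIMEOUT_S / 3600

# Rate that would be required to finish within a single day
required_per_hour = TOTAL_EVENTS / 24

print(f"Hours to send everything: {hours_needed:,.0f}")
print(f"Events per max-length Lambda run: {events_per_invocation:,.0f}")
print(f"Rate needed to finish in a day: {required_per_hour:,.0f} events/hour")
```

Under that assumed rate, the per-event approach is off by roughly two orders of magnitude, which is why simply adding a few worker processes did not look sufficient to me.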

Asked By: noobuntu

Answers:

I was able to get this working by collecting the events from each API response into a list and passing that list to Splunk as a single payload. My original code was sending the events one at a time because I had misunderstood how to pass in the data.

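import logging
import sys

import requests
from splunk_data_sender import SplunkSender  # assumed: the splunk-data-sender package

splunk_token = <code to retrieve token>

def splunk(splunk_payload, splunk_token):
    # Send a whole list of events to Splunk in one call
    splunk_conf = { <splunk conf details> }
    sender = SplunkSender(**splunk_conf)
    splunk_res = sender.send_data(splunk_payload)
    logging.info(splunk_res)

# Page through the API 10,000 records at a time
for offset in range(0, 10000000, 10000):
    try:
        page = requests.get(f'{base_url}<API endpoint URL>limit=10000&offset={offset}', headers=headers).json()
        # One list per page, so each page reaches Splunk as a single batch
        splunk_payload = list(page['data'])
        splunk(splunk_payload, splunk_token)
    except Exception as ex:
        # Any failure, including running past the last page, ends the run
        print("No more results from API!")
        logging.info(ex)
        sys.exit()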
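
The same batching idea can be sketched independently of the API page size. This is a minimal illustration and not the only way to do it: `batched`, `iter_events`, and `forward` are hypothetical helper names, and `fetch_page` and `send` are stand-in callables for the API request and the Splunk send call, so the sketch runs without a network connection.

```python
from typing import Callable, Iterable, Iterator, List


def batched(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size events."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def iter_events(fetch_page: Callable[[int, int], List[dict]],
                page_size: int) -> Iterator[dict]:
    """Walk the API page by page until an empty page comes back."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += page_size


def forward(fetch_page: Callable[[int, int], List[dict]],
            send: Callable[[List[dict]], object],
            page_size: int = 10000,
            batch_size: int = 5000) -> int:
    """Send every event in batches and return the number of send calls."""
    calls = 0
    for batch in batched(iter_events(fetch_page, page_size), batch_size):
        send(batch)  # hypothetical hook: one call per batch, not per event
        calls += 1
    return calls


if __name__ == "__main__":
    # Stand-ins: 25 fake events served 10 per page, "sent" into a list
    data = [{"id": i} for i in range(25)]
    sent: List[List[dict]] = []
    calls = forward(lambda off, lim: data[off:off + lim], sent.append,
                    page_size=10, batch_size=8)
    print(calls, [len(b) for b in sent])  # 4 [8, 8, 8, 1]
```

With a real sender, `send` could be the `send_data` method of a single `SplunkSender` created once up front, which would avoid building a new sender for every page. Stopping on an empty page also removes the need to treat every exception as the end of the data.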
Answered By: noobuntu